linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel
@ 2018-09-14 16:22 Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 01/20] asm: simd context helper API Jason A. Donenfeld
                   ` (19 more replies)
  0 siblings, 20 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh; +Cc: Jason A. Donenfeld

Changes v3->v4:
  - Remove mistaken double 07/17 patch.
  - Fix whitespace issues in blake2s assembly.
  - It's not possible to put compound literals into __initconst, so
    we now instead just use boring fixed size struct members.
  - Move away from makefile ifdef maze and instead prefer kconfig values,
    which also makes the design a bit more modular too, which could help
    in the future.
  - Port old crypto API implementations (ChaCha20 and Poly1305) to Zinc.
  - Port security/keys/big_key to Zinc as second example of a good usage of
    Zinc.
  - Document precisely what is different between the kernel code and
    CRYPTOGAMS code when the CRYPTOGAMS code is used.
  - Move changelog to top of 00/20 message so that people can
    actually find it.

-----------------------------------------------------------

This patchset is available on git.kernel.org in this branch, where it may be
pulled directly for inclusion into net-next:

  * https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/linux.git/log/?h=jd/wireguard

-----------------------------------------------------------

WireGuard is a secure network tunnel written especially for Linux, which
has faced around three years of serious development, deployment, and
scrutiny. It delivers excellent performance and is extremely easy to
use and configure. It has been designed with the primary goal of being
both easy to audit by virtue of being small and highly secure from a
cryptography and systems security perspective. WireGuard is used by some
massive companies pushing enormous amounts of traffic, and likely
already today you've consumed bytes that at some point transited through
a WireGuard tunnel. Even as an out-of-tree module, WireGuard has been
integrated into various userspace tools, Linux distributions, mobile
phones, and data centers. There are ports in several languages to
several operating systems, and even commercial hardware and services
sold integrating WireGuard. It is time, therefore, for WireGuard to be
properly integrated into Linux.

Ample information, including documentation, installation instructions,
and project details, is available at:

  * https://www.wireguard.com/
  * https://www.wireguard.com/papers/wireguard.pdf

As it is currently an out-of-tree module, it lives in its own git repo
and has its own mailing list, and every commit for the module is tested
against every stable kernel since 3.10 on a variety of architectures
using an extensive test suite:

  * https://git.zx2c4.com/WireGuard
    https://git.kernel.org/pub/scm/linux/kernel/git/zx2c4/WireGuard.git/
  * https://lists.zx2c4.com/mailman/listinfo/wireguard
  * https://www.wireguard.com/build-status/

The project has been broadly discussed at conferences, and was presented
to the Netdev developers in Seoul last November, where a paper was
released detailing some interesting aspects of the project. Dave asked
me after the talk if I would consider sending in a v1 "sooner rather
than later", hence this patchset. A decision is still waiting from the
Linux Plumbers Conference, but an update on these topics may be presented
in Vancouver in a few months. Prior presentations:

  * https://www.wireguard.com/presentations/
  * https://www.wireguard.com/papers/wireguard-netdev22.pdf

The cryptography in the protocol itself has been formally verified by
several independent academic teams with positive results, and I know of
two additional efforts on their way to further corroborate those
findings. The version 1 protocol is "complete", and so the purpose of
this review is to assess the implementation of the protocol. However, it
still may be of interest to know that the thing you're reviewing uses a
protocol with various nice security properties:

  * https://www.wireguard.com/formal-verification/

This patchset is divided into four segments. The first introduces a very
simple helper for working with the FPU state for the purposes of amortizing
SIMD operations. The second segment is a small collection of cryptographic
primitives, split up into several commits by primitive and by hardware. The
third shows usage of Zinc within the existing crypto API and as a replacement
to the existing crypto API. The last is WireGuard itself, presented as an
unintrusive and self-contained virtual network driver.

It is intended that this entire patch series enter the kernel through
DaveM's net-next tree. Subsequently, WireGuard patches will go through
DaveM's net-next tree, while Zinc patches will go through Greg KH's tree.

Enjoy,
Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 01/20] asm: simd context helper API
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 02/20] zinc: introduce minimal cryptography library Jason A. Donenfeld
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Thomas Gleixner, linux-arch

Sometimes it's useful to amortize calls to XSAVE/XRSTOR and the related
FPU/SIMD functions over a number of calls, because FPU restoration is
quite expensive. This adds a simple header for carrying out this pattern:

    simd_context_t simd_context = simd_get();
    while ((item = get_item_from_queue()) != NULL) {
        encrypt_item(item, simd_context);
        simd_context = simd_relax(simd_context);
    }
    simd_put(simd_context);

The relaxation step ensures that we don't trample over preemption, and
the get/put API should be a familiar paradigm in the kernel.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: linux-arch@vger.kernel.org
---
 arch/alpha/include/asm/Kbuild      |  5 ++--
 arch/arc/include/asm/Kbuild        |  1 +
 arch/arm/include/asm/simd.h        | 42 ++++++++++++++++++++++++++++++
 arch/arm64/include/asm/simd.h      | 37 +++++++++++++++++++++-----
 arch/c6x/include/asm/Kbuild        |  3 ++-
 arch/h8300/include/asm/Kbuild      |  3 ++-
 arch/hexagon/include/asm/Kbuild    |  1 +
 arch/ia64/include/asm/Kbuild       |  1 +
 arch/m68k/include/asm/Kbuild       |  1 +
 arch/microblaze/include/asm/Kbuild |  1 +
 arch/mips/include/asm/Kbuild       |  1 +
 arch/nds32/include/asm/Kbuild      |  7 ++---
 arch/nios2/include/asm/Kbuild      |  1 +
 arch/openrisc/include/asm/Kbuild   |  7 ++---
 arch/parisc/include/asm/Kbuild     |  1 +
 arch/powerpc/include/asm/Kbuild    |  3 ++-
 arch/riscv/include/asm/Kbuild      |  3 ++-
 arch/s390/include/asm/Kbuild       |  3 ++-
 arch/sh/include/asm/Kbuild         |  1 +
 arch/sparc/include/asm/Kbuild      |  1 +
 arch/um/include/asm/Kbuild         |  3 ++-
 arch/unicore32/include/asm/Kbuild  |  1 +
 arch/x86/include/asm/simd.h        | 30 ++++++++++++++++++++-
 arch/xtensa/include/asm/Kbuild     |  1 +
 include/asm-generic/simd.h         | 15 +++++++++++
 include/linux/simd.h               | 28 ++++++++++++++++++++
 26 files changed, 180 insertions(+), 21 deletions(-)
 create mode 100644 arch/arm/include/asm/simd.h
 create mode 100644 include/linux/simd.h

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 0580cb8c84b2..07b2c1025d34 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -2,14 +2,15 @@
 
 
 generic-y += compat.h
+generic-y += current.h
 generic-y += exec.h
 generic-y += export.h
 generic-y += fb.h
 generic-y += irq_work.h
+generic-y += kprobes.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += sections.h
+generic-y += simd.h
 generic-y += trace_clock.h
-generic-y += current.h
-generic-y += kprobes.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index feed50ce89fa..a7f4255f1649 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += parport.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
+generic-y += simd.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += user.h
diff --git a/arch/arm/include/asm/simd.h b/arch/arm/include/asm/simd.h
new file mode 100644
index 000000000000..bf468993bbef
--- /dev/null
+++ b/arch/arm/include/asm/simd.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
+
+static __must_check inline bool may_use_simd(void)
+{
+	return !in_interrupt();
+}
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+#include <asm/neon.h>
+
+static inline simd_context_t simd_get(void)
+{
+	bool have_simd = may_use_simd();
+	if (have_simd)
+		kernel_neon_begin();
+	return have_simd ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+	if (prior_context != HAVE_NO_SIMD)
+		kernel_neon_end();
+}
+#else
+static inline simd_context_t simd_get(void)
+{
+	return HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+}
+#endif
+
+#endif /* _ASM_SIMD_H */
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index 6495cc51246f..058c336de38d 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -1,11 +1,10 @@
-/*
- * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+/* SPDX-License-Identifier: GPL-2.0
  *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License version 2 as published
- * by the Free Software Foundation.
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  */
 
+#include <linux/simd.h>
 #ifndef __ASM_SIMD_H
 #define __ASM_SIMD_H
 
@@ -16,6 +15,8 @@
 #include <linux/types.h>
 
 #ifdef CONFIG_KERNEL_MODE_NEON
+#include <asm/neon.h>
+#include <asm/simd.h>
 
 DECLARE_PER_CPU(bool, kernel_neon_busy);
 
@@ -40,12 +41,36 @@ static __must_check inline bool may_use_simd(void)
 		!this_cpu_read(kernel_neon_busy);
 }
 
+static inline simd_context_t simd_get(void)
+{
+	bool have_simd = may_use_simd();
+	if (have_simd)
+		kernel_neon_begin();
+	return have_simd ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+	if (prior_context != HAVE_NO_SIMD)
+		kernel_neon_end();
+}
+
 #else /* ! CONFIG_KERNEL_MODE_NEON */
 
-static __must_check inline bool may_use_simd(void) {
+static __must_check inline bool may_use_simd(void)
+{
 	return false;
 }
 
+static inline simd_context_t simd_get(void)
+{
+	return HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+}
+
 #endif /* ! CONFIG_KERNEL_MODE_NEON */
 
 #endif
diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
index 33a2c94fed0d..22f3d8333c74 100644
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -5,8 +5,8 @@ generic-y += compat.h
 generic-y += current.h
 generic-y += device.h
 generic-y += div64.h
-generic-y += dma.h
 generic-y += dma-mapping.h
+generic-y += dma.h
 generic-y += emergency-restart.h
 generic-y += exec.h
 generic-y += extable.h
@@ -30,6 +30,7 @@ generic-y += pgalloc.h
 generic-y += preempt.h
 generic-y += segment.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += tlbflush.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild
index a5d0b2991f47..f5c2f12d593e 100644
--- a/arch/h8300/include/asm/Kbuild
+++ b/arch/h8300/include/asm/Kbuild
@@ -8,8 +8,8 @@ generic-y += current.h
 generic-y += delay.h
 generic-y += device.h
 generic-y += div64.h
-generic-y += dma.h
 generic-y += dma-mapping.h
+generic-y += dma.h
 generic-y += emergency-restart.h
 generic-y += exec.h
 generic-y += extable.h
@@ -39,6 +39,7 @@ generic-y += preempt.h
 generic-y += scatterlist.h
 generic-y += sections.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += spinlock.h
 generic-y += timex.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index dd2fd9c0d292..217d4695fd8a 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -29,6 +29,7 @@ generic-y += rwsem.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 557bbc8ba9f5..41c5ebdf79e5 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -4,6 +4,7 @@ generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += preempt.h
+generic-y += simd.h
 generic-y += trace_clock.h
 generic-y += vtime.h
 generic-y += word-at-a-time.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index a4b8d3331a9e..73898dd1a4d0 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -19,6 +19,7 @@ generic-y += mm-arch-hooks.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += sections.h
+generic-y += simd.h
 generic-y += spinlock.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index 569ba9e670c1..7a877eea99d3 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -25,6 +25,7 @@ generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += syscalls.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 58351e48421e..e8868e0fb2c3 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += qrwlock.h
 generic-y += qspinlock.h
 generic-y += sections.h
 generic-y += segment.h
+generic-y += simd.h
 generic-y += trace_clock.h
 generic-y += unaligned.h
 generic-y += user.h
diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
index dbc4e5422550..603c1d020620 100644
--- a/arch/nds32/include/asm/Kbuild
+++ b/arch/nds32/include/asm/Kbuild
@@ -7,14 +7,14 @@ generic-y += bug.h
 generic-y += bugs.h
 generic-y += checksum.h
 generic-y += clkdev.h
-generic-y += cmpxchg.h
 generic-y += cmpxchg-local.h
+generic-y += cmpxchg.h
 generic-y += compat.h
 generic-y += cputime.h
 generic-y += device.h
 generic-y += div64.h
-generic-y += dma.h
 generic-y += dma-mapping.h
+generic-y += dma.h
 generic-y += emergency-restart.h
 generic-y += errno.h
 generic-y += exec.h
@@ -46,14 +46,15 @@ generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
 generic-y += shmbuf.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += stat.h
 generic-y += switch_to.h
 generic-y += timex.h
 generic-y += topology.h
 generic-y += trace_clock.h
-generic-y += xor.h
 generic-y += unaligned.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += word-at-a-time.h
+generic-y += xor.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 8fde4fa2c34f..571a9d9ad107 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -33,6 +33,7 @@ generic-y += preempt.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += spinlock.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index eb87cd8327c8..5e9f2f4c4d39 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -28,12 +28,13 @@ generic-y += module.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += qspinlock_types.h
-generic-y += qspinlock.h
-generic-y += qrwlock_types.h
 generic-y += qrwlock.h
+generic-y += qrwlock_types.h
+generic-y += qspinlock.h
+generic-y += qspinlock_types.h
 generic-y += sections.h
 generic-y += segment.h
+generic-y += simd.h
 generic-y += string.h
 generic-y += switch_to.h
 generic-y += topology.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 2013d639e735..97970b4d05ab 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += percpu.h
 generic-y += preempt.h
 generic-y += seccomp.h
 generic-y += segment.h
+generic-y += simd.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 3196d227e351..64290f48e733 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -4,7 +4,8 @@ generic-y += irq_regs.h
 generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
+generic-y += msi.h
 generic-y += preempt.h
 generic-y += rwsem.h
+generic-y += simd.h
 generic-y += vtime.h
-generic-y += msi.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index efdbe311e936..6669b7374c0a 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -5,9 +5,9 @@ generic-y += compat.h
 generic-y += cputime.h
 generic-y += device.h
 generic-y += div64.h
-generic-y += dma.h
 generic-y += dma-contiguous.h
 generic-y += dma-mapping.h
+generic-y += dma.h
 generic-y += emergency-restart.h
 generic-y += errno.h
 generic-y += exec.h
@@ -46,6 +46,7 @@ generic-y += setup.h
 generic-y += shmbuf.h
 generic-y += shmparam.h
 generic-y += signal.h
+generic-y += simd.h
 generic-y += socket.h
 generic-y += sockios.h
 generic-y += stat.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index e3239772887a..7a26dc6ce815 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,9 +7,9 @@ generated-y += unistd_nr.h
 generic-y += asm-offsets.h
 generic-y += cacheflush.h
 generic-y += device.h
+generic-y += div64.h
 generic-y += dma-contiguous.h
 generic-y += dma-mapping.h
-generic-y += div64.h
 generic-y += emergency-restart.h
 generic-y += export.h
 generic-y += fb.h
@@ -22,6 +22,7 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += rwsem.h
+generic-y += simd.h
 generic-y += trace_clock.h
 generic-y += unaligned.h
 generic-y += word-at-a-time.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 6a5609a55965..8e64ff35a933 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += percpu.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += trace_clock.h
 generic-y += xor.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 410b263ef5c8..72b9e08fb350 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -17,5 +17,6 @@ generic-y += msi.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index b10dde6cb793..d37288b08dd2 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,15 +16,16 @@ generic-y += io.h
 generic-y += irq_regs.h
 generic-y += irq_work.h
 generic-y += kdebug.h
+generic-y += kprobes.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
+generic-y += simd.h
 generic-y += switch_to.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
 generic-y += xor.h
-generic-y += kprobes.h
diff --git a/arch/unicore32/include/asm/Kbuild b/arch/unicore32/include/asm/Kbuild
index bfc7abe77905..98a908720bbd 100644
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@ -27,6 +27,7 @@ generic-y += preempt.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += syscalls.h
 generic-y += topology.h
diff --git a/arch/x86/include/asm/simd.h b/arch/x86/include/asm/simd.h
index a341c878e977..79411178988a 100644
--- a/arch/x86/include/asm/simd.h
+++ b/arch/x86/include/asm/simd.h
@@ -1,4 +1,11 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
 
 #include <asm/fpu/api.h>
 
@@ -10,3 +17,24 @@ static __must_check inline bool may_use_simd(void)
 {
 	return irq_fpu_usable();
 }
+
+static inline simd_context_t simd_get(void)
+{
+	bool have_simd = false;
+#if !defined(CONFIG_UML)
+	have_simd = may_use_simd();
+	if (have_simd)
+		kernel_fpu_begin();
+#endif
+	return have_simd ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+#if !defined(CONFIG_UML)
+	if (prior_context != HAVE_NO_SIMD)
+		kernel_fpu_end();
+#endif
+}
+
+#endif /* _ASM_SIMD_H */
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index 82c756431b49..7950f359649d 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -24,6 +24,7 @@ generic-y += percpu.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += sections.h
+generic-y += simd.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
diff --git a/include/asm-generic/simd.h b/include/asm-generic/simd.h
index d0343d58a74a..fad899a5a92d 100644
--- a/include/asm-generic/simd.h
+++ b/include/asm-generic/simd.h
@@ -1,5 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
+
 #include <linux/hardirq.h>
 
 /*
@@ -13,3 +17,14 @@ static __must_check inline bool may_use_simd(void)
 {
 	return !in_interrupt();
 }
+
+static inline simd_context_t simd_get(void)
+{
+	return HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+}
+
+#endif /* _ASM_SIMD_H */
diff --git a/include/linux/simd.h b/include/linux/simd.h
new file mode 100644
index 000000000000..f62d047188bf
--- /dev/null
+++ b/include/linux/simd.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _SIMD_H
+#define _SIMD_H
+
+typedef enum {
+	HAVE_NO_SIMD,
+	HAVE_FULL_SIMD
+} simd_context_t;
+
+#include <linux/sched.h>
+#include <asm/simd.h>
+
+static inline simd_context_t simd_relax(simd_context_t prior_context)
+{
+#ifdef CONFIG_PREEMPT
+	if (prior_context != HAVE_NO_SIMD && need_resched()) {
+		simd_put(prior_context);
+		return simd_get();
+	}
+#endif
+	return prior_context;
+}
+
+#endif /* _SIMD_H */
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 02/20] zinc: introduce minimal cryptography library
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 01/20] asm: simd context helper API Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 03/20] zinc: ChaCha20 generic C implementation Jason A. Donenfeld
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson

Zinc stands for "Zinc Is Neat Crypto" or "Zinc as IN Crypto" or maybe
just "Zx2c4's INsane Cryptolib." It's also short, easy to type, and
plays nicely with the recent trend of naming crypto libraries after
elements. The guiding principle is "don't overdo it". It's less of a
library and more of a directory tree for organizing well-curated direct
implementations of cryptography primitives.

Zinc is a new cryptography API that is much more minimal and lower-level
than the current one. It intends to complement it and provide a basis
upon which the current crypto API might build, as the provider of
software implementations of cryptographic primitives. It is motivated by
three primary observations in crypto API design:

  * Highly composable "cipher modes" and related abstractions from
    90s cryptographers did not turn out to be as terrific an idea as
    hoped, leading to a host of API misuse problems.

  * Most programmers are afraid of crypto code, and so prefer to
    integrate it into libraries in a highly abstracted manner, so as to
    shield themselves from implementation details. Cryptographers, on
    the other hand, prefer simple direct implementations, which they're
    able to verify for high assurance and optimize in accordance with
    their expertise.

  * Overly abstracted and flexible cryptography APIs lead to a host of
    dangerous problems and performance issues. The kernel is in the
    business usually not of coming up with new uses of crypto, but
    rather implementing various constructions, which means it essentially
    needs a library of primitives, not a highly abstracted enterprise-ready
    pluggable system, with a few particular exceptions.

This last observation has seen itself play out several times over and
over again within the kernel:

  * The perennial move of actual primitives away from crypto/ and into
    lib/, so that users can actually call these functions directly with
    no overhead and without lots of allocations, function pointers,
    string specifier parsing, and general clunkiness. For example:
    sha256, chacha20, siphash, sha1, and so forth live in lib/ rather
    than in crypto/. Zinc intends to stop the cluttering of lib/ and
    introduce these direct primitives into their proper place, lib/zinc/.

  * An abundance of misuse bugs with the present crypto API that have
    been very unpleasant to clean up.

  * A hesitance to even use cryptography, because of the overhead and
    headaches involved in accessing the routines.

Zinc goes in a rather different direction. Rather than providing a
thoroughly designed and abstracted API, Zinc gives you simple functions,
which implement some primitive, or some particular and specific
construction of primitives. It is not dynamic in the least, though one
could imagine implementing a complex dynamic dispatch mechanism (such as
the current crypto API) on top of these basic functions. After all,
dynamic dispatch is usually needed for applications with cipher agility,
such as IPsec, dm-crypt, AF_ALG, and so forth, and the existing crypto
API will continue to play that role. However, Zinc will provide a non-
haphazard way of directly utilizing crypto routines in applications
that do have neither the need nor desire for abstraction and dynamic
dispatch.

It also organizes the implementations in a simple, straight-forward,
and direct manner, making it enjoyable and intuitive to work on.
Rather than moving optimized assembly implementations into arch/, it
keeps them all together in lib/zinc/, making it simple and obvious to
compare and contrast what's happening. This is, notably, exactly what
the lib/raid6/ tree does, and that seems to work out rather well. It's
also the pattern of most successful crypto libraries. The architecture-
specific glue-code is made a part of each translation unit, rather than
being in a separate one, so that generic and architecture-optimized code
are combined at compile-time, and incompatibility branches compiled out by
the optimizer.

All implementations have been extensively tested and fuzzed, and are
selected for their quality, trustworthiness, and performance. Wherever
possible and performant, formally verified implementations are used,
such as those from HACL* [1] and Fiat-Crypto [2]. The routines also take
special care to zero out secrets using memzero_explicit (and future work
is planned to have gcc do this more reliably and performantly with
compiler plugins). The performance of the selected implementations is
state-of-the-art and unrivaled on a broad array of hardware, though of
course we will continue to fine tune these to the hardware demands
needed by kernel contributors. Each implementation also comes with
extensive self-tests and crafted test vectors, pulled from various
places such as Wycheproof [9].

Regularity of function signatures is important, so that users can easily
"guess" the name of the function they want. Though, individual
primitives are oftentimes not trivially interchangeable, having been
designed for different things and requiring different parameters and
semantics, and so the function signatures they provide will directly
reflect the realities of the primitives' usages, rather than hiding it
behind (inevitably leaky) abstractions. Also, in contrast to the current
crypto API, Zinc functions can work on stack buffers, and can be called
with different keys, without requiring allocations or locking.

SIMD is used automatically when available, though some routines may
benefit from either having their SIMD disabled for particular
invocations, or to have the SIMD initialization calls amortized over
several invocations of the function, and so Zinc utilizes function
signatures enabling that in conjunction with the recently introduced
simd_context_t.

More generally, Zinc provides function signatures that allow just what
is required by the various callers. This isn't to say that users of the
functions will be permitted to pollute the function semantics with weird
particular needs, but we are trying very hard not to overdo it, and that
means looking carefully at what's actually necessary, and doing just that,
and not much more than that. Remember: practicality and cleanliness rather
than over-zealous infrastructure.

Zinc provides also an opening for the best implementers in academia to
contribute their time and effort to the kernel, by being sufficiently
simple and inviting. In discussing this commit with some of the best and
brightest over the last few years, there are many who are eager to
devote rare talent and energy to this effort.

Following the merging of this, I expect for the primitives that
currently exist in lib/ to work their way into lib/zinc/, after intense
scrutiny of each implementation, potentially replacing them with either
formally-verified implementations, or better studied and faster
state-of-the-art implementations.

Also following the merging of this, I expect for the old crypto API
implementations to be ported over to use Zinc for their software-based
implementations.

As Zinc is simply library code, its config options are un-menued, with
the exception of CONFIG_ZINC_DEBUG, which enables various selftests and
BUG_ONs.

[1] https://github.com/project-everest/hacl-star
[2] https://github.com/mit-plv/fiat-crypto
[3] https://cr.yp.to/ecdh.html
[4] https://cr.yp.to/chacha.html
[5] https://cr.yp.to/snuffle/xsalsa-20081128.pdf
[6] https://cr.yp.to/mac.html
[7] https://blake2.net/
[8] https://tools.ietf.org/html/rfc8439
[9] https://github.com/google/wycheproof

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
 MAINTAINERS       |  8 ++++++++
 lib/Kconfig       |  2 ++
 lib/Makefile      |  2 ++
 lib/zinc/Kconfig  | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/zinc/Makefile |  8 ++++++++
 lib/zinc/main.c   | 31 +++++++++++++++++++++++++++++++
 6 files changed, 97 insertions(+)
 create mode 100644 lib/zinc/Kconfig
 create mode 100644 lib/zinc/Makefile
 create mode 100644 lib/zinc/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2ef884b883c3..d2092e52320d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16160,6 +16160,14 @@ Q:	https://patchwork.linuxtv.org/project/linux-media/list/
 S:	Maintained
 F:	drivers/media/dvb-frontends/zd1301_demod*
 
+ZINC CRYPTOGRAPHY LIBRARY
+M:	Jason A. Donenfeld <Jason@zx2c4.com>
+M:	Samuel Neves <sneves@dei.uc.pt>
+S:	Maintained
+F:	lib/zinc/
+F:	include/zinc/
+L:	linux-crypto@vger.kernel.org
+
 ZPOOL COMPRESSED PAGE STORAGE API
 M:	Dan Streetman <ddstreet@ieee.org>
 L:	linux-mm@kvack.org
diff --git a/lib/Kconfig b/lib/Kconfig
index a3928d4438b5..3e6848269c66 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -485,6 +485,8 @@ config GLOB_SELFTEST
 	  module load) by a small amount, so you're welcome to play with
 	  it, but you probably don't need it.
 
+source "lib/zinc/Kconfig"
+
 #
 # Netlink attribute parsing support is select'ed if needed
 #
diff --git a/lib/Makefile b/lib/Makefile
index ca3f7ebb900d..3f16e35d2c11 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -214,6 +214,8 @@ obj-$(CONFIG_PERCPU_TEST) += percpu_test.o
 
 obj-$(CONFIG_ASN1) += asn1_decoder.o
 
+obj-$(CONFIG_ZINC) += zinc/
+
 obj-$(CONFIG_FONT_SUPPORT) += fonts/
 
 obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
new file mode 100644
index 000000000000..5980c411af0d
--- /dev/null
+++ b/lib/zinc/Kconfig
@@ -0,0 +1,46 @@
+config ZINC
+	tristate
+
+config ZINC_DEBUG
+	bool "Zinc cryptography library debugging and self-tests"
+	depends on ZINC
+	help
+	  This builds a series of self-tests for the Zinc crypto library, which
+	  help diagnose any cryptographic algorithm implementation issues that
+	  might be at the root cause of potential bugs. It also adds various
+	  debugging traps.
+
+	  Unless you're developing and testing cryptographic routines, or are
+	  especially paranoid about correctness on your hardware, you may say
+	  N here.
+
+config ZINC_ARCH_ARM
+	def_bool y
+	depends on ARM
+	depends on ZINC
+	imply VFP
+	imply VFPv3 if CPU_V7
+	imply NEON if CPU_V7
+	imply KERNEL_MODE_NEON if CPU_V7
+
+config ZINC_ARCH_ARM64
+	def_bool y
+	depends on ARM64
+	depends on ZINC
+
+config ZINC_ARCH_X86_64
+	def_bool y
+	depends on X86_64
+	depends on !UML
+	depends on ZINC
+
+config ZINC_ARCH_MIPS
+	def_bool y
+	depends on MIPS
+	depends on ZINC
+
+config ZINC_ARCH_MIPS64
+	def_bool y
+	depends on MIPS
+	depends on 64BIT
+	depends on ZINC
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
new file mode 100644
index 000000000000..dad47573de42
--- /dev/null
+++ b/lib/zinc/Makefile
@@ -0,0 +1,8 @@
+ccflags-y := -O3
+ccflags-y += -Wframe-larger-than=8192
+ccflags-y += -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt'
+ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
+
+zinc-y += main.o
+
+obj-$(CONFIG_ZINC) := zinc.o
diff --git a/lib/zinc/main.c b/lib/zinc/main.c
new file mode 100644
index 000000000000..ceece33ff5a7
--- /dev/null
+++ b/lib/zinc/main.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+
+#ifdef DEBUG
+#define selftest(which) do { \
+	if (!which ## _selftest()) \
+		return -ENOTRECOVERABLE; \
+} while (0)
+#else
+#define selftest(which)
+#endif
+
+static int __init mod_init(void)
+{
+	return 0;
+}
+
+static void __exit mod_exit(void)
+{
+}
+
+module_init(mod_init);
+module_exit(mod_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Zinc cryptography library");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 03/20] zinc: ChaCha20 generic C implementation
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 01/20] asm: simd context helper API Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 02/20] zinc: introduce minimal cryptography library Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 04/20] zinc: ChaCha20 ARM and ARM64 implementations Jason A. Donenfeld
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson

This implements the ChaCha20 permutation as a single C statement, by way
of the comma operator, which the compiler is able to simplify
terrifically.

Information: https://cr.yp.to/chacha.html

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
 include/zinc/chacha20.h      |  54 +++++++++++
 lib/zinc/Kconfig             |   5 ++
 lib/zinc/Makefile            |   4 +
 lib/zinc/chacha20/chacha20.c | 168 +++++++++++++++++++++++++++++++++++
 lib/zinc/main.c              |   5 ++
 5 files changed, 236 insertions(+)
 create mode 100644 include/zinc/chacha20.h
 create mode 100644 lib/zinc/chacha20/chacha20.c

diff --git a/include/zinc/chacha20.h b/include/zinc/chacha20.h
new file mode 100644
index 000000000000..3c2c2f72d88a
--- /dev/null
+++ b/include/zinc/chacha20.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_CHACHA20_H
+#define _ZINC_CHACHA20_H
+
+#include <asm/unaligned.h>
+#include <linux/simd.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+enum {
+	CHACHA20_IV_SIZE = 16,
+	CHACHA20_KEY_SIZE = 32,
+	CHACHA20_BLOCK_SIZE = 64,
+	CHACHA20_BLOCK_WORDS = CHACHA20_BLOCK_SIZE / sizeof(u32),
+	HCHACHA20_KEY_SIZE = 32,
+	HCHACHA20_NONCE_SIZE = 16
+};
+
+struct chacha20_ctx {
+	u32 key[8];
+	u32 counter[4];
+} __aligned(32);
+
+void chacha20_fpu_init(void);
+
+static inline void chacha20_init(struct chacha20_ctx *state,
+				 const u8 key[CHACHA20_KEY_SIZE],
+				 const u64 nonce)
+{
+	state->key[0] = get_unaligned_le32(key + 0);
+	state->key[1] = get_unaligned_le32(key + 4);
+	state->key[2] = get_unaligned_le32(key + 8);
+	state->key[3] = get_unaligned_le32(key + 12);
+	state->key[4] = get_unaligned_le32(key + 16);
+	state->key[5] = get_unaligned_le32(key + 20);
+	state->key[6] = get_unaligned_le32(key + 24);
+	state->key[7] = get_unaligned_le32(key + 28);
+	state->counter[0] = state->counter[1] = 0;
+	state->counter[2] = nonce & U32_MAX;
+	state->counter[3] = nonce >> 32;
+}
+void chacha20(struct chacha20_ctx *state, u8 *dst, const u8 *src, u32 len,
+	      simd_context_t simd_context);
+
+/* Derived key should be 32-bit aligned */
+void hchacha20(u8 derived_key[CHACHA20_KEY_SIZE],
+	       const u8 nonce[HCHACHA20_NONCE_SIZE],
+	       const u8 key[HCHACHA20_KEY_SIZE], simd_context_t simd_context);
+
+#endif /* _ZINC_CHACHA20_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index 5980c411af0d..e7d396d61607 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -1,6 +1,11 @@
 config ZINC
 	tristate
 
+config ZINC_CHACHA20
+	bool
+	select ZINC
+	select CRYPTO_ALGAPI
+
 config ZINC_DEBUG
 	bool "Zinc cryptography library debugging and self-tests"
 	depends on ZINC
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index dad47573de42..0b5a964bfba6 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -3,6 +3,10 @@ ccflags-y += -Wframe-larger-than=8192
 ccflags-y += -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt'
 ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
 
+ifeq ($(CONFIG_ZINC_CHACHA20),y)
+zinc-y += chacha20/chacha20.o
+endif
+
 zinc-y += main.o
 
 obj-$(CONFIG_ZINC) := zinc.o
diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c
new file mode 100644
index 000000000000..1d9168e6c142
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20.c
@@ -0,0 +1,168 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * Implementation of the ChaCha20 stream cipher.
+ *
+ * Information: https://cr.yp.to/chacha.html
+ */
+
+#include <zinc/chacha20.h>
+
+#include <linux/kernel.h>
+#include <crypto/algapi.h>
+
+#ifndef HAVE_CHACHA20_ARCH_IMPLEMENTATION
+void __init chacha20_fpu_init(void)
+{
+}
+static inline bool chacha20_arch(u8 *out, const u8 *in, const size_t len,
+				 const u32 key[8], const u32 counter[4],
+				 simd_context_t simd_context)
+{
+	return false;
+}
+static inline bool hchacha20_arch(u8 *derived_key, const u8 *nonce,
+				  const u8 *key, simd_context_t simd_context)
+{
+	return false;
+}
+#endif
+
+#define EXPAND_32_BYTE_K 0x61707865U, 0x3320646eU, 0x79622d32U, 0x6b206574U
+
+#define QUARTER_ROUND(x, a, b, c, d) ( \
+	x[a] += x[b], \
+	x[d] = rol32((x[d] ^ x[a]), 16), \
+	x[c] += x[d], \
+	x[b] = rol32((x[b] ^ x[c]), 12), \
+	x[a] += x[b], \
+	x[d] = rol32((x[d] ^ x[a]), 8), \
+	x[c] += x[d], \
+	x[b] = rol32((x[b] ^ x[c]), 7) \
+)
+
+#define C(i, j) (i * 4 + j)
+
+#define DOUBLE_ROUND(x) ( \
+	/* Column Round */ \
+	QUARTER_ROUND(x, C(0, 0), C(1, 0), C(2, 0), C(3, 0)), \
+	QUARTER_ROUND(x, C(0, 1), C(1, 1), C(2, 1), C(3, 1)), \
+	QUARTER_ROUND(x, C(0, 2), C(1, 2), C(2, 2), C(3, 2)), \
+	QUARTER_ROUND(x, C(0, 3), C(1, 3), C(2, 3), C(3, 3)), \
+	/* Diagonal Round */ \
+	QUARTER_ROUND(x, C(0, 0), C(1, 1), C(2, 2), C(3, 3)), \
+	QUARTER_ROUND(x, C(0, 1), C(1, 2), C(2, 3), C(3, 0)), \
+	QUARTER_ROUND(x, C(0, 2), C(1, 3), C(2, 0), C(3, 1)), \
+	QUARTER_ROUND(x, C(0, 3), C(1, 0), C(2, 1), C(3, 2)) \
+)
+
+#define TWENTY_ROUNDS(x) ( \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x), \
+	DOUBLE_ROUND(x) \
+)
+
+static void chacha20_block_generic(__le32 *stream, u32 *state)
+{
+	u32 x[CHACHA20_BLOCK_WORDS];
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(x); ++i)
+		x[i] = state[i];
+
+	TWENTY_ROUNDS(x);
+
+	for (i = 0; i < ARRAY_SIZE(x); ++i)
+		stream[i] = cpu_to_le32(x[i] + state[i]);
+
+	++state[12];
+}
+
+static void chacha20_generic(u8 *out, const u8 *in, u32 len, const u32 key[8],
+			     const u32 counter[4])
+{
+	__le32 buf[CHACHA20_BLOCK_WORDS];
+	u32 x[] = {
+		EXPAND_32_BYTE_K,
+		key[0], key[1], key[2], key[3],
+		key[4], key[5], key[6], key[7],
+		counter[0], counter[1], counter[2], counter[3]
+	};
+
+	if (out != in)
+		memmove(out, in, len);
+
+	while (len >= CHACHA20_BLOCK_SIZE) {
+		chacha20_block_generic(buf, x);
+		crypto_xor(out, (u8 *)buf, CHACHA20_BLOCK_SIZE);
+		len -= CHACHA20_BLOCK_SIZE;
+		out += CHACHA20_BLOCK_SIZE;
+	}
+	if (len) {
+		chacha20_block_generic(buf, x);
+		crypto_xor(out, (u8 *)buf, len);
+	}
+}
+
+void chacha20(struct chacha20_ctx *state, u8 *dst, const u8 *src, u32 len,
+	      simd_context_t simd_context)
+{
+	if (!chacha20_arch(dst, src, len, state->key, state->counter,
+			   simd_context))
+		chacha20_generic(dst, src, len, state->key, state->counter);
+	state->counter[0] += (len + 63) / 64;
+}
+EXPORT_SYMBOL(chacha20);
+
+static void hchacha20_generic(u8 derived_key[CHACHA20_KEY_SIZE],
+			      const u8 nonce[HCHACHA20_NONCE_SIZE],
+			      const u8 key[HCHACHA20_KEY_SIZE])
+{
+	__le32 *out = (__force __le32 *)derived_key;
+	u32 x[] = { EXPAND_32_BYTE_K,
+		    get_unaligned_le32(key + 0),
+		    get_unaligned_le32(key + 4),
+		    get_unaligned_le32(key + 8),
+		    get_unaligned_le32(key + 12),
+		    get_unaligned_le32(key + 16),
+		    get_unaligned_le32(key + 20),
+		    get_unaligned_le32(key + 24),
+		    get_unaligned_le32(key + 28),
+		    get_unaligned_le32(nonce + 0),
+		    get_unaligned_le32(nonce + 4),
+		    get_unaligned_le32(nonce + 8),
+		    get_unaligned_le32(nonce + 12)
+	};
+
+	TWENTY_ROUNDS(x);
+
+	out[0] = cpu_to_le32(x[0]);
+	out[1] = cpu_to_le32(x[1]);
+	out[2] = cpu_to_le32(x[2]);
+	out[3] = cpu_to_le32(x[3]);
+	out[4] = cpu_to_le32(x[12]);
+	out[5] = cpu_to_le32(x[13]);
+	out[6] = cpu_to_le32(x[14]);
+	out[7] = cpu_to_le32(x[15]);
+}
+
+/* Derived key should be 32-bit aligned */
+void hchacha20(u8 derived_key[CHACHA20_KEY_SIZE],
+	       const u8 nonce[HCHACHA20_NONCE_SIZE],
+	       const u8 key[HCHACHA20_KEY_SIZE], simd_context_t simd_context)
+{
+	if (!hchacha20_arch(derived_key, nonce, key, simd_context))
+		hchacha20_generic(derived_key, nonce, key);
+}
+/* Deliberately not EXPORT_SYMBOL'd, since there are few reasons why somebody
+ * should be using this directly, rather than via xchacha20. Revisit only in
+ * the unlikely event that somebody has a good reason to export this.
+ */
diff --git a/lib/zinc/main.c b/lib/zinc/main.c
index ceece33ff5a7..7e8e84b706b7 100644
--- a/lib/zinc/main.c
+++ b/lib/zinc/main.c
@@ -3,6 +3,8 @@
  * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  */
 
+#include <zinc/chacha20.h>
+
 #include <linux/init.h>
 #include <linux/module.h>
 
@@ -17,6 +19,9 @@
 
 static int __init mod_init(void)
 {
+#ifdef CONFIG_ZINC_CHACHA20
+	chacha20_fpu_init();
+#endif
 	return 0;
 }
 
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 04/20] zinc: ChaCha20 ARM and ARM64 implementations
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (2 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 03/20] zinc: ChaCha20 generic C implementation Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 05/20] zinc: ChaCha20 x86_64 implementation Jason A. Donenfeld
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Andy Polyakov, Russell King,
	linux-arm-kernel

These NEON and non-NEON implementations come from Andy Polyakov's
implementation. They are exactly the same as Andy Polyakov's original,
with the following exceptions:

- Entries and exits use the proper kernel convention macro.
- CPU feature checking is done in C by the glue code, so that has been
  removed from the assembly.
- The function names have been renamed to fit kernel conventions.
- Labels have been renamed (prefixed with .L) to fit kernel conventions.
- Constants have been rearranged so that they are closer to the code
  that is using them. [ARM only]
- The neon code can jump to the scalar code when it makes sense to do
  so.
- The neon_512 function as a separate function has been removed, leaving
  the decision up to the main neon entry point. [ARM64 only]

After '/^#/d;/^\..*[^:]$/d', the code has the following diff in actual
instructions from the original.

ARM:

-ChaCha20_ctr32:
-.LChaCha20_ctr32:
+ENTRY(chacha20_arm)
 	ldr	r12,[sp,#0]		@ pull pointer to counter and nonce
 	stmdb	sp!,{r0-r2,r4-r11,lr}
-	sub	r14,pc,#16		@ ChaCha20_ctr32
-	adr	r14,.LChaCha20_ctr32
 	cmp	r2,#0			@ len==0?
 	itt	eq
 	addeq	sp,sp,#4*3
-	beq	.Lno_data
-	cmp	r2,#192			@ test len
-	bls	.Lshort
-	ldr	r4,[r14,#-32]
-	ldr	r4,[r14,r4]
-	ldr	r4,[r4]
-	tst	r4,#ARMV7_NEON
-	bne	.LChaCha20_neon
+	beq	.Lno_data_arm
 .Lshort:
 	ldmia	r12,{r4-r7}		@ load counter and nonce
 	sub	sp,sp,#4*(16)		@ off-load area
-	sub	r14,r14,#64		@ .Lsigma
+	sub	r14,pc,#100		@ .Lsigma
+	adr	r14,.Lsigma		@ .Lsigma
 	stmdb	sp!,{r4-r7}		@ copy counter and nonce
 	ldmia	r3,{r4-r11}		@ load key
 	ldmia	r14,{r0-r3}		@ load sigma
@@ -617,14 +615,25 @@

 .Ldone:
 	add	sp,sp,#4*(32+3)
-.Lno_data:
+.Lno_data_arm:
 	ldmia	sp!,{r4-r11,pc}
+ENDPROC(chacha20_arm)

-ChaCha20_neon:
+ENTRY(chacha20_neon)
 	ldr		r12,[sp,#0]		@ pull pointer to counter and nonce
 	stmdb		sp!,{r0-r2,r4-r11,lr}
-.LChaCha20_neon:
-	adr		r14,.Lsigma
+	cmp		r2,#0			@ len==0?
+	itt		eq
+	addeq		sp,sp,#4*3
+	beq		.Lno_data_neon
+	cmp		r2,#192			@ test len
+	bls		.Lshort
+.Lchacha20_neon_begin:
+	adr		r14,.Lsigma2
 	vstmdb		sp!,{d8-d15}		@ ABI spec says so
 	stmdb		sp!,{r0-r3}

@@ -1265,4 +1274,6 @@
 	add		sp,sp,#4*(32+4)
 	vldmia		sp,{d8-d15}
 	add		sp,sp,#4*(16+3)
+.Lno_data_neon:
 	ldmia		sp!,{r4-r11,pc}
+ENDPROC(chacha20_neon)

ARM64:

-ChaCha20_ctr32:
+ENTRY(chacha20_arm)
 	cbz	x2,.Labort
-	adr	x5,.LOPENSSL_armcap_P
-	cmp	x2,#192
-	b.lo	.Lshort
-	ldrsw	x6,[x5]
-	ldr	x6,[x5]
-	ldr	w17,[x6,x5]
-	tst	w17,#ARMV7_NEON
-	b.ne	ChaCha20_neon
-
 .Lshort:
 	stp	x29,x30,[sp,#-96]!
 	add	x29,sp,#0
@@ -279,8 +274,13 @@
 	ldp	x27,x28,[x29,#80]
 	ldp	x29,x30,[sp],#96
 	ret
+ENDPROC(chacha20_arm)
+
+ENTRY(chacha20_neon)
+	cbz	x2,.Labort_neon
+	cmp	x2,#192
+	b.lo	.Lshort

-ChaCha20_neon:
 	stp	x29,x30,[sp,#-96]!
 	add	x29,sp,#0

@@ -763,16 +763,6 @@
 	ldp	x27,x28,[x29,#80]
 	ldp	x29,x30,[sp],#96
 	ret
-ChaCha20_512_neon:
-	stp	x29,x30,[sp,#-96]!
-	add	x29,sp,#0
-
-	adr	x5,.Lsigma
-	stp	x19,x20,[sp,#16]
-	stp	x21,x22,[sp,#32]
-	stp	x23,x24,[sp,#48]
-	stp	x25,x26,[sp,#64]
-	stp	x27,x28,[sp,#80]

 .L512_or_more_neon:
 	sub	sp,sp,#128+64
@@ -1920,4 +1910,6 @@
 	ldp	x25,x26,[x29,#64]
 	ldp	x27,x28,[x29,#80]
 	ldp	x29,x30,[sp],#96
+.Labort_neon:
 	ret
+ENDPROC(chacha20_neon)

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
 lib/zinc/Makefile                     |    8 +
 lib/zinc/chacha20/chacha20-arm-glue.h |   50 +
 lib/zinc/chacha20/chacha20-arm.S      | 1473 +++++++++++++++++++
 lib/zinc/chacha20/chacha20-arm64.S    | 1942 +++++++++++++++++++++++++
 4 files changed, 3473 insertions(+)
 create mode 100644 lib/zinc/chacha20/chacha20-arm-glue.h
 create mode 100644 lib/zinc/chacha20/chacha20-arm.S
 create mode 100644 lib/zinc/chacha20/chacha20-arm64.S

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 0b5a964bfba6..8d14cb13349a 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -5,6 +5,14 @@ ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
 
 ifeq ($(CONFIG_ZINC_CHACHA20),y)
 zinc-y += chacha20/chacha20.o
+ifeq ($(CONFIG_ZINC_ARCH_ARM),y)
+zinc-y += chacha20/chacha20-arm.o
+CFLAGS_chacha20.o += -include $(srctree)/$(src)/chacha20/chacha20-arm-glue.h
+endif
+ifeq ($(CONFIG_ZINC_ARCH_ARM64),y)
+zinc-y += chacha20/chacha20-arm64.o
+CFLAGS_chacha20.o += -include $(srctree)/$(src)/chacha20/chacha20-arm-glue.h
+endif
 endif
 
 zinc-y += main.o
diff --git a/lib/zinc/chacha20/chacha20-arm-glue.h b/lib/zinc/chacha20/chacha20-arm-glue.h
new file mode 100644
index 000000000000..d32361514f3a
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm-glue.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/chacha20.h>
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+
+asmlinkage void chacha20_arm(u8 *out, const u8 *in, const size_t len,
+			     const u32 key[8], const u32 counter[4]);
+#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&                                     \
+	(defined(CONFIG_64BIT) || __LINUX_ARM_ARCH__ >= 7)
+#define ARM_USE_NEON
+asmlinkage void chacha20_neon(u8 *out, const u8 *in, const size_t len,
+			      const u32 key[8], const u32 counter[4]);
+#endif
+
+static bool chacha20_use_neon __ro_after_init;
+
+void __init chacha20_fpu_init(void)
+{
+#if defined(CONFIG_ARM64)
+	chacha20_use_neon = elf_hwcap & HWCAP_ASIMD;
+#elif defined(CONFIG_ARM)
+	chacha20_use_neon = elf_hwcap & HWCAP_NEON;
+#endif
+}
+
+static inline bool chacha20_arch(u8 *dst, const u8 *src, const size_t len,
+				 const u32 key[8], const u32 counter[4],
+				 simd_context_t simd_context)
+{
+#if defined(ARM_USE_NEON)
+	if (simd_context == HAVE_FULL_SIMD && chacha20_use_neon) {
+		chacha20_neon(dst, src, len, key, counter);
+		return true;
+	}
+#endif
+	chacha20_arm(dst, src, len, key, counter);
+	return true;
+}
+
+static inline bool hchacha20_arch(u8 *derived_key, const u8 *nonce,
+				  const u8 *key, simd_context_t simd_context)
+{
+	return false;
+}
+
+#define HAVE_CHACHA20_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/chacha20/chacha20-arm.S b/lib/zinc/chacha20/chacha20-arm.S
new file mode 100644
index 000000000000..0ea1db1492eb
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm.S
@@ -0,0 +1,1473 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#include <linux/linkage.h>
+
+.text
+#if defined(__thumb2__) || defined(__clang__)
+.syntax	unified
+#endif
+#if defined(__thumb2__)
+.thumb
+#else
+.code	32
+#endif
+
+#if defined(__thumb2__) || defined(__clang__)
+#define ldrhsb	ldrbhs
+#endif
+
+.align	5
+.Lsigma:
+.long	0x61707865,0x3320646e,0x79622d32,0x6b206574	@ endian-neutral
+.Lone:
+.long	1,0,0,0
+.word -1
+
+.align	5
+ENTRY(chacha20_arm)
+	ldr	r12,[sp,#0]		@ pull pointer to counter and nonce
+	stmdb	sp!,{r0-r2,r4-r11,lr}
+	cmp	r2,#0			@ len==0?
+#ifdef	__thumb2__
+	itt	eq
+#endif
+	addeq	sp,sp,#4*3
+	beq	.Lno_data_arm
+.Lshort:
+	ldmia	r12,{r4-r7}		@ load counter and nonce
+	sub	sp,sp,#4*(16)		@ off-load area
+#if __LINUX_ARM_ARCH__ < 7 && !defined(__thumb2__)
+	sub	r14,pc,#100		@ .Lsigma
+#else
+	adr	r14,.Lsigma		@ .Lsigma
+#endif
+	stmdb	sp!,{r4-r7}		@ copy counter and nonce
+	ldmia	r3,{r4-r11}		@ load key
+	ldmia	r14,{r0-r3}		@ load sigma
+	stmdb	sp!,{r4-r11}		@ copy key
+	stmdb	sp!,{r0-r3}		@ copy sigma
+	str	r10,[sp,#4*(16+10)]	@ off-load "rx"
+	str	r11,[sp,#4*(16+11)]	@ off-load "rx"
+	b	.Loop_outer_enter
+
+.align	4
+.Loop_outer:
+	ldmia	sp,{r0-r9}		@ load key material
+	str	r11,[sp,#4*(32+2)]	@ save len
+	str	r12,  [sp,#4*(32+1)]	@ save inp
+	str	r14,  [sp,#4*(32+0)]	@ save out
+.Loop_outer_enter:
+	ldr	r11, [sp,#4*(15)]
+	ldr	r12,[sp,#4*(12)]	@ modulo-scheduled load
+	ldr	r10, [sp,#4*(13)]
+	ldr	r14,[sp,#4*(14)]
+	str	r11, [sp,#4*(16+15)]
+	mov	r11,#10
+	b	.Loop
+
+.align	4
+.Loop:
+	subs	r11,r11,#1
+	add	r0,r0,r4
+	mov	r12,r12,ror#16
+	add	r1,r1,r5
+	mov	r10,r10,ror#16
+	eor	r12,r12,r0,ror#16
+	eor	r10,r10,r1,ror#16
+	add	r8,r8,r12
+	mov	r4,r4,ror#20
+	add	r9,r9,r10
+	mov	r5,r5,ror#20
+	eor	r4,r4,r8,ror#20
+	eor	r5,r5,r9,ror#20
+	add	r0,r0,r4
+	mov	r12,r12,ror#24
+	add	r1,r1,r5
+	mov	r10,r10,ror#24
+	eor	r12,r12,r0,ror#24
+	eor	r10,r10,r1,ror#24
+	add	r8,r8,r12
+	mov	r4,r4,ror#25
+	add	r9,r9,r10
+	mov	r5,r5,ror#25
+	str	r10,[sp,#4*(16+13)]
+	ldr	r10,[sp,#4*(16+15)]
+	eor	r4,r4,r8,ror#25
+	eor	r5,r5,r9,ror#25
+	str	r8,[sp,#4*(16+8)]
+	ldr	r8,[sp,#4*(16+10)]
+	add	r2,r2,r6
+	mov	r14,r14,ror#16
+	str	r9,[sp,#4*(16+9)]
+	ldr	r9,[sp,#4*(16+11)]
+	add	r3,r3,r7
+	mov	r10,r10,ror#16
+	eor	r14,r14,r2,ror#16
+	eor	r10,r10,r3,ror#16
+	add	r8,r8,r14
+	mov	r6,r6,ror#20
+	add	r9,r9,r10
+	mov	r7,r7,ror#20
+	eor	r6,r6,r8,ror#20
+	eor	r7,r7,r9,ror#20
+	add	r2,r2,r6
+	mov	r14,r14,ror#24
+	add	r3,r3,r7
+	mov	r10,r10,ror#24
+	eor	r14,r14,r2,ror#24
+	eor	r10,r10,r3,ror#24
+	add	r8,r8,r14
+	mov	r6,r6,ror#25
+	add	r9,r9,r10
+	mov	r7,r7,ror#25
+	eor	r6,r6,r8,ror#25
+	eor	r7,r7,r9,ror#25
+	add	r0,r0,r5
+	mov	r10,r10,ror#16
+	add	r1,r1,r6
+	mov	r12,r12,ror#16
+	eor	r10,r10,r0,ror#16
+	eor	r12,r12,r1,ror#16
+	add	r8,r8,r10
+	mov	r5,r5,ror#20
+	add	r9,r9,r12
+	mov	r6,r6,ror#20
+	eor	r5,r5,r8,ror#20
+	eor	r6,r6,r9,ror#20
+	add	r0,r0,r5
+	mov	r10,r10,ror#24
+	add	r1,r1,r6
+	mov	r12,r12,ror#24
+	eor	r10,r10,r0,ror#24
+	eor	r12,r12,r1,ror#24
+	add	r8,r8,r10
+	mov	r5,r5,ror#25
+	str	r10,[sp,#4*(16+15)]
+	ldr	r10,[sp,#4*(16+13)]
+	add	r9,r9,r12
+	mov	r6,r6,ror#25
+	eor	r5,r5,r8,ror#25
+	eor	r6,r6,r9,ror#25
+	str	r8,[sp,#4*(16+10)]
+	ldr	r8,[sp,#4*(16+8)]
+	add	r2,r2,r7
+	mov	r10,r10,ror#16
+	str	r9,[sp,#4*(16+11)]
+	ldr	r9,[sp,#4*(16+9)]
+	add	r3,r3,r4
+	mov	r14,r14,ror#16
+	eor	r10,r10,r2,ror#16
+	eor	r14,r14,r3,ror#16
+	add	r8,r8,r10
+	mov	r7,r7,ror#20
+	add	r9,r9,r14
+	mov	r4,r4,ror#20
+	eor	r7,r7,r8,ror#20
+	eor	r4,r4,r9,ror#20
+	add	r2,r2,r7
+	mov	r10,r10,ror#24
+	add	r3,r3,r4
+	mov	r14,r14,ror#24
+	eor	r10,r10,r2,ror#24
+	eor	r14,r14,r3,ror#24
+	add	r8,r8,r10
+	mov	r7,r7,ror#25
+	add	r9,r9,r14
+	mov	r4,r4,ror#25
+	eor	r7,r7,r8,ror#25
+	eor	r4,r4,r9,ror#25
+	bne	.Loop
+
+	ldr	r11,[sp,#4*(32+2)]	@ load len
+
+	str	r8, [sp,#4*(16+8)]	@ modulo-scheduled store
+	str	r9, [sp,#4*(16+9)]
+	str	r12,[sp,#4*(16+12)]
+	str	r10, [sp,#4*(16+13)]
+	str	r14,[sp,#4*(16+14)]
+
+	@ at this point we have first half of 512-bit result in
+	@ rx and second half at sp+4*(16+8)
+
+	cmp	r11,#64		@ done yet?
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	addlo	r12,sp,#4*(0)		@ shortcut or ...
+	ldrhs	r12,[sp,#4*(32+1)]	@ ... load inp
+	addlo	r14,sp,#4*(0)		@ shortcut or ...
+	ldrhs	r14,[sp,#4*(32+0)]	@ ... load out
+
+	ldr	r8,[sp,#4*(0)]	@ load key material
+	ldr	r9,[sp,#4*(1)]
+
+#if __LINUX_ARM_ARCH__ >= 6 || !defined(__ARMEB__)
+#if __LINUX_ARM_ARCH__ < 7
+	orr	r10,r12,r14
+	tst	r10,#3		@ are input and output aligned?
+	ldr	r10,[sp,#4*(2)]
+	bne	.Lunaligned
+	cmp	r11,#64		@ restore flags
+#else
+	ldr	r10,[sp,#4*(2)]
+#endif
+	ldr	r11,[sp,#4*(3)]
+
+	add	r0,r0,r8	@ accumulate key material
+	add	r1,r1,r9
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhs	r8,[r12],#16		@ load input
+	ldrhs	r9,[r12,#-12]
+
+	add	r2,r2,r10
+	add	r3,r3,r11
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhs	r10,[r12,#-8]
+	ldrhs	r11,[r12,#-4]
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
+	rev	r0,r0
+	rev	r1,r1
+	rev	r2,r2
+	rev	r3,r3
+#endif
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	eorhs	r0,r0,r8	@ xor with input
+	eorhs	r1,r1,r9
+	 add	r8,sp,#4*(4)
+	str	r0,[r14],#16		@ store output
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	eorhs	r2,r2,r10
+	eorhs	r3,r3,r11
+	 ldmia	r8,{r8-r11}	@ load key material
+	str	r1,[r14,#-12]
+	str	r2,[r14,#-8]
+	str	r3,[r14,#-4]
+
+	add	r4,r4,r8	@ accumulate key material
+	add	r5,r5,r9
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhs	r8,[r12],#16		@ load input
+	ldrhs	r9,[r12,#-12]
+	add	r6,r6,r10
+	add	r7,r7,r11
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhs	r10,[r12,#-8]
+	ldrhs	r11,[r12,#-4]
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
+	rev	r4,r4
+	rev	r5,r5
+	rev	r6,r6
+	rev	r7,r7
+#endif
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	eorhs	r4,r4,r8
+	eorhs	r5,r5,r9
+	 add	r8,sp,#4*(8)
+	str	r4,[r14],#16		@ store output
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	eorhs	r6,r6,r10
+	eorhs	r7,r7,r11
+	str	r5,[r14,#-12]
+	 ldmia	r8,{r8-r11}	@ load key material
+	str	r6,[r14,#-8]
+	 add	r0,sp,#4*(16+8)
+	str	r7,[r14,#-4]
+
+	ldmia	r0,{r0-r7}	@ load second half
+
+	add	r0,r0,r8	@ accumulate key material
+	add	r1,r1,r9
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhs	r8,[r12],#16		@ load input
+	ldrhs	r9,[r12,#-12]
+#ifdef	__thumb2__
+	itt	hi
+#endif
+	 strhi	r10,[sp,#4*(16+10)]	@ copy "rx" while at it
+	 strhi	r11,[sp,#4*(16+11)]	@ copy "rx" while at it
+	add	r2,r2,r10
+	add	r3,r3,r11
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhs	r10,[r12,#-8]
+	ldrhs	r11,[r12,#-4]
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
+	rev	r0,r0
+	rev	r1,r1
+	rev	r2,r2
+	rev	r3,r3
+#endif
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	eorhs	r0,r0,r8
+	eorhs	r1,r1,r9
+	 add	r8,sp,#4*(12)
+	str	r0,[r14],#16		@ store output
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	eorhs	r2,r2,r10
+	eorhs	r3,r3,r11
+	str	r1,[r14,#-12]
+	 ldmia	r8,{r8-r11}	@ load key material
+	str	r2,[r14,#-8]
+	str	r3,[r14,#-4]
+
+	add	r4,r4,r8	@ accumulate key material
+	add	r5,r5,r9
+#ifdef	__thumb2__
+	itt	hi
+#endif
+	 addhi	r8,r8,#1		@ next counter value
+	 strhi	r8,[sp,#4*(12)]	@ save next counter value
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhs	r8,[r12],#16		@ load input
+	ldrhs	r9,[r12,#-12]
+	add	r6,r6,r10
+	add	r7,r7,r11
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhs	r10,[r12,#-8]
+	ldrhs	r11,[r12,#-4]
+#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__)
+	rev	r4,r4
+	rev	r5,r5
+	rev	r6,r6
+	rev	r7,r7
+#endif
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	eorhs	r4,r4,r8
+	eorhs	r5,r5,r9
+#ifdef	__thumb2__
+	 it	ne
+#endif
+	 ldrne	r8,[sp,#4*(32+2)]	@ re-load len
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	eorhs	r6,r6,r10
+	eorhs	r7,r7,r11
+	str	r4,[r14],#16		@ store output
+	str	r5,[r14,#-12]
+#ifdef	__thumb2__
+	it	hs
+#endif
+	 subhs	r11,r8,#64		@ len-=64
+	str	r6,[r14,#-8]
+	str	r7,[r14,#-4]
+	bhi	.Loop_outer
+
+	beq	.Ldone
+#if __LINUX_ARM_ARCH__ < 7
+	b	.Ltail
+
+.align	4
+.Lunaligned:				@ unaligned endian-neutral path
+	cmp	r11,#64		@ restore flags
+#endif
+#endif
+#if __LINUX_ARM_ARCH__ < 7
+	ldr	r11,[sp,#4*(3)]
+	add	r0,r0,r8		@ accumulate key material
+	add	r1,r1,r9
+	add	r2,r2,r10
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	eorlo	r8,r8,r8		@ zero or ...
+	ldrhsb	r8,[r12],#16			@ ... load input
+	eorlo	r9,r9,r9
+	ldrhsb	r9,[r12,#-12]
+
+	add	r3,r3,r11
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	eorlo	r10,r10,r10
+	ldrhsb	r10,[r12,#-8]
+	eorlo	r11,r11,r11
+	ldrhsb	r11,[r12,#-4]
+
+	eor	r0,r8,r0		@ xor with input (or zero)
+	eor	r1,r9,r1
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-15]		@ load more input
+	ldrhsb	r9,[r12,#-11]
+	eor	r2,r10,r2
+	 strb	r0,[r14],#16		@ store output
+	eor	r3,r11,r3
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-7]
+	ldrhsb	r11,[r12,#-3]
+	 strb	r1,[r14,#-12]
+	eor	r0,r8,r0,lsr#8
+	 strb	r2,[r14,#-8]
+	eor	r1,r9,r1,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-14]		@ load more input
+	ldrhsb	r9,[r12,#-10]
+	 strb	r3,[r14,#-4]
+	eor	r2,r10,r2,lsr#8
+	 strb	r0,[r14,#-15]
+	eor	r3,r11,r3,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-6]
+	ldrhsb	r11,[r12,#-2]
+	 strb	r1,[r14,#-11]
+	eor	r0,r8,r0,lsr#8
+	 strb	r2,[r14,#-7]
+	eor	r1,r9,r1,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-13]		@ load more input
+	ldrhsb	r9,[r12,#-9]
+	 strb	r3,[r14,#-3]
+	eor	r2,r10,r2,lsr#8
+	 strb	r0,[r14,#-14]
+	eor	r3,r11,r3,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-5]
+	ldrhsb	r11,[r12,#-1]
+	 strb	r1,[r14,#-10]
+	 strb	r2,[r14,#-6]
+	eor	r0,r8,r0,lsr#8
+	 strb	r3,[r14,#-2]
+	eor	r1,r9,r1,lsr#8
+	 strb	r0,[r14,#-13]
+	eor	r2,r10,r2,lsr#8
+	 strb	r1,[r14,#-9]
+	eor	r3,r11,r3,lsr#8
+	 strb	r2,[r14,#-5]
+	 strb	r3,[r14,#-1]
+	add	r8,sp,#4*(4+0)
+	ldmia	r8,{r8-r11}		@ load key material
+	add	r0,sp,#4*(16+8)
+	add	r4,r4,r8		@ accumulate key material
+	add	r5,r5,r9
+	add	r6,r6,r10
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	eorlo	r8,r8,r8		@ zero or ...
+	ldrhsb	r8,[r12],#16			@ ... load input
+	eorlo	r9,r9,r9
+	ldrhsb	r9,[r12,#-12]
+
+	add	r7,r7,r11
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	eorlo	r10,r10,r10
+	ldrhsb	r10,[r12,#-8]
+	eorlo	r11,r11,r11
+	ldrhsb	r11,[r12,#-4]
+
+	eor	r4,r8,r4		@ xor with input (or zero)
+	eor	r5,r9,r5
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-15]		@ load more input
+	ldrhsb	r9,[r12,#-11]
+	eor	r6,r10,r6
+	 strb	r4,[r14],#16		@ store output
+	eor	r7,r11,r7
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-7]
+	ldrhsb	r11,[r12,#-3]
+	 strb	r5,[r14,#-12]
+	eor	r4,r8,r4,lsr#8
+	 strb	r6,[r14,#-8]
+	eor	r5,r9,r5,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-14]		@ load more input
+	ldrhsb	r9,[r12,#-10]
+	 strb	r7,[r14,#-4]
+	eor	r6,r10,r6,lsr#8
+	 strb	r4,[r14,#-15]
+	eor	r7,r11,r7,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-6]
+	ldrhsb	r11,[r12,#-2]
+	 strb	r5,[r14,#-11]
+	eor	r4,r8,r4,lsr#8
+	 strb	r6,[r14,#-7]
+	eor	r5,r9,r5,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-13]		@ load more input
+	ldrhsb	r9,[r12,#-9]
+	 strb	r7,[r14,#-3]
+	eor	r6,r10,r6,lsr#8
+	 strb	r4,[r14,#-14]
+	eor	r7,r11,r7,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-5]
+	ldrhsb	r11,[r12,#-1]
+	 strb	r5,[r14,#-10]
+	 strb	r6,[r14,#-6]
+	eor	r4,r8,r4,lsr#8
+	 strb	r7,[r14,#-2]
+	eor	r5,r9,r5,lsr#8
+	 strb	r4,[r14,#-13]
+	eor	r6,r10,r6,lsr#8
+	 strb	r5,[r14,#-9]
+	eor	r7,r11,r7,lsr#8
+	 strb	r6,[r14,#-5]
+	 strb	r7,[r14,#-1]
+	add	r8,sp,#4*(4+4)
+	ldmia	r8,{r8-r11}		@ load key material
+	ldmia	r0,{r0-r7}		@ load second half
+#ifdef	__thumb2__
+	itt	hi
+#endif
+	strhi	r10,[sp,#4*(16+10)]		@ copy "rx"
+	strhi	r11,[sp,#4*(16+11)]		@ copy "rx"
+	add	r0,r0,r8		@ accumulate key material
+	add	r1,r1,r9
+	add	r2,r2,r10
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	eorlo	r8,r8,r8		@ zero or ...
+	ldrhsb	r8,[r12],#16			@ ... load input
+	eorlo	r9,r9,r9
+	ldrhsb	r9,[r12,#-12]
+
+	add	r3,r3,r11
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	eorlo	r10,r10,r10
+	ldrhsb	r10,[r12,#-8]
+	eorlo	r11,r11,r11
+	ldrhsb	r11,[r12,#-4]
+
+	eor	r0,r8,r0		@ xor with input (or zero)
+	eor	r1,r9,r1
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-15]		@ load more input
+	ldrhsb	r9,[r12,#-11]
+	eor	r2,r10,r2
+	 strb	r0,[r14],#16		@ store output
+	eor	r3,r11,r3
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-7]
+	ldrhsb	r11,[r12,#-3]
+	 strb	r1,[r14,#-12]
+	eor	r0,r8,r0,lsr#8
+	 strb	r2,[r14,#-8]
+	eor	r1,r9,r1,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-14]		@ load more input
+	ldrhsb	r9,[r12,#-10]
+	 strb	r3,[r14,#-4]
+	eor	r2,r10,r2,lsr#8
+	 strb	r0,[r14,#-15]
+	eor	r3,r11,r3,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-6]
+	ldrhsb	r11,[r12,#-2]
+	 strb	r1,[r14,#-11]
+	eor	r0,r8,r0,lsr#8
+	 strb	r2,[r14,#-7]
+	eor	r1,r9,r1,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-13]		@ load more input
+	ldrhsb	r9,[r12,#-9]
+	 strb	r3,[r14,#-3]
+	eor	r2,r10,r2,lsr#8
+	 strb	r0,[r14,#-14]
+	eor	r3,r11,r3,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-5]
+	ldrhsb	r11,[r12,#-1]
+	 strb	r1,[r14,#-10]
+	 strb	r2,[r14,#-6]
+	eor	r0,r8,r0,lsr#8
+	 strb	r3,[r14,#-2]
+	eor	r1,r9,r1,lsr#8
+	 strb	r0,[r14,#-13]
+	eor	r2,r10,r2,lsr#8
+	 strb	r1,[r14,#-9]
+	eor	r3,r11,r3,lsr#8
+	 strb	r2,[r14,#-5]
+	 strb	r3,[r14,#-1]
+	add	r8,sp,#4*(4+8)
+	ldmia	r8,{r8-r11}		@ load key material
+	add	r4,r4,r8		@ accumulate key material
+#ifdef	__thumb2__
+	itt	hi
+#endif
+	addhi	r8,r8,#1			@ next counter value
+	strhi	r8,[sp,#4*(12)]		@ save next counter value
+	add	r5,r5,r9
+	add	r6,r6,r10
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	eorlo	r8,r8,r8		@ zero or ...
+	ldrhsb	r8,[r12],#16			@ ... load input
+	eorlo	r9,r9,r9
+	ldrhsb	r9,[r12,#-12]
+
+	add	r7,r7,r11
+#ifdef	__thumb2__
+	itete	lo
+#endif
+	eorlo	r10,r10,r10
+	ldrhsb	r10,[r12,#-8]
+	eorlo	r11,r11,r11
+	ldrhsb	r11,[r12,#-4]
+
+	eor	r4,r8,r4		@ xor with input (or zero)
+	eor	r5,r9,r5
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-15]		@ load more input
+	ldrhsb	r9,[r12,#-11]
+	eor	r6,r10,r6
+	 strb	r4,[r14],#16		@ store output
+	eor	r7,r11,r7
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-7]
+	ldrhsb	r11,[r12,#-3]
+	 strb	r5,[r14,#-12]
+	eor	r4,r8,r4,lsr#8
+	 strb	r6,[r14,#-8]
+	eor	r5,r9,r5,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-14]		@ load more input
+	ldrhsb	r9,[r12,#-10]
+	 strb	r7,[r14,#-4]
+	eor	r6,r10,r6,lsr#8
+	 strb	r4,[r14,#-15]
+	eor	r7,r11,r7,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-6]
+	ldrhsb	r11,[r12,#-2]
+	 strb	r5,[r14,#-11]
+	eor	r4,r8,r4,lsr#8
+	 strb	r6,[r14,#-7]
+	eor	r5,r9,r5,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r8,[r12,#-13]		@ load more input
+	ldrhsb	r9,[r12,#-9]
+	 strb	r7,[r14,#-3]
+	eor	r6,r10,r6,lsr#8
+	 strb	r4,[r14,#-14]
+	eor	r7,r11,r7,lsr#8
+#ifdef	__thumb2__
+	itt	hs
+#endif
+	ldrhsb	r10,[r12,#-5]
+	ldrhsb	r11,[r12,#-1]
+	 strb	r5,[r14,#-10]
+	 strb	r6,[r14,#-6]
+	eor	r4,r8,r4,lsr#8
+	 strb	r7,[r14,#-2]
+	eor	r5,r9,r5,lsr#8
+	 strb	r4,[r14,#-13]
+	eor	r6,r10,r6,lsr#8
+	 strb	r5,[r14,#-9]
+	eor	r7,r11,r7,lsr#8
+	 strb	r6,[r14,#-5]
+	 strb	r7,[r14,#-1]
+#ifdef	__thumb2__
+	it	ne
+#endif
+	ldrne	r8,[sp,#4*(32+2)]		@ re-load len
+#ifdef	__thumb2__
+	it	hs
+#endif
+	subhs	r11,r8,#64			@ len-=64
+	bhi	.Loop_outer
+
+	beq	.Ldone
+#endif
+
+.Ltail:
+	ldr	r12,[sp,#4*(32+1)]	@ load inp
+	add	r9,sp,#4*(0)
+	ldr	r14,[sp,#4*(32+0)]	@ load out
+
+.Loop_tail:
+	ldrb	r10,[r9],#1	@ read buffer on stack
+	ldrb	r11,[r12],#1		@ read input
+	subs	r8,r8,#1
+	eor	r11,r11,r10
+	strb	r11,[r14],#1		@ store output
+	bne	.Loop_tail
+
+.Ldone:
+	add	sp,sp,#4*(32+3)
+.Lno_data_arm:
+	ldmia	sp!,{r4-r11,pc}
+ENDPROC(chacha20_arm)
+
+#if __LINUX_ARM_ARCH__ >= 7 && IS_ENABLED(CONFIG_KERNEL_MODE_NEON)
+.align	5
+.Lsigma2:
+.long	0x61707865,0x3320646e,0x79622d32,0x6b206574	@ endian-neutral
+.Lone2:
+.long	1,0,0,0
+.word -1
+
+.arch	armv7-a
+.fpu	neon
+
+.align	5
+ENTRY(chacha20_neon)
+	ldr		r12,[sp,#0]		@ pull pointer to counter and nonce
+	stmdb		sp!,{r0-r2,r4-r11,lr}
+	cmp		r2,#0			@ len==0?
+#ifdef	__thumb2__
+	itt		eq
+#endif
+	addeq		sp,sp,#4*3
+	beq		.Lno_data_neon
+	cmp		r2,#192			@ test len
+	bls		.Lshort
+.Lchacha20_neon_begin:
+	adr		r14,.Lsigma2
+	vstmdb		sp!,{d8-d15}		@ ABI spec says so
+	stmdb		sp!,{r0-r3}
+
+	vld1.32		{q1-q2},[r3]		@ load key
+	ldmia		r3,{r4-r11}		@ load key
+
+	sub		sp,sp,#4*(16+16)
+	vld1.32		{q3},[r12]		@ load counter and nonce
+	add		r12,sp,#4*8
+	ldmia		r14,{r0-r3}		@ load sigma
+	vld1.32		{q0},[r14]!		@ load sigma
+	vld1.32		{q12},[r14]		@ one
+	vst1.32		{q2-q3},[r12]		@ copy 1/2key|counter|nonce
+	vst1.32		{q0-q1},[sp]		@ copy sigma|1/2key
+
+	str		r10,[sp,#4*(16+10)]	@ off-load "rx"
+	str		r11,[sp,#4*(16+11)]	@ off-load "rx"
+	vshl.i32	d26,d24,#1	@ two
+	vstr		d24,[sp,#4*(16+0)]
+	vshl.i32	d28,d24,#2	@ four
+	vstr		d26,[sp,#4*(16+2)]
+	vmov		q4,q0
+	vstr		d28,[sp,#4*(16+4)]
+	vmov		q8,q0
+	vmov		q5,q1
+	vmov		q9,q1
+	b		.Loop_neon_enter
+
+.align	4
+.Loop_neon_outer:
+	ldmia		sp,{r0-r9}		@ load key material
+	cmp		r11,#64*2		@ if len<=64*2
+	bls		.Lbreak_neon		@ switch to integer-only
+	vmov		q4,q0
+	str		r11,[sp,#4*(32+2)]	@ save len
+	vmov		q8,q0
+	str		r12,  [sp,#4*(32+1)]	@ save inp
+	vmov		q5,q1
+	str		r14,  [sp,#4*(32+0)]	@ save out
+	vmov		q9,q1
+.Loop_neon_enter:
+	ldr		r11, [sp,#4*(15)]
+	vadd.i32	q7,q3,q12		@ counter+1
+	ldr		r12,[sp,#4*(12)]	@ modulo-scheduled load
+	vmov		q6,q2
+	ldr		r10, [sp,#4*(13)]
+	vmov		q10,q2
+	ldr		r14,[sp,#4*(14)]
+	vadd.i32	q11,q7,q12		@ counter+2
+	str		r11, [sp,#4*(16+15)]
+	mov		r11,#10
+	add		r12,r12,#3	@ counter+3
+	b		.Loop_neon
+
+.align	4
+.Loop_neon:
+	subs		r11,r11,#1
+	vadd.i32	q0,q0,q1
+	add	r0,r0,r4
+	vadd.i32	q4,q4,q5
+	mov	r12,r12,ror#16
+	vadd.i32	q8,q8,q9
+	add	r1,r1,r5
+	veor	q3,q3,q0
+	mov	r10,r10,ror#16
+	veor	q7,q7,q4
+	eor	r12,r12,r0,ror#16
+	veor	q11,q11,q8
+	eor	r10,r10,r1,ror#16
+	vrev32.16	q3,q3
+	add	r8,r8,r12
+	vrev32.16	q7,q7
+	mov	r4,r4,ror#20
+	vrev32.16	q11,q11
+	add	r9,r9,r10
+	vadd.i32	q2,q2,q3
+	mov	r5,r5,ror#20
+	vadd.i32	q6,q6,q7
+	eor	r4,r4,r8,ror#20
+	vadd.i32	q10,q10,q11
+	eor	r5,r5,r9,ror#20
+	veor	q12,q1,q2
+	add	r0,r0,r4
+	veor	q13,q5,q6
+	mov	r12,r12,ror#24
+	veor	q14,q9,q10
+	add	r1,r1,r5
+	vshr.u32	q1,q12,#20
+	mov	r10,r10,ror#24
+	vshr.u32	q5,q13,#20
+	eor	r12,r12,r0,ror#24
+	vshr.u32	q9,q14,#20
+	eor	r10,r10,r1,ror#24
+	vsli.32	q1,q12,#12
+	add	r8,r8,r12
+	vsli.32	q5,q13,#12
+	mov	r4,r4,ror#25
+	vsli.32	q9,q14,#12
+	add	r9,r9,r10
+	vadd.i32	q0,q0,q1
+	mov	r5,r5,ror#25
+	vadd.i32	q4,q4,q5
+	str	r10,[sp,#4*(16+13)]
+	vadd.i32	q8,q8,q9
+	ldr	r10,[sp,#4*(16+15)]
+	veor	q12,q3,q0
+	eor	r4,r4,r8,ror#25
+	veor	q13,q7,q4
+	eor	r5,r5,r9,ror#25
+	veor	q14,q11,q8
+	str	r8,[sp,#4*(16+8)]
+	vshr.u32	q3,q12,#24
+	ldr	r8,[sp,#4*(16+10)]
+	vshr.u32	q7,q13,#24
+	add	r2,r2,r6
+	vshr.u32	q11,q14,#24
+	mov	r14,r14,ror#16
+	vsli.32	q3,q12,#8
+	str	r9,[sp,#4*(16+9)]
+	vsli.32	q7,q13,#8
+	ldr	r9,[sp,#4*(16+11)]
+	vsli.32	q11,q14,#8
+	add	r3,r3,r7
+	vadd.i32	q2,q2,q3
+	mov	r10,r10,ror#16
+	vadd.i32	q6,q6,q7
+	eor	r14,r14,r2,ror#16
+	vadd.i32	q10,q10,q11
+	eor	r10,r10,r3,ror#16
+	veor	q12,q1,q2
+	add	r8,r8,r14
+	veor	q13,q5,q6
+	mov	r6,r6,ror#20
+	veor	q14,q9,q10
+	add	r9,r9,r10
+	vshr.u32	q1,q12,#25
+	mov	r7,r7,ror#20
+	vshr.u32	q5,q13,#25
+	eor	r6,r6,r8,ror#20
+	vshr.u32	q9,q14,#25
+	eor	r7,r7,r9,ror#20
+	vsli.32	q1,q12,#7
+	add	r2,r2,r6
+	vsli.32	q5,q13,#7
+	mov	r14,r14,ror#24
+	vsli.32	q9,q14,#7
+	add	r3,r3,r7
+	vext.8	q2,q2,q2,#8
+	mov	r10,r10,ror#24
+	vext.8	q6,q6,q6,#8
+	eor	r14,r14,r2,ror#24
+	vext.8	q10,q10,q10,#8
+	eor	r10,r10,r3,ror#24
+	vext.8	q1,q1,q1,#4
+	add	r8,r8,r14
+	vext.8	q5,q5,q5,#4
+	mov	r6,r6,ror#25
+	vext.8	q9,q9,q9,#4
+	add	r9,r9,r10
+	vext.8	q3,q3,q3,#12
+	mov	r7,r7,ror#25
+	vext.8	q7,q7,q7,#12
+	eor	r6,r6,r8,ror#25
+	vext.8	q11,q11,q11,#12
+	eor	r7,r7,r9,ror#25
+	vadd.i32	q0,q0,q1
+	add	r0,r0,r5
+	vadd.i32	q4,q4,q5
+	mov	r10,r10,ror#16
+	vadd.i32	q8,q8,q9
+	add	r1,r1,r6
+	veor	q3,q3,q0
+	mov	r12,r12,ror#16
+	veor	q7,q7,q4
+	eor	r10,r10,r0,ror#16
+	veor	q11,q11,q8
+	eor	r12,r12,r1,ror#16
+	vrev32.16	q3,q3
+	add	r8,r8,r10
+	vrev32.16	q7,q7
+	mov	r5,r5,ror#20
+	vrev32.16	q11,q11
+	add	r9,r9,r12
+	vadd.i32	q2,q2,q3
+	mov	r6,r6,ror#20
+	vadd.i32	q6,q6,q7
+	eor	r5,r5,r8,ror#20
+	vadd.i32	q10,q10,q11
+	eor	r6,r6,r9,ror#20
+	veor	q12,q1,q2
+	add	r0,r0,r5
+	veor	q13,q5,q6
+	mov	r10,r10,ror#24
+	veor	q14,q9,q10
+	add	r1,r1,r6
+	vshr.u32	q1,q12,#20
+	mov	r12,r12,ror#24
+	vshr.u32	q5,q13,#20
+	eor	r10,r10,r0,ror#24
+	vshr.u32	q9,q14,#20
+	eor	r12,r12,r1,ror#24
+	vsli.32	q1,q12,#12
+	add	r8,r8,r10
+	vsli.32	q5,q13,#12
+	mov	r5,r5,ror#25
+	vsli.32	q9,q14,#12
+	str	r10,[sp,#4*(16+15)]
+	vadd.i32	q0,q0,q1
+	ldr	r10,[sp,#4*(16+13)]
+	vadd.i32	q4,q4,q5
+	add	r9,r9,r12
+	vadd.i32	q8,q8,q9
+	mov	r6,r6,ror#25
+	veor	q12,q3,q0
+	eor	r5,r5,r8,ror#25
+	veor	q13,q7,q4
+	eor	r6,r6,r9,ror#25
+	veor	q14,q11,q8
+	str	r8,[sp,#4*(16+10)]
+	vshr.u32	q3,q12,#24
+	ldr	r8,[sp,#4*(16+8)]
+	vshr.u32	q7,q13,#24
+	add	r2,r2,r7
+	vshr.u32	q11,q14,#24
+	mov	r10,r10,ror#16
+	vsli.32	q3,q12,#8
+	str	r9,[sp,#4*(16+11)]
+	vsli.32	q7,q13,#8
+	ldr	r9,[sp,#4*(16+9)]
+	vsli.32	q11,q14,#8
+	add	r3,r3,r4
+	vadd.i32	q2,q2,q3
+	mov	r14,r14,ror#16
+	vadd.i32	q6,q6,q7
+	eor	r10,r10,r2,ror#16
+	vadd.i32	q10,q10,q11
+	eor	r14,r14,r3,ror#16
+	veor	q12,q1,q2
+	add	r8,r8,r10
+	veor	q13,q5,q6
+	mov	r7,r7,ror#20
+	veor	q14,q9,q10
+	add	r9,r9,r14
+	vshr.u32	q1,q12,#25
+	mov	r4,r4,ror#20
+	vshr.u32	q5,q13,#25
+	eor	r7,r7,r8,ror#20
+	vshr.u32	q9,q14,#25
+	eor	r4,r4,r9,ror#20
+	vsli.32	q1,q12,#7
+	add	r2,r2,r7
+	vsli.32	q5,q13,#7
+	mov	r10,r10,ror#24
+	vsli.32	q9,q14,#7
+	add	r3,r3,r4
+	vext.8	q2,q2,q2,#8
+	mov	r14,r14,ror#24
+	vext.8	q6,q6,q6,#8
+	eor	r10,r10,r2,ror#24
+	vext.8	q10,q10,q10,#8
+	eor	r14,r14,r3,ror#24
+	vext.8	q1,q1,q1,#12
+	add	r8,r8,r10
+	vext.8	q5,q5,q5,#12
+	mov	r7,r7,ror#25
+	vext.8	q9,q9,q9,#12
+	add	r9,r9,r14
+	vext.8	q3,q3,q3,#4
+	mov	r4,r4,ror#25
+	vext.8	q7,q7,q7,#4
+	eor	r7,r7,r8,ror#25
+	vext.8	q11,q11,q11,#4
+	eor	r4,r4,r9,ror#25
+	bne		.Loop_neon
+
+	add		r11,sp,#32
+	vld1.32		{q12-q13},[sp]		@ load key material
+	vld1.32		{q14-q15},[r11]
+
+	ldr		r11,[sp,#4*(32+2)]	@ load len
+
+	str		r8, [sp,#4*(16+8)]	@ modulo-scheduled store
+	str		r9, [sp,#4*(16+9)]
+	str		r12,[sp,#4*(16+12)]
+	str		r10, [sp,#4*(16+13)]
+	str		r14,[sp,#4*(16+14)]
+
+	@ at this point we have first half of 512-bit result in
+	@ rx and second half at sp+4*(16+8)
+
+	ldr		r12,[sp,#4*(32+1)]	@ load inp
+	ldr		r14,[sp,#4*(32+0)]	@ load out
+
+	vadd.i32	q0,q0,q12		@ accumulate key material
+	vadd.i32	q4,q4,q12
+	vadd.i32	q8,q8,q12
+	vldr		d24,[sp,#4*(16+0)]	@ one
+
+	vadd.i32	q1,q1,q13
+	vadd.i32	q5,q5,q13
+	vadd.i32	q9,q9,q13
+	vldr		d26,[sp,#4*(16+2)]	@ two
+
+	vadd.i32	q2,q2,q14
+	vadd.i32	q6,q6,q14
+	vadd.i32	q10,q10,q14
+	vadd.i32	d14,d14,d24	@ counter+1
+	vadd.i32	d22,d22,d26	@ counter+2
+
+	vadd.i32	q3,q3,q15
+	vadd.i32	q7,q7,q15
+	vadd.i32	q11,q11,q15
+
+	cmp		r11,#64*4
+	blo		.Ltail_neon
+
+	vld1.8		{q12-q13},[r12]!	@ load input
+	 mov		r11,sp
+	vld1.8		{q14-q15},[r12]!
+	veor		q0,q0,q12		@ xor with input
+	veor		q1,q1,q13
+	vld1.8		{q12-q13},[r12]!
+	veor		q2,q2,q14
+	veor		q3,q3,q15
+	vld1.8		{q14-q15},[r12]!
+
+	veor		q4,q4,q12
+	 vst1.8		{q0-q1},[r14]!	@ store output
+	veor		q5,q5,q13
+	vld1.8		{q12-q13},[r12]!
+	veor		q6,q6,q14
+	 vst1.8		{q2-q3},[r14]!
+	veor		q7,q7,q15
+	vld1.8		{q14-q15},[r12]!
+
+	veor		q8,q8,q12
+	 vld1.32	{q0-q1},[r11]!	@ load for next iteration
+	 veor		d25,d25,d25
+	 vldr		d24,[sp,#4*(16+4)]	@ four
+	veor		q9,q9,q13
+	 vld1.32	{q2-q3},[r11]
+	veor		q10,q10,q14
+	 vst1.8		{q4-q5},[r14]!
+	veor		q11,q11,q15
+	 vst1.8		{q6-q7},[r14]!
+
+	vadd.i32	d6,d6,d24	@ next counter value
+	vldr		d24,[sp,#4*(16+0)]	@ one
+
+	ldmia		sp,{r8-r11}	@ load key material
+	add		r0,r0,r8	@ accumulate key material
+	ldr		r8,[r12],#16		@ load input
+	 vst1.8		{q8-q9},[r14]!
+	add		r1,r1,r9
+	ldr		r9,[r12,#-12]
+	 vst1.8		{q10-q11},[r14]!
+	add		r2,r2,r10
+	ldr		r10,[r12,#-8]
+	add		r3,r3,r11
+	ldr		r11,[r12,#-4]
+#ifdef	__ARMEB__
+	rev		r0,r0
+	rev		r1,r1
+	rev		r2,r2
+	rev		r3,r3
+#endif
+	eor		r0,r0,r8	@ xor with input
+	 add		r8,sp,#4*(4)
+	eor		r1,r1,r9
+	str		r0,[r14],#16		@ store output
+	eor		r2,r2,r10
+	str		r1,[r14,#-12]
+	eor		r3,r3,r11
+	 ldmia		r8,{r8-r11}	@ load key material
+	str		r2,[r14,#-8]
+	str		r3,[r14,#-4]
+
+	add		r4,r4,r8	@ accumulate key material
+	ldr		r8,[r12],#16		@ load input
+	add		r5,r5,r9
+	ldr		r9,[r12,#-12]
+	add		r6,r6,r10
+	ldr		r10,[r12,#-8]
+	add		r7,r7,r11
+	ldr		r11,[r12,#-4]
+#ifdef	__ARMEB__
+	rev		r4,r4
+	rev		r5,r5
+	rev		r6,r6
+	rev		r7,r7
+#endif
+	eor		r4,r4,r8
+	 add		r8,sp,#4*(8)
+	eor		r5,r5,r9
+	str		r4,[r14],#16		@ store output
+	eor		r6,r6,r10
+	str		r5,[r14,#-12]
+	eor		r7,r7,r11
+	 ldmia		r8,{r8-r11}	@ load key material
+	str		r6,[r14,#-8]
+	 add		r0,sp,#4*(16+8)
+	str		r7,[r14,#-4]
+
+	ldmia		r0,{r0-r7}	@ load second half
+
+	add		r0,r0,r8	@ accumulate key material
+	ldr		r8,[r12],#16		@ load input
+	add		r1,r1,r9
+	ldr		r9,[r12,#-12]
+#ifdef	__thumb2__
+	it	hi
+#endif
+	 strhi		r10,[sp,#4*(16+10)]	@ copy "rx" while at it
+	add		r2,r2,r10
+	ldr		r10,[r12,#-8]
+#ifdef	__thumb2__
+	it	hi
+#endif
+	 strhi		r11,[sp,#4*(16+11)]	@ copy "rx" while at it
+	add		r3,r3,r11
+	ldr		r11,[r12,#-4]
+#ifdef	__ARMEB__
+	rev		r0,r0
+	rev		r1,r1
+	rev		r2,r2
+	rev		r3,r3
+#endif
+	eor		r0,r0,r8
+	 add		r8,sp,#4*(12)
+	eor		r1,r1,r9
+	str		r0,[r14],#16		@ store output
+	eor		r2,r2,r10
+	str		r1,[r14,#-12]
+	eor		r3,r3,r11
+	 ldmia		r8,{r8-r11}	@ load key material
+	str		r2,[r14,#-8]
+	str		r3,[r14,#-4]
+
+	add		r4,r4,r8	@ accumulate key material
+	 add		r8,r8,#4		@ next counter value
+	add		r5,r5,r9
+	 str		r8,[sp,#4*(12)]	@ save next counter value
+	ldr		r8,[r12],#16		@ load input
+	add		r6,r6,r10
+	 add		r4,r4,#3		@ counter+3
+	ldr		r9,[r12,#-12]
+	add		r7,r7,r11
+	ldr		r10,[r12,#-8]
+	ldr		r11,[r12,#-4]
+#ifdef	__ARMEB__
+	rev		r4,r4
+	rev		r5,r5
+	rev		r6,r6
+	rev		r7,r7
+#endif
+	eor		r4,r4,r8
+#ifdef	__thumb2__
+	it	hi
+#endif
+	 ldrhi		r8,[sp,#4*(32+2)]	@ re-load len
+	eor		r5,r5,r9
+	eor		r6,r6,r10
+	str		r4,[r14],#16		@ store output
+	eor		r7,r7,r11
+	str		r5,[r14,#-12]
+	 sub		r11,r8,#64*4	@ len-=64*4
+	str		r6,[r14,#-8]
+	str		r7,[r14,#-4]
+	bhi		.Loop_neon_outer
+
+	b		.Ldone_neon
+
+.align	4
+.Lbreak_neon:
+	@ harmonize NEON and integer-only stack frames: load data
+	@ from NEON frame, but save to integer-only one; distance
+	@ between the two is 4*(32+4+16-32)=4*(20).
+
+	str		r11, [sp,#4*(20+32+2)]	@ save len
+	 add		r11,sp,#4*(32+4)
+	str		r12,   [sp,#4*(20+32+1)]	@ save inp
+	str		r14,   [sp,#4*(20+32+0)]	@ save out
+
+	ldr		r12,[sp,#4*(16+10)]
+	ldr		r14,[sp,#4*(16+11)]
+	 vldmia		r11,{d8-d15}			@ fulfill ABI requirement
+	str		r12,[sp,#4*(20+16+10)]	@ copy "rx"
+	str		r14,[sp,#4*(20+16+11)]	@ copy "rx"
+
+	ldr		r11, [sp,#4*(15)]
+	ldr		r12,[sp,#4*(12)]		@ modulo-scheduled load
+	ldr		r10, [sp,#4*(13)]
+	ldr		r14,[sp,#4*(14)]
+	str		r11, [sp,#4*(20+16+15)]
+	add		r11,sp,#4*(20)
+	vst1.32		{q0-q1},[r11]!		@ copy key
+	add		sp,sp,#4*(20)			@ switch frame
+	vst1.32		{q2-q3},[r11]
+	mov		r11,#10
+	b		.Loop				@ go integer-only
+
+.align	4
+.Ltail_neon:
+	cmp		r11,#64*3
+	bhs		.L192_or_more_neon
+	cmp		r11,#64*2
+	bhs		.L128_or_more_neon
+	cmp		r11,#64*1
+	bhs		.L64_or_more_neon
+
+	add		r8,sp,#4*(8)
+	vst1.8		{q0-q1},[sp]
+	add		r10,sp,#4*(0)
+	vst1.8		{q2-q3},[r8]
+	b		.Loop_tail_neon
+
+.align	4
+.L64_or_more_neon:
+	vld1.8		{q12-q13},[r12]!
+	vld1.8		{q14-q15},[r12]!
+	veor		q0,q0,q12
+	veor		q1,q1,q13
+	veor		q2,q2,q14
+	veor		q3,q3,q15
+	vst1.8		{q0-q1},[r14]!
+	vst1.8		{q2-q3},[r14]!
+
+	beq		.Ldone_neon
+
+	add		r8,sp,#4*(8)
+	vst1.8		{q4-q5},[sp]
+	add		r10,sp,#4*(0)
+	vst1.8		{q6-q7},[r8]
+	sub		r11,r11,#64*1	@ len-=64*1
+	b		.Loop_tail_neon
+
+.align	4
+.L128_or_more_neon:
+	vld1.8		{q12-q13},[r12]!
+	vld1.8		{q14-q15},[r12]!
+	veor		q0,q0,q12
+	veor		q1,q1,q13
+	vld1.8		{q12-q13},[r12]!
+	veor		q2,q2,q14
+	veor		q3,q3,q15
+	vld1.8		{q14-q15},[r12]!
+
+	veor		q4,q4,q12
+	veor		q5,q5,q13
+	 vst1.8		{q0-q1},[r14]!
+	veor		q6,q6,q14
+	 vst1.8		{q2-q3},[r14]!
+	veor		q7,q7,q15
+	vst1.8		{q4-q5},[r14]!
+	vst1.8		{q6-q7},[r14]!
+
+	beq		.Ldone_neon
+
+	add		r8,sp,#4*(8)
+	vst1.8		{q8-q9},[sp]
+	add		r10,sp,#4*(0)
+	vst1.8		{q10-q11},[r8]
+	sub		r11,r11,#64*2	@ len-=64*2
+	b		.Loop_tail_neon
+
+.align	4
+.L192_or_more_neon:
+	vld1.8		{q12-q13},[r12]!
+	vld1.8		{q14-q15},[r12]!
+	veor		q0,q0,q12
+	veor		q1,q1,q13
+	vld1.8		{q12-q13},[r12]!
+	veor		q2,q2,q14
+	veor		q3,q3,q15
+	vld1.8		{q14-q15},[r12]!
+
+	veor		q4,q4,q12
+	veor		q5,q5,q13
+	vld1.8		{q12-q13},[r12]!
+	veor		q6,q6,q14
+	 vst1.8		{q0-q1},[r14]!
+	veor		q7,q7,q15
+	vld1.8		{q14-q15},[r12]!
+
+	veor		q8,q8,q12
+	 vst1.8		{q2-q3},[r14]!
+	veor		q9,q9,q13
+	 vst1.8		{q4-q5},[r14]!
+	veor		q10,q10,q14
+	 vst1.8		{q6-q7},[r14]!
+	veor		q11,q11,q15
+	vst1.8		{q8-q9},[r14]!
+	vst1.8		{q10-q11},[r14]!
+
+	beq		.Ldone_neon
+
+	ldmia		sp,{r8-r11}	@ load key material
+	add		r0,r0,r8	@ accumulate key material
+	 add		r8,sp,#4*(4)
+	add		r1,r1,r9
+	add		r2,r2,r10
+	add		r3,r3,r11
+	 ldmia		r8,{r8-r11}	@ load key material
+
+	add		r4,r4,r8	@ accumulate key material
+	 add		r8,sp,#4*(8)
+	add		r5,r5,r9
+	add		r6,r6,r10
+	add		r7,r7,r11
+	 ldmia		r8,{r8-r11}	@ load key material
+#ifdef	__ARMEB__
+	rev		r0,r0
+	rev		r1,r1
+	rev		r2,r2
+	rev		r3,r3
+	rev		r4,r4
+	rev		r5,r5
+	rev		r6,r6
+	rev		r7,r7
+#endif
+	stmia		sp,{r0-r7}
+	 add		r0,sp,#4*(16+8)
+
+	ldmia		r0,{r0-r7}	@ load second half
+
+	add		r0,r0,r8	@ accumulate key material
+	 add		r8,sp,#4*(12)
+	add		r1,r1,r9
+	add		r2,r2,r10
+	add		r3,r3,r11
+	 ldmia		r8,{r8-r11}	@ load key material
+
+	add		r4,r4,r8	@ accumulate key material
+	 add		r8,sp,#4*(8)
+	add		r5,r5,r9
+	 add		r4,r4,#3		@ counter+3
+	add		r6,r6,r10
+	add		r7,r7,r11
+	 ldr		r11,[sp,#4*(32+2)]	@ re-load len
+#ifdef	__ARMEB__
+	rev		r0,r0
+	rev		r1,r1
+	rev		r2,r2
+	rev		r3,r3
+	rev		r4,r4
+	rev		r5,r5
+	rev		r6,r6
+	rev		r7,r7
+#endif
+	stmia		r8,{r0-r7}
+	 add		r10,sp,#4*(0)
+	 sub		r11,r11,#64*3	@ len-=64*3
+
+.Loop_tail_neon:
+	ldrb		r8,[r10],#1	@ read buffer on stack
+	ldrb		r9,[r12],#1		@ read input
+	subs		r11,r11,#1
+	eor		r8,r8,r9
+	strb		r8,[r14],#1		@ store output
+	bne		.Loop_tail_neon
+
+.Ldone_neon:
+	add		sp,sp,#4*(32+4)
+	vldmia		sp,{d8-d15}
+	add		sp,sp,#4*(16+3)
+.Lno_data_neon:
+	ldmia		sp!,{r4-r11,pc}
+ENDPROC(chacha20_neon)
+#endif
diff --git a/lib/zinc/chacha20/chacha20-arm64.S b/lib/zinc/chacha20/chacha20-arm64.S
new file mode 100644
index 000000000000..f90162c32a33
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-arm64.S
@@ -0,0 +1,1942 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#include <linux/linkage.h>
+
+.text
+.align	5
+.Lsigma:
+.quad	0x3320646e61707865,0x6b20657479622d32		// endian-neutral
+.Lone:
+.long	1,0,0,0
+
+.align	5
+ENTRY(chacha20_arm)
+	cbz	x2,.Labort
+.Lshort:
+	stp	x29,x30,[sp,#-96]!
+	add	x29,sp,#0
+
+	adr	x5,.Lsigma
+	stp	x19,x20,[sp,#16]
+	stp	x21,x22,[sp,#32]
+	stp	x23,x24,[sp,#48]
+	stp	x25,x26,[sp,#64]
+	stp	x27,x28,[sp,#80]
+	sub	sp,sp,#64
+
+	ldp	x22,x23,[x5]		// load sigma
+	ldp	x24,x25,[x3]		// load key
+	ldp	x26,x27,[x3,#16]
+	ldp	x28,x30,[x4]		// load counter
+#ifdef	__ARMEB__
+	ror	x24,x24,#32
+	ror	x25,x25,#32
+	ror	x26,x26,#32
+	ror	x27,x27,#32
+	ror	x28,x28,#32
+	ror	x30,x30,#32
+#endif
+
+.Loop_outer:
+	mov	w5,w22			// unpack key block
+	lsr	x6,x22,#32
+	mov	w7,w23
+	lsr	x8,x23,#32
+	mov	w9,w24
+	lsr	x10,x24,#32
+	mov	w11,w25
+	lsr	x12,x25,#32
+	mov	w13,w26
+	lsr	x14,x26,#32
+	mov	w15,w27
+	lsr	x16,x27,#32
+	mov	w17,w28
+	lsr	x19,x28,#32
+	mov	w20,w30
+	lsr	x21,x30,#32
+
+	mov	x4,#10
+	subs	x2,x2,#64
+.Loop:
+	sub	x4,x4,#1
+	add	w5,w5,w9
+	add	w6,w6,w10
+	add	w7,w7,w11
+	add	w8,w8,w12
+	eor	w17,w17,w5
+	eor	w19,w19,w6
+	eor	w20,w20,w7
+	eor	w21,w21,w8
+	ror	w17,w17,#16
+	ror	w19,w19,#16
+	ror	w20,w20,#16
+	ror	w21,w21,#16
+	add	w13,w13,w17
+	add	w14,w14,w19
+	add	w15,w15,w20
+	add	w16,w16,w21
+	eor	w9,w9,w13
+	eor	w10,w10,w14
+	eor	w11,w11,w15
+	eor	w12,w12,w16
+	ror	w9,w9,#20
+	ror	w10,w10,#20
+	ror	w11,w11,#20
+	ror	w12,w12,#20
+	add	w5,w5,w9
+	add	w6,w6,w10
+	add	w7,w7,w11
+	add	w8,w8,w12
+	eor	w17,w17,w5
+	eor	w19,w19,w6
+	eor	w20,w20,w7
+	eor	w21,w21,w8
+	ror	w17,w17,#24
+	ror	w19,w19,#24
+	ror	w20,w20,#24
+	ror	w21,w21,#24
+	add	w13,w13,w17
+	add	w14,w14,w19
+	add	w15,w15,w20
+	add	w16,w16,w21
+	eor	w9,w9,w13
+	eor	w10,w10,w14
+	eor	w11,w11,w15
+	eor	w12,w12,w16
+	ror	w9,w9,#25
+	ror	w10,w10,#25
+	ror	w11,w11,#25
+	ror	w12,w12,#25
+	add	w5,w5,w10
+	add	w6,w6,w11
+	add	w7,w7,w12
+	add	w8,w8,w9
+	eor	w21,w21,w5
+	eor	w17,w17,w6
+	eor	w19,w19,w7
+	eor	w20,w20,w8
+	ror	w21,w21,#16
+	ror	w17,w17,#16
+	ror	w19,w19,#16
+	ror	w20,w20,#16
+	add	w15,w15,w21
+	add	w16,w16,w17
+	add	w13,w13,w19
+	add	w14,w14,w20
+	eor	w10,w10,w15
+	eor	w11,w11,w16
+	eor	w12,w12,w13
+	eor	w9,w9,w14
+	ror	w10,w10,#20
+	ror	w11,w11,#20
+	ror	w12,w12,#20
+	ror	w9,w9,#20
+	add	w5,w5,w10
+	add	w6,w6,w11
+	add	w7,w7,w12
+	add	w8,w8,w9
+	eor	w21,w21,w5
+	eor	w17,w17,w6
+	eor	w19,w19,w7
+	eor	w20,w20,w8
+	ror	w21,w21,#24
+	ror	w17,w17,#24
+	ror	w19,w19,#24
+	ror	w20,w20,#24
+	add	w15,w15,w21
+	add	w16,w16,w17
+	add	w13,w13,w19
+	add	w14,w14,w20
+	eor	w10,w10,w15
+	eor	w11,w11,w16
+	eor	w12,w12,w13
+	eor	w9,w9,w14
+	ror	w10,w10,#25
+	ror	w11,w11,#25
+	ror	w12,w12,#25
+	ror	w9,w9,#25
+	cbnz	x4,.Loop
+
+	add	w5,w5,w22		// accumulate key block
+	add	x6,x6,x22,lsr#32
+	add	w7,w7,w23
+	add	x8,x8,x23,lsr#32
+	add	w9,w9,w24
+	add	x10,x10,x24,lsr#32
+	add	w11,w11,w25
+	add	x12,x12,x25,lsr#32
+	add	w13,w13,w26
+	add	x14,x14,x26,lsr#32
+	add	w15,w15,w27
+	add	x16,x16,x27,lsr#32
+	add	w17,w17,w28
+	add	x19,x19,x28,lsr#32
+	add	w20,w20,w30
+	add	x21,x21,x30,lsr#32
+
+	b.lo	.Ltail
+
+	add	x5,x5,x6,lsl#32	// pack
+	add	x7,x7,x8,lsl#32
+	ldp	x6,x8,[x1,#0]		// load input
+	add	x9,x9,x10,lsl#32
+	add	x11,x11,x12,lsl#32
+	ldp	x10,x12,[x1,#16]
+	add	x13,x13,x14,lsl#32
+	add	x15,x15,x16,lsl#32
+	ldp	x14,x16,[x1,#32]
+	add	x17,x17,x19,lsl#32
+	add	x20,x20,x21,lsl#32
+	ldp	x19,x21,[x1,#48]
+	add	x1,x1,#64
+#ifdef	__ARMEB__
+	rev	x5,x5
+	rev	x7,x7
+	rev	x9,x9
+	rev	x11,x11
+	rev	x13,x13
+	rev	x15,x15
+	rev	x17,x17
+	rev	x20,x20
+#endif
+	eor	x5,x5,x6
+	eor	x7,x7,x8
+	eor	x9,x9,x10
+	eor	x11,x11,x12
+	eor	x13,x13,x14
+	eor	x15,x15,x16
+	eor	x17,x17,x19
+	eor	x20,x20,x21
+
+	stp	x5,x7,[x0,#0]		// store output
+	add	x28,x28,#1			// increment counter
+	stp	x9,x11,[x0,#16]
+	stp	x13,x15,[x0,#32]
+	stp	x17,x20,[x0,#48]
+	add	x0,x0,#64
+
+	b.hi	.Loop_outer
+
+	ldp	x19,x20,[x29,#16]
+	add	sp,sp,#64
+	ldp	x21,x22,[x29,#32]
+	ldp	x23,x24,[x29,#48]
+	ldp	x25,x26,[x29,#64]
+	ldp	x27,x28,[x29,#80]
+	ldp	x29,x30,[sp],#96
+.Labort:
+	ret
+
+.align	4
+.Ltail:
+	add	x2,x2,#64
+.Less_than_64:
+	sub	x0,x0,#1
+	add	x1,x1,x2
+	add	x0,x0,x2
+	add	x4,sp,x2
+	neg	x2,x2
+
+	add	x5,x5,x6,lsl#32	// pack
+	add	x7,x7,x8,lsl#32
+	add	x9,x9,x10,lsl#32
+	add	x11,x11,x12,lsl#32
+	add	x13,x13,x14,lsl#32
+	add	x15,x15,x16,lsl#32
+	add	x17,x17,x19,lsl#32
+	add	x20,x20,x21,lsl#32
+#ifdef	__ARMEB__
+	rev	x5,x5
+	rev	x7,x7
+	rev	x9,x9
+	rev	x11,x11
+	rev	x13,x13
+	rev	x15,x15
+	rev	x17,x17
+	rev	x20,x20
+#endif
+	stp	x5,x7,[sp,#0]
+	stp	x9,x11,[sp,#16]
+	stp	x13,x15,[sp,#32]
+	stp	x17,x20,[sp,#48]
+
+.Loop_tail:
+	ldrb	w10,[x1,x2]
+	ldrb	w11,[x4,x2]
+	add	x2,x2,#1
+	eor	w10,w10,w11
+	strb	w10,[x0,x2]
+	cbnz	x2,.Loop_tail
+
+	stp	xzr,xzr,[sp,#0]
+	stp	xzr,xzr,[sp,#16]
+	stp	xzr,xzr,[sp,#32]
+	stp	xzr,xzr,[sp,#48]
+
+	ldp	x19,x20,[x29,#16]
+	add	sp,sp,#64
+	ldp	x21,x22,[x29,#32]
+	ldp	x23,x24,[x29,#48]
+	ldp	x25,x26,[x29,#64]
+	ldp	x27,x28,[x29,#80]
+	ldp	x29,x30,[sp],#96
+	ret
+ENDPROC(chacha20_arm)
+
+.align	5
+ENTRY(chacha20_neon)
+	cbz	x2,.Labort_neon
+	cmp	x2,#192
+	b.lo	.Lshort
+
+	stp	x29,x30,[sp,#-96]!
+	add	x29,sp,#0
+
+	adr	x5,.Lsigma
+	stp	x19,x20,[sp,#16]
+	stp	x21,x22,[sp,#32]
+	stp	x23,x24,[sp,#48]
+	stp	x25,x26,[sp,#64]
+	stp	x27,x28,[sp,#80]
+	cmp	x2,#512
+	b.hs	.L512_or_more_neon
+
+	sub	sp,sp,#64
+
+	ldp	x22,x23,[x5]		// load sigma
+	ld1	{v24.4s},[x5],#16
+	ldp	x24,x25,[x3]		// load key
+	ldp	x26,x27,[x3,#16]
+	ld1	{v25.4s,v26.4s},[x3]
+	ldp	x28,x30,[x4]		// load counter
+	ld1	{v27.4s},[x4]
+	ld1	{v31.4s},[x5]
+#ifdef	__ARMEB__
+	rev64	v24.4s,v24.4s
+	ror	x24,x24,#32
+	ror	x25,x25,#32
+	ror	x26,x26,#32
+	ror	x27,x27,#32
+	ror	x28,x28,#32
+	ror	x30,x30,#32
+#endif
+	add	v27.4s,v27.4s,v31.4s		// += 1
+	add	v28.4s,v27.4s,v31.4s
+	add	v29.4s,v28.4s,v31.4s
+	shl	v31.4s,v31.4s,#2			// 1 -> 4
+
+.Loop_outer_neon:
+	mov	w5,w22			// unpack key block
+	lsr	x6,x22,#32
+	mov	v0.16b,v24.16b
+	mov	w7,w23
+	lsr	x8,x23,#32
+	mov	v4.16b,v24.16b
+	mov	w9,w24
+	lsr	x10,x24,#32
+	mov	v16.16b,v24.16b
+	mov	w11,w25
+	mov	v1.16b,v25.16b
+	lsr	x12,x25,#32
+	mov	v5.16b,v25.16b
+	mov	w13,w26
+	mov	v17.16b,v25.16b
+	lsr	x14,x26,#32
+	mov	v3.16b,v27.16b
+	mov	w15,w27
+	mov	v7.16b,v28.16b
+	lsr	x16,x27,#32
+	mov	v19.16b,v29.16b
+	mov	w17,w28
+	mov	v2.16b,v26.16b
+	lsr	x19,x28,#32
+	mov	v6.16b,v26.16b
+	mov	w20,w30
+	mov	v18.16b,v26.16b
+	lsr	x21,x30,#32
+
+	mov	x4,#10
+	subs	x2,x2,#256
+.Loop_neon:
+	sub	x4,x4,#1
+	add	v0.4s,v0.4s,v1.4s
+	add	w5,w5,w9
+	add	v4.4s,v4.4s,v5.4s
+	add	w6,w6,w10
+	add	v16.4s,v16.4s,v17.4s
+	add	w7,w7,w11
+	eor	v3.16b,v3.16b,v0.16b
+	add	w8,w8,w12
+	eor	v7.16b,v7.16b,v4.16b
+	eor	w17,w17,w5
+	eor	v19.16b,v19.16b,v16.16b
+	eor	w19,w19,w6
+	rev32	v3.8h,v3.8h
+	eor	w20,w20,w7
+	rev32	v7.8h,v7.8h
+	eor	w21,w21,w8
+	rev32	v19.8h,v19.8h
+	ror	w17,w17,#16
+	add	v2.4s,v2.4s,v3.4s
+	ror	w19,w19,#16
+	add	v6.4s,v6.4s,v7.4s
+	ror	w20,w20,#16
+	add	v18.4s,v18.4s,v19.4s
+	ror	w21,w21,#16
+	eor	v20.16b,v1.16b,v2.16b
+	add	w13,w13,w17
+	eor	v21.16b,v5.16b,v6.16b
+	add	w14,w14,w19
+	eor	v22.16b,v17.16b,v18.16b
+	add	w15,w15,w20
+	ushr	v1.4s,v20.4s,#20
+	add	w16,w16,w21
+	ushr	v5.4s,v21.4s,#20
+	eor	w9,w9,w13
+	ushr	v17.4s,v22.4s,#20
+	eor	w10,w10,w14
+	sli	v1.4s,v20.4s,#12
+	eor	w11,w11,w15
+	sli	v5.4s,v21.4s,#12
+	eor	w12,w12,w16
+	sli	v17.4s,v22.4s,#12
+	ror	w9,w9,#20
+	add	v0.4s,v0.4s,v1.4s
+	ror	w10,w10,#20
+	add	v4.4s,v4.4s,v5.4s
+	ror	w11,w11,#20
+	add	v16.4s,v16.4s,v17.4s
+	ror	w12,w12,#20
+	eor	v20.16b,v3.16b,v0.16b
+	add	w5,w5,w9
+	eor	v21.16b,v7.16b,v4.16b
+	add	w6,w6,w10
+	eor	v22.16b,v19.16b,v16.16b
+	add	w7,w7,w11
+	ushr	v3.4s,v20.4s,#24
+	add	w8,w8,w12
+	ushr	v7.4s,v21.4s,#24
+	eor	w17,w17,w5
+	ushr	v19.4s,v22.4s,#24
+	eor	w19,w19,w6
+	sli	v3.4s,v20.4s,#8
+	eor	w20,w20,w7
+	sli	v7.4s,v21.4s,#8
+	eor	w21,w21,w8
+	sli	v19.4s,v22.4s,#8
+	ror	w17,w17,#24
+	add	v2.4s,v2.4s,v3.4s
+	ror	w19,w19,#24
+	add	v6.4s,v6.4s,v7.4s
+	ror	w20,w20,#24
+	add	v18.4s,v18.4s,v19.4s
+	ror	w21,w21,#24
+	eor	v20.16b,v1.16b,v2.16b
+	add	w13,w13,w17
+	eor	v21.16b,v5.16b,v6.16b
+	add	w14,w14,w19
+	eor	v22.16b,v17.16b,v18.16b
+	add	w15,w15,w20
+	ushr	v1.4s,v20.4s,#25
+	add	w16,w16,w21
+	ushr	v5.4s,v21.4s,#25
+	eor	w9,w9,w13
+	ushr	v17.4s,v22.4s,#25
+	eor	w10,w10,w14
+	sli	v1.4s,v20.4s,#7
+	eor	w11,w11,w15
+	sli	v5.4s,v21.4s,#7
+	eor	w12,w12,w16
+	sli	v17.4s,v22.4s,#7
+	ror	w9,w9,#25
+	ext	v2.16b,v2.16b,v2.16b,#8
+	ror	w10,w10,#25
+	ext	v6.16b,v6.16b,v6.16b,#8
+	ror	w11,w11,#25
+	ext	v18.16b,v18.16b,v18.16b,#8
+	ror	w12,w12,#25
+	ext	v3.16b,v3.16b,v3.16b,#12
+	ext	v7.16b,v7.16b,v7.16b,#12
+	ext	v19.16b,v19.16b,v19.16b,#12
+	ext	v1.16b,v1.16b,v1.16b,#4
+	ext	v5.16b,v5.16b,v5.16b,#4
+	ext	v17.16b,v17.16b,v17.16b,#4
+	add	v0.4s,v0.4s,v1.4s
+	add	w5,w5,w10
+	add	v4.4s,v4.4s,v5.4s
+	add	w6,w6,w11
+	add	v16.4s,v16.4s,v17.4s
+	add	w7,w7,w12
+	eor	v3.16b,v3.16b,v0.16b
+	add	w8,w8,w9
+	eor	v7.16b,v7.16b,v4.16b
+	eor	w21,w21,w5
+	eor	v19.16b,v19.16b,v16.16b
+	eor	w17,w17,w6
+	rev32	v3.8h,v3.8h
+	eor	w19,w19,w7
+	rev32	v7.8h,v7.8h
+	eor	w20,w20,w8
+	rev32	v19.8h,v19.8h
+	ror	w21,w21,#16
+	add	v2.4s,v2.4s,v3.4s
+	ror	w17,w17,#16
+	add	v6.4s,v6.4s,v7.4s
+	ror	w19,w19,#16
+	add	v18.4s,v18.4s,v19.4s
+	ror	w20,w20,#16
+	eor	v20.16b,v1.16b,v2.16b
+	add	w15,w15,w21
+	eor	v21.16b,v5.16b,v6.16b
+	add	w16,w16,w17
+	eor	v22.16b,v17.16b,v18.16b
+	add	w13,w13,w19
+	ushr	v1.4s,v20.4s,#20
+	add	w14,w14,w20
+	ushr	v5.4s,v21.4s,#20
+	eor	w10,w10,w15
+	ushr	v17.4s,v22.4s,#20
+	eor	w11,w11,w16
+	sli	v1.4s,v20.4s,#12
+	eor	w12,w12,w13
+	sli	v5.4s,v21.4s,#12
+	eor	w9,w9,w14
+	sli	v17.4s,v22.4s,#12
+	ror	w10,w10,#20
+	add	v0.4s,v0.4s,v1.4s
+	ror	w11,w11,#20
+	add	v4.4s,v4.4s,v5.4s
+	ror	w12,w12,#20
+	add	v16.4s,v16.4s,v17.4s
+	ror	w9,w9,#20
+	eor	v20.16b,v3.16b,v0.16b
+	add	w5,w5,w10
+	eor	v21.16b,v7.16b,v4.16b
+	add	w6,w6,w11
+	eor	v22.16b,v19.16b,v16.16b
+	add	w7,w7,w12
+	ushr	v3.4s,v20.4s,#24
+	add	w8,w8,w9
+	ushr	v7.4s,v21.4s,#24
+	eor	w21,w21,w5
+	ushr	v19.4s,v22.4s,#24
+	eor	w17,w17,w6
+	sli	v3.4s,v20.4s,#8
+	eor	w19,w19,w7
+	sli	v7.4s,v21.4s,#8
+	eor	w20,w20,w8
+	sli	v19.4s,v22.4s,#8
+	ror	w21,w21,#24
+	add	v2.4s,v2.4s,v3.4s
+	ror	w17,w17,#24
+	add	v6.4s,v6.4s,v7.4s
+	ror	w19,w19,#24
+	add	v18.4s,v18.4s,v19.4s
+	ror	w20,w20,#24
+	eor	v20.16b,v1.16b,v2.16b
+	add	w15,w15,w21
+	eor	v21.16b,v5.16b,v6.16b
+	add	w16,w16,w17
+	eor	v22.16b,v17.16b,v18.16b
+	add	w13,w13,w19
+	ushr	v1.4s,v20.4s,#25
+	add	w14,w14,w20
+	ushr	v5.4s,v21.4s,#25
+	eor	w10,w10,w15
+	ushr	v17.4s,v22.4s,#25
+	eor	w11,w11,w16
+	sli	v1.4s,v20.4s,#7
+	eor	w12,w12,w13
+	sli	v5.4s,v21.4s,#7
+	eor	w9,w9,w14
+	sli	v17.4s,v22.4s,#7
+	ror	w10,w10,#25
+	ext	v2.16b,v2.16b,v2.16b,#8
+	ror	w11,w11,#25
+	ext	v6.16b,v6.16b,v6.16b,#8
+	ror	w12,w12,#25
+	ext	v18.16b,v18.16b,v18.16b,#8
+	ror	w9,w9,#25
+	ext	v3.16b,v3.16b,v3.16b,#4
+	ext	v7.16b,v7.16b,v7.16b,#4
+	ext	v19.16b,v19.16b,v19.16b,#4
+	ext	v1.16b,v1.16b,v1.16b,#12
+	ext	v5.16b,v5.16b,v5.16b,#12
+	ext	v17.16b,v17.16b,v17.16b,#12
+	cbnz	x4,.Loop_neon
+
+	add	w5,w5,w22		// accumulate key block
+	add	v0.4s,v0.4s,v24.4s
+	add	x6,x6,x22,lsr#32
+	add	v4.4s,v4.4s,v24.4s
+	add	w7,w7,w23
+	add	v16.4s,v16.4s,v24.4s
+	add	x8,x8,x23,lsr#32
+	add	v2.4s,v2.4s,v26.4s
+	add	w9,w9,w24
+	add	v6.4s,v6.4s,v26.4s
+	add	x10,x10,x24,lsr#32
+	add	v18.4s,v18.4s,v26.4s
+	add	w11,w11,w25
+	add	v3.4s,v3.4s,v27.4s
+	add	x12,x12,x25,lsr#32
+	add	w13,w13,w26
+	add	v7.4s,v7.4s,v28.4s
+	add	x14,x14,x26,lsr#32
+	add	w15,w15,w27
+	add	v19.4s,v19.4s,v29.4s
+	add	x16,x16,x27,lsr#32
+	add	w17,w17,w28
+	add	v1.4s,v1.4s,v25.4s
+	add	x19,x19,x28,lsr#32
+	add	w20,w20,w30
+	add	v5.4s,v5.4s,v25.4s
+	add	x21,x21,x30,lsr#32
+	add	v17.4s,v17.4s,v25.4s
+
+	b.lo	.Ltail_neon
+
+	add	x5,x5,x6,lsl#32	// pack
+	add	x7,x7,x8,lsl#32
+	ldp	x6,x8,[x1,#0]		// load input
+	add	x9,x9,x10,lsl#32
+	add	x11,x11,x12,lsl#32
+	ldp	x10,x12,[x1,#16]
+	add	x13,x13,x14,lsl#32
+	add	x15,x15,x16,lsl#32
+	ldp	x14,x16,[x1,#32]
+	add	x17,x17,x19,lsl#32
+	add	x20,x20,x21,lsl#32
+	ldp	x19,x21,[x1,#48]
+	add	x1,x1,#64
+#ifdef	__ARMEB__
+	rev	x5,x5
+	rev	x7,x7
+	rev	x9,x9
+	rev	x11,x11
+	rev	x13,x13
+	rev	x15,x15
+	rev	x17,x17
+	rev	x20,x20
+#endif
+	ld1	{v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+	eor	x5,x5,x6
+	eor	x7,x7,x8
+	eor	x9,x9,x10
+	eor	x11,x11,x12
+	eor	x13,x13,x14
+	eor	v0.16b,v0.16b,v20.16b
+	eor	x15,x15,x16
+	eor	v1.16b,v1.16b,v21.16b
+	eor	x17,x17,x19
+	eor	v2.16b,v2.16b,v22.16b
+	eor	x20,x20,x21
+	eor	v3.16b,v3.16b,v23.16b
+	ld1	{v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+
+	stp	x5,x7,[x0,#0]		// store output
+	add	x28,x28,#4			// increment counter
+	stp	x9,x11,[x0,#16]
+	add	v27.4s,v27.4s,v31.4s		// += 4
+	stp	x13,x15,[x0,#32]
+	add	v28.4s,v28.4s,v31.4s
+	stp	x17,x20,[x0,#48]
+	add	v29.4s,v29.4s,v31.4s
+	add	x0,x0,#64
+
+	st1	{v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+	ld1	{v0.16b,v1.16b,v2.16b,v3.16b},[x1],#64
+
+	eor	v4.16b,v4.16b,v20.16b
+	eor	v5.16b,v5.16b,v21.16b
+	eor	v6.16b,v6.16b,v22.16b
+	eor	v7.16b,v7.16b,v23.16b
+	st1	{v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+
+	eor	v16.16b,v16.16b,v0.16b
+	eor	v17.16b,v17.16b,v1.16b
+	eor	v18.16b,v18.16b,v2.16b
+	eor	v19.16b,v19.16b,v3.16b
+	st1	{v16.16b,v17.16b,v18.16b,v19.16b},[x0],#64
+
+	b.hi	.Loop_outer_neon
+
+	ldp	x19,x20,[x29,#16]
+	add	sp,sp,#64
+	ldp	x21,x22,[x29,#32]
+	ldp	x23,x24,[x29,#48]
+	ldp	x25,x26,[x29,#64]
+	ldp	x27,x28,[x29,#80]
+	ldp	x29,x30,[sp],#96
+	ret
+
+.Ltail_neon:
+	add	x2,x2,#256
+	cmp	x2,#64
+	b.lo	.Less_than_64
+
+	add	x5,x5,x6,lsl#32	// pack
+	add	x7,x7,x8,lsl#32
+	ldp	x6,x8,[x1,#0]		// load input
+	add	x9,x9,x10,lsl#32
+	add	x11,x11,x12,lsl#32
+	ldp	x10,x12,[x1,#16]
+	add	x13,x13,x14,lsl#32
+	add	x15,x15,x16,lsl#32
+	ldp	x14,x16,[x1,#32]
+	add	x17,x17,x19,lsl#32
+	add	x20,x20,x21,lsl#32
+	ldp	x19,x21,[x1,#48]
+	add	x1,x1,#64
+#ifdef	__ARMEB__
+	rev	x5,x5
+	rev	x7,x7
+	rev	x9,x9
+	rev	x11,x11
+	rev	x13,x13
+	rev	x15,x15
+	rev	x17,x17
+	rev	x20,x20
+#endif
+	eor	x5,x5,x6
+	eor	x7,x7,x8
+	eor	x9,x9,x10
+	eor	x11,x11,x12
+	eor	x13,x13,x14
+	eor	x15,x15,x16
+	eor	x17,x17,x19
+	eor	x20,x20,x21
+
+	stp	x5,x7,[x0,#0]		// store output
+	add	x28,x28,#4			// increment counter
+	stp	x9,x11,[x0,#16]
+	stp	x13,x15,[x0,#32]
+	stp	x17,x20,[x0,#48]
+	add	x0,x0,#64
+	b.eq	.Ldone_neon
+	sub	x2,x2,#64
+	cmp	x2,#64
+	b.lo	.Less_than_128
+
+	ld1	{v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+	eor	v0.16b,v0.16b,v20.16b
+	eor	v1.16b,v1.16b,v21.16b
+	eor	v2.16b,v2.16b,v22.16b
+	eor	v3.16b,v3.16b,v23.16b
+	st1	{v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+	b.eq	.Ldone_neon
+	sub	x2,x2,#64
+	cmp	x2,#64
+	b.lo	.Less_than_192
+
+	ld1	{v20.16b,v21.16b,v22.16b,v23.16b},[x1],#64
+	eor	v4.16b,v4.16b,v20.16b
+	eor	v5.16b,v5.16b,v21.16b
+	eor	v6.16b,v6.16b,v22.16b
+	eor	v7.16b,v7.16b,v23.16b
+	st1	{v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+	b.eq	.Ldone_neon
+	sub	x2,x2,#64
+
+	st1	{v16.16b,v17.16b,v18.16b,v19.16b},[sp]
+	b	.Last_neon
+
+.Less_than_128:
+	st1	{v0.16b,v1.16b,v2.16b,v3.16b},[sp]
+	b	.Last_neon
+.Less_than_192:
+	st1	{v4.16b,v5.16b,v6.16b,v7.16b},[sp]
+	b	.Last_neon
+
+.align	4
+.Last_neon:
+	sub	x0,x0,#1
+	add	x1,x1,x2
+	add	x0,x0,x2
+	add	x4,sp,x2
+	neg	x2,x2
+
+.Loop_tail_neon:
+	ldrb	w10,[x1,x2]
+	ldrb	w11,[x4,x2]
+	add	x2,x2,#1
+	eor	w10,w10,w11
+	strb	w10,[x0,x2]
+	cbnz	x2,.Loop_tail_neon
+
+	stp	xzr,xzr,[sp,#0]
+	stp	xzr,xzr,[sp,#16]
+	stp	xzr,xzr,[sp,#32]
+	stp	xzr,xzr,[sp,#48]
+
+.Ldone_neon:
+	ldp	x19,x20,[x29,#16]
+	add	sp,sp,#64
+	ldp	x21,x22,[x29,#32]
+	ldp	x23,x24,[x29,#48]
+	ldp	x25,x26,[x29,#64]
+	ldp	x27,x28,[x29,#80]
+	ldp	x29,x30,[sp],#96
+	ret
+
+.L512_or_more_neon:
+	sub	sp,sp,#128+64
+
+	ldp	x22,x23,[x5]		// load sigma
+	ld1	{v24.4s},[x5],#16
+	ldp	x24,x25,[x3]		// load key
+	ldp	x26,x27,[x3,#16]
+	ld1	{v25.4s,v26.4s},[x3]
+	ldp	x28,x30,[x4]		// load counter
+	ld1	{v27.4s},[x4]
+	ld1	{v31.4s},[x5]
+#ifdef	__ARMEB__
+	rev64	v24.4s,v24.4s
+	ror	x24,x24,#32
+	ror	x25,x25,#32
+	ror	x26,x26,#32
+	ror	x27,x27,#32
+	ror	x28,x28,#32
+	ror	x30,x30,#32
+#endif
+	add	v27.4s,v27.4s,v31.4s		// += 1
+	stp	q24,q25,[sp,#0]		// off-load key block, invariant part
+	add	v27.4s,v27.4s,v31.4s		// not typo
+	str	q26,[sp,#32]
+	add	v28.4s,v27.4s,v31.4s
+	add	v29.4s,v28.4s,v31.4s
+	add	v30.4s,v29.4s,v31.4s
+	shl	v31.4s,v31.4s,#2			// 1 -> 4
+
+	stp	d8,d9,[sp,#128+0]		// meet ABI requirements
+	stp	d10,d11,[sp,#128+16]
+	stp	d12,d13,[sp,#128+32]
+	stp	d14,d15,[sp,#128+48]
+
+	sub	x2,x2,#512			// not typo
+
+.Loop_outer_512_neon:
+	mov	v0.16b,v24.16b
+	mov	v4.16b,v24.16b
+	mov	v8.16b,v24.16b
+	mov	v12.16b,v24.16b
+	mov	v16.16b,v24.16b
+	mov	v20.16b,v24.16b
+	mov	v1.16b,v25.16b
+	mov	w5,w22			// unpack key block
+	mov	v5.16b,v25.16b
+	lsr	x6,x22,#32
+	mov	v9.16b,v25.16b
+	mov	w7,w23
+	mov	v13.16b,v25.16b
+	lsr	x8,x23,#32
+	mov	v17.16b,v25.16b
+	mov	w9,w24
+	mov	v21.16b,v25.16b
+	lsr	x10,x24,#32
+	mov	v3.16b,v27.16b
+	mov	w11,w25
+	mov	v7.16b,v28.16b
+	lsr	x12,x25,#32
+	mov	v11.16b,v29.16b
+	mov	w13,w26
+	mov	v15.16b,v30.16b
+	lsr	x14,x26,#32
+	mov	v2.16b,v26.16b
+	mov	w15,w27
+	mov	v6.16b,v26.16b
+	lsr	x16,x27,#32
+	add	v19.4s,v3.4s,v31.4s			// +4
+	mov	w17,w28
+	add	v23.4s,v7.4s,v31.4s			// +4
+	lsr	x19,x28,#32
+	mov	v10.16b,v26.16b
+	mov	w20,w30
+	mov	v14.16b,v26.16b
+	lsr	x21,x30,#32
+	mov	v18.16b,v26.16b
+	stp	q27,q28,[sp,#48]		// off-load key block, variable part
+	mov	v22.16b,v26.16b
+	str	q29,[sp,#80]
+
+	mov	x4,#5
+	subs	x2,x2,#512
+.Loop_upper_neon:
+	sub	x4,x4,#1
+	add	v0.4s,v0.4s,v1.4s
+	add	w5,w5,w9
+	add	v4.4s,v4.4s,v5.4s
+	add	w6,w6,w10
+	add	v8.4s,v8.4s,v9.4s
+	add	w7,w7,w11
+	add	v12.4s,v12.4s,v13.4s
+	add	w8,w8,w12
+	add	v16.4s,v16.4s,v17.4s
+	eor	w17,w17,w5
+	add	v20.4s,v20.4s,v21.4s
+	eor	w19,w19,w6
+	eor	v3.16b,v3.16b,v0.16b
+	eor	w20,w20,w7
+	eor	v7.16b,v7.16b,v4.16b
+	eor	w21,w21,w8
+	eor	v11.16b,v11.16b,v8.16b
+	ror	w17,w17,#16
+	eor	v15.16b,v15.16b,v12.16b
+	ror	w19,w19,#16
+	eor	v19.16b,v19.16b,v16.16b
+	ror	w20,w20,#16
+	eor	v23.16b,v23.16b,v20.16b
+	ror	w21,w21,#16
+	rev32	v3.8h,v3.8h
+	add	w13,w13,w17
+	rev32	v7.8h,v7.8h
+	add	w14,w14,w19
+	rev32	v11.8h,v11.8h
+	add	w15,w15,w20
+	rev32	v15.8h,v15.8h
+	add	w16,w16,w21
+	rev32	v19.8h,v19.8h
+	eor	w9,w9,w13
+	rev32	v23.8h,v23.8h
+	eor	w10,w10,w14
+	add	v2.4s,v2.4s,v3.4s
+	eor	w11,w11,w15
+	add	v6.4s,v6.4s,v7.4s
+	eor	w12,w12,w16
+	add	v10.4s,v10.4s,v11.4s
+	ror	w9,w9,#20
+	add	v14.4s,v14.4s,v15.4s
+	ror	w10,w10,#20
+	add	v18.4s,v18.4s,v19.4s
+	ror	w11,w11,#20
+	add	v22.4s,v22.4s,v23.4s
+	ror	w12,w12,#20
+	eor	v24.16b,v1.16b,v2.16b
+	add	w5,w5,w9
+	eor	v25.16b,v5.16b,v6.16b
+	add	w6,w6,w10
+	eor	v26.16b,v9.16b,v10.16b
+	add	w7,w7,w11
+	eor	v27.16b,v13.16b,v14.16b
+	add	w8,w8,w12
+	eor	v28.16b,v17.16b,v18.16b
+	eor	w17,w17,w5
+	eor	v29.16b,v21.16b,v22.16b
+	eor	w19,w19,w6
+	ushr	v1.4s,v24.4s,#20
+	eor	w20,w20,w7
+	ushr	v5.4s,v25.4s,#20
+	eor	w21,w21,w8
+	ushr	v9.4s,v26.4s,#20
+	ror	w17,w17,#24
+	ushr	v13.4s,v27.4s,#20
+	ror	w19,w19,#24
+	ushr	v17.4s,v28.4s,#20
+	ror	w20,w20,#24
+	ushr	v21.4s,v29.4s,#20
+	ror	w21,w21,#24
+	sli	v1.4s,v24.4s,#12
+	add	w13,w13,w17
+	sli	v5.4s,v25.4s,#12
+	add	w14,w14,w19
+	sli	v9.4s,v26.4s,#12
+	add	w15,w15,w20
+	sli	v13.4s,v27.4s,#12
+	add	w16,w16,w21
+	sli	v17.4s,v28.4s,#12
+	eor	w9,w9,w13
+	sli	v21.4s,v29.4s,#12
+	eor	w10,w10,w14
+	add	v0.4s,v0.4s,v1.4s
+	eor	w11,w11,w15
+	add	v4.4s,v4.4s,v5.4s
+	eor	w12,w12,w16
+	add	v8.4s,v8.4s,v9.4s
+	ror	w9,w9,#25
+	add	v12.4s,v12.4s,v13.4s
+	ror	w10,w10,#25
+	add	v16.4s,v16.4s,v17.4s
+	ror	w11,w11,#25
+	add	v20.4s,v20.4s,v21.4s
+	ror	w12,w12,#25
+	eor	v24.16b,v3.16b,v0.16b
+	add	w5,w5,w10
+	eor	v25.16b,v7.16b,v4.16b
+	add	w6,w6,w11
+	eor	v26.16b,v11.16b,v8.16b
+	add	w7,w7,w12
+	eor	v27.16b,v15.16b,v12.16b
+	add	w8,w8,w9
+	eor	v28.16b,v19.16b,v16.16b
+	eor	w21,w21,w5
+	eor	v29.16b,v23.16b,v20.16b
+	eor	w17,w17,w6
+	ushr	v3.4s,v24.4s,#24
+	eor	w19,w19,w7
+	ushr	v7.4s,v25.4s,#24
+	eor	w20,w20,w8
+	ushr	v11.4s,v26.4s,#24
+	ror	w21,w21,#16
+	ushr	v15.4s,v27.4s,#24
+	ror	w17,w17,#16
+	ushr	v19.4s,v28.4s,#24
+	ror	w19,w19,#16
+	ushr	v23.4s,v29.4s,#24
+	ror	w20,w20,#16
+	sli	v3.4s,v24.4s,#8
+	add	w15,w15,w21
+	sli	v7.4s,v25.4s,#8
+	add	w16,w16,w17
+	sli	v11.4s,v26.4s,#8
+	add	w13,w13,w19
+	sli	v15.4s,v27.4s,#8
+	add	w14,w14,w20
+	sli	v19.4s,v28.4s,#8
+	eor	w10,w10,w15
+	sli	v23.4s,v29.4s,#8
+	eor	w11,w11,w16
+	add	v2.4s,v2.4s,v3.4s
+	eor	w12,w12,w13
+	add	v6.4s,v6.4s,v7.4s
+	eor	w9,w9,w14
+	add	v10.4s,v10.4s,v11.4s
+	ror	w10,w10,#20
+	add	v14.4s,v14.4s,v15.4s
+	ror	w11,w11,#20
+	add	v18.4s,v18.4s,v19.4s
+	ror	w12,w12,#20
+	add	v22.4s,v22.4s,v23.4s
+	ror	w9,w9,#20
+	eor	v24.16b,v1.16b,v2.16b
+	add	w5,w5,w10
+	eor	v25.16b,v5.16b,v6.16b
+	add	w6,w6,w11
+	eor	v26.16b,v9.16b,v10.16b
+	add	w7,w7,w12
+	eor	v27.16b,v13.16b,v14.16b
+	add	w8,w8,w9
+	eor	v28.16b,v17.16b,v18.16b
+	eor	w21,w21,w5
+	eor	v29.16b,v21.16b,v22.16b
+	eor	w17,w17,w6
+	ushr	v1.4s,v24.4s,#25
+	eor	w19,w19,w7
+	ushr	v5.4s,v25.4s,#25
+	eor	w20,w20,w8
+	ushr	v9.4s,v26.4s,#25
+	ror	w21,w21,#24
+	ushr	v13.4s,v27.4s,#25
+	ror	w17,w17,#24
+	ushr	v17.4s,v28.4s,#25
+	ror	w19,w19,#24
+	ushr	v21.4s,v29.4s,#25
+	ror	w20,w20,#24
+	sli	v1.4s,v24.4s,#7
+	add	w15,w15,w21
+	sli	v5.4s,v25.4s,#7
+	add	w16,w16,w17
+	sli	v9.4s,v26.4s,#7
+	add	w13,w13,w19
+	sli	v13.4s,v27.4s,#7
+	add	w14,w14,w20
+	sli	v17.4s,v28.4s,#7
+	eor	w10,w10,w15
+	sli	v21.4s,v29.4s,#7
+	eor	w11,w11,w16
+	ext	v2.16b,v2.16b,v2.16b,#8
+	eor	w12,w12,w13
+	ext	v6.16b,v6.16b,v6.16b,#8
+	eor	w9,w9,w14
+	ext	v10.16b,v10.16b,v10.16b,#8
+	ror	w10,w10,#25
+	ext	v14.16b,v14.16b,v14.16b,#8
+	ror	w11,w11,#25
+	ext	v18.16b,v18.16b,v18.16b,#8
+	ror	w12,w12,#25
+	ext	v22.16b,v22.16b,v22.16b,#8
+	ror	w9,w9,#25
+	ext	v3.16b,v3.16b,v3.16b,#12
+	ext	v7.16b,v7.16b,v7.16b,#12
+	ext	v11.16b,v11.16b,v11.16b,#12
+	ext	v15.16b,v15.16b,v15.16b,#12
+	ext	v19.16b,v19.16b,v19.16b,#12
+	ext	v23.16b,v23.16b,v23.16b,#12
+	ext	v1.16b,v1.16b,v1.16b,#4
+	ext	v5.16b,v5.16b,v5.16b,#4
+	ext	v9.16b,v9.16b,v9.16b,#4
+	ext	v13.16b,v13.16b,v13.16b,#4
+	ext	v17.16b,v17.16b,v17.16b,#4
+	ext	v21.16b,v21.16b,v21.16b,#4
+	add	v0.4s,v0.4s,v1.4s
+	add	w5,w5,w9
+	add	v4.4s,v4.4s,v5.4s
+	add	w6,w6,w10
+	add	v8.4s,v8.4s,v9.4s
+	add	w7,w7,w11
+	add	v12.4s,v12.4s,v13.4s
+	add	w8,w8,w12
+	add	v16.4s,v16.4s,v17.4s
+	eor	w17,w17,w5
+	add	v20.4s,v20.4s,v21.4s
+	eor	w19,w19,w6
+	eor	v3.16b,v3.16b,v0.16b
+	eor	w20,w20,w7
+	eor	v7.16b,v7.16b,v4.16b
+	eor	w21,w21,w8
+	eor	v11.16b,v11.16b,v8.16b
+	ror	w17,w17,#16
+	eor	v15.16b,v15.16b,v12.16b
+	ror	w19,w19,#16
+	eor	v19.16b,v19.16b,v16.16b
+	ror	w20,w20,#16
+	eor	v23.16b,v23.16b,v20.16b
+	ror	w21,w21,#16
+	rev32	v3.8h,v3.8h
+	add	w13,w13,w17
+	rev32	v7.8h,v7.8h
+	add	w14,w14,w19
+	rev32	v11.8h,v11.8h
+	add	w15,w15,w20
+	rev32	v15.8h,v15.8h
+	add	w16,w16,w21
+	rev32	v19.8h,v19.8h
+	eor	w9,w9,w13
+	rev32	v23.8h,v23.8h
+	eor	w10,w10,w14
+	add	v2.4s,v2.4s,v3.4s
+	eor	w11,w11,w15
+	add	v6.4s,v6.4s,v7.4s
+	eor	w12,w12,w16
+	add	v10.4s,v10.4s,v11.4s
+	ror	w9,w9,#20
+	add	v14.4s,v14.4s,v15.4s
+	ror	w10,w10,#20
+	add	v18.4s,v18.4s,v19.4s
+	ror	w11,w11,#20
+	add	v22.4s,v22.4s,v23.4s
+	ror	w12,w12,#20
+	eor	v24.16b,v1.16b,v2.16b
+	add	w5,w5,w9
+	eor	v25.16b,v5.16b,v6.16b
+	add	w6,w6,w10
+	eor	v26.16b,v9.16b,v10.16b
+	add	w7,w7,w11
+	eor	v27.16b,v13.16b,v14.16b
+	add	w8,w8,w12
+	eor	v28.16b,v17.16b,v18.16b
+	eor	w17,w17,w5
+	eor	v29.16b,v21.16b,v22.16b
+	eor	w19,w19,w6
+	ushr	v1.4s,v24.4s,#20
+	eor	w20,w20,w7
+	ushr	v5.4s,v25.4s,#20
+	eor	w21,w21,w8
+	ushr	v9.4s,v26.4s,#20
+	ror	w17,w17,#24
+	ushr	v13.4s,v27.4s,#20
+	ror	w19,w19,#24
+	ushr	v17.4s,v28.4s,#20
+	ror	w20,w20,#24
+	ushr	v21.4s,v29.4s,#20
+	ror	w21,w21,#24
+	sli	v1.4s,v24.4s,#12
+	add	w13,w13,w17
+	sli	v5.4s,v25.4s,#12
+	add	w14,w14,w19
+	sli	v9.4s,v26.4s,#12
+	add	w15,w15,w20
+	sli	v13.4s,v27.4s,#12
+	add	w16,w16,w21
+	sli	v17.4s,v28.4s,#12
+	eor	w9,w9,w13
+	sli	v21.4s,v29.4s,#12
+	eor	w10,w10,w14
+	add	v0.4s,v0.4s,v1.4s
+	eor	w11,w11,w15
+	add	v4.4s,v4.4s,v5.4s
+	eor	w12,w12,w16
+	add	v8.4s,v8.4s,v9.4s
+	ror	w9,w9,#25
+	add	v12.4s,v12.4s,v13.4s
+	ror	w10,w10,#25
+	add	v16.4s,v16.4s,v17.4s
+	ror	w11,w11,#25
+	add	v20.4s,v20.4s,v21.4s
+	ror	w12,w12,#25
+	eor	v24.16b,v3.16b,v0.16b
+	add	w5,w5,w10
+	eor	v25.16b,v7.16b,v4.16b
+	add	w6,w6,w11
+	eor	v26.16b,v11.16b,v8.16b
+	add	w7,w7,w12
+	eor	v27.16b,v15.16b,v12.16b
+	add	w8,w8,w9
+	eor	v28.16b,v19.16b,v16.16b
+	eor	w21,w21,w5
+	eor	v29.16b,v23.16b,v20.16b
+	eor	w17,w17,w6
+	ushr	v3.4s,v24.4s,#24
+	eor	w19,w19,w7
+	ushr	v7.4s,v25.4s,#24
+	eor	w20,w20,w8
+	ushr	v11.4s,v26.4s,#24
+	ror	w21,w21,#16
+	ushr	v15.4s,v27.4s,#24
+	ror	w17,w17,#16
+	ushr	v19.4s,v28.4s,#24
+	ror	w19,w19,#16
+	ushr	v23.4s,v29.4s,#24
+	ror	w20,w20,#16
+	sli	v3.4s,v24.4s,#8
+	add	w15,w15,w21
+	sli	v7.4s,v25.4s,#8
+	add	w16,w16,w17
+	sli	v11.4s,v26.4s,#8
+	add	w13,w13,w19
+	sli	v15.4s,v27.4s,#8
+	add	w14,w14,w20
+	sli	v19.4s,v28.4s,#8
+	eor	w10,w10,w15
+	sli	v23.4s,v29.4s,#8
+	eor	w11,w11,w16
+	add	v2.4s,v2.4s,v3.4s
+	eor	w12,w12,w13
+	add	v6.4s,v6.4s,v7.4s
+	eor	w9,w9,w14
+	add	v10.4s,v10.4s,v11.4s
+	ror	w10,w10,#20
+	add	v14.4s,v14.4s,v15.4s
+	ror	w11,w11,#20
+	add	v18.4s,v18.4s,v19.4s
+	ror	w12,w12,#20
+	add	v22.4s,v22.4s,v23.4s
+	ror	w9,w9,#20
+	eor	v24.16b,v1.16b,v2.16b
+	add	w5,w5,w10
+	eor	v25.16b,v5.16b,v6.16b
+	add	w6,w6,w11
+	eor	v26.16b,v9.16b,v10.16b
+	add	w7,w7,w12
+	eor	v27.16b,v13.16b,v14.16b
+	add	w8,w8,w9
+	eor	v28.16b,v17.16b,v18.16b
+	eor	w21,w21,w5
+	eor	v29.16b,v21.16b,v22.16b
+	eor	w17,w17,w6
+	ushr	v1.4s,v24.4s,#25
+	eor	w19,w19,w7
+	ushr	v5.4s,v25.4s,#25
+	eor	w20,w20,w8
+	ushr	v9.4s,v26.4s,#25
+	ror	w21,w21,#24
+	ushr	v13.4s,v27.4s,#25
+	ror	w17,w17,#24
+	ushr	v17.4s,v28.4s,#25
+	ror	w19,w19,#24
+	ushr	v21.4s,v29.4s,#25
+	ror	w20,w20,#24
+	sli	v1.4s,v24.4s,#7
+	add	w15,w15,w21
+	sli	v5.4s,v25.4s,#7
+	add	w16,w16,w17
+	sli	v9.4s,v26.4s,#7
+	add	w13,w13,w19
+	sli	v13.4s,v27.4s,#7
+	add	w14,w14,w20
+	sli	v17.4s,v28.4s,#7
+	eor	w10,w10,w15
+	sli	v21.4s,v29.4s,#7
+	eor	w11,w11,w16
+	ext	v2.16b,v2.16b,v2.16b,#8
+	eor	w12,w12,w13
+	ext	v6.16b,v6.16b,v6.16b,#8
+	eor	w9,w9,w14
+	ext	v10.16b,v10.16b,v10.16b,#8
+	ror	w10,w10,#25
+	ext	v14.16b,v14.16b,v14.16b,#8
+	ror	w11,w11,#25
+	ext	v18.16b,v18.16b,v18.16b,#8
+	ror	w12,w12,#25
+	ext	v22.16b,v22.16b,v22.16b,#8
+	ror	w9,w9,#25
+	ext	v3.16b,v3.16b,v3.16b,#4
+	ext	v7.16b,v7.16b,v7.16b,#4
+	ext	v11.16b,v11.16b,v11.16b,#4
+	ext	v15.16b,v15.16b,v15.16b,#4
+	ext	v19.16b,v19.16b,v19.16b,#4
+	ext	v23.16b,v23.16b,v23.16b,#4
+	ext	v1.16b,v1.16b,v1.16b,#12
+	ext	v5.16b,v5.16b,v5.16b,#12
+	ext	v9.16b,v9.16b,v9.16b,#12
+	ext	v13.16b,v13.16b,v13.16b,#12
+	ext	v17.16b,v17.16b,v17.16b,#12
+	ext	v21.16b,v21.16b,v21.16b,#12
+	cbnz	x4,.Loop_upper_neon
+
+	add	w5,w5,w22		// accumulate key block
+	add	x6,x6,x22,lsr#32
+	add	w7,w7,w23
+	add	x8,x8,x23,lsr#32
+	add	w9,w9,w24
+	add	x10,x10,x24,lsr#32
+	add	w11,w11,w25
+	add	x12,x12,x25,lsr#32
+	add	w13,w13,w26
+	add	x14,x14,x26,lsr#32
+	add	w15,w15,w27
+	add	x16,x16,x27,lsr#32
+	add	w17,w17,w28
+	add	x19,x19,x28,lsr#32
+	add	w20,w20,w30
+	add	x21,x21,x30,lsr#32
+
+	add	x5,x5,x6,lsl#32	// pack
+	add	x7,x7,x8,lsl#32
+	ldp	x6,x8,[x1,#0]		// load input
+	add	x9,x9,x10,lsl#32
+	add	x11,x11,x12,lsl#32
+	ldp	x10,x12,[x1,#16]
+	add	x13,x13,x14,lsl#32
+	add	x15,x15,x16,lsl#32
+	ldp	x14,x16,[x1,#32]
+	add	x17,x17,x19,lsl#32
+	add	x20,x20,x21,lsl#32
+	ldp	x19,x21,[x1,#48]
+	add	x1,x1,#64
+#ifdef	__ARMEB__
+	rev	x5,x5
+	rev	x7,x7
+	rev	x9,x9
+	rev	x11,x11
+	rev	x13,x13
+	rev	x15,x15
+	rev	x17,x17
+	rev	x20,x20
+#endif
+	eor	x5,x5,x6
+	eor	x7,x7,x8
+	eor	x9,x9,x10
+	eor	x11,x11,x12
+	eor	x13,x13,x14
+	eor	x15,x15,x16
+	eor	x17,x17,x19
+	eor	x20,x20,x21
+
+	stp	x5,x7,[x0,#0]		// store output
+	add	x28,x28,#1			// increment counter
+	mov	w5,w22			// unpack key block
+	lsr	x6,x22,#32
+	stp	x9,x11,[x0,#16]
+	mov	w7,w23
+	lsr	x8,x23,#32
+	stp	x13,x15,[x0,#32]
+	mov	w9,w24
+	lsr	x10,x24,#32
+	stp	x17,x20,[x0,#48]
+	add	x0,x0,#64
+	mov	w11,w25
+	lsr	x12,x25,#32
+	mov	w13,w26
+	lsr	x14,x26,#32
+	mov	w15,w27
+	lsr	x16,x27,#32
+	mov	w17,w28
+	lsr	x19,x28,#32
+	mov	w20,w30
+	lsr	x21,x30,#32
+
+	mov	x4,#5
+.Loop_lower_neon:
+	sub	x4,x4,#1
+	add	v0.4s,v0.4s,v1.4s
+	add	w5,w5,w9
+	add	v4.4s,v4.4s,v5.4s
+	add	w6,w6,w10
+	add	v8.4s,v8.4s,v9.4s
+	add	w7,w7,w11
+	add	v12.4s,v12.4s,v13.4s
+	add	w8,w8,w12
+	add	v16.4s,v16.4s,v17.4s
+	eor	w17,w17,w5
+	add	v20.4s,v20.4s,v21.4s
+	eor	w19,w19,w6
+	eor	v3.16b,v3.16b,v0.16b
+	eor	w20,w20,w7
+	eor	v7.16b,v7.16b,v4.16b
+	eor	w21,w21,w8
+	eor	v11.16b,v11.16b,v8.16b
+	ror	w17,w17,#16
+	eor	v15.16b,v15.16b,v12.16b
+	ror	w19,w19,#16
+	eor	v19.16b,v19.16b,v16.16b
+	ror	w20,w20,#16
+	eor	v23.16b,v23.16b,v20.16b
+	ror	w21,w21,#16
+	rev32	v3.8h,v3.8h
+	add	w13,w13,w17
+	rev32	v7.8h,v7.8h
+	add	w14,w14,w19
+	rev32	v11.8h,v11.8h
+	add	w15,w15,w20
+	rev32	v15.8h,v15.8h
+	add	w16,w16,w21
+	rev32	v19.8h,v19.8h
+	eor	w9,w9,w13
+	rev32	v23.8h,v23.8h
+	eor	w10,w10,w14
+	add	v2.4s,v2.4s,v3.4s
+	eor	w11,w11,w15
+	add	v6.4s,v6.4s,v7.4s
+	eor	w12,w12,w16
+	add	v10.4s,v10.4s,v11.4s
+	ror	w9,w9,#20
+	add	v14.4s,v14.4s,v15.4s
+	ror	w10,w10,#20
+	add	v18.4s,v18.4s,v19.4s
+	ror	w11,w11,#20
+	add	v22.4s,v22.4s,v23.4s
+	ror	w12,w12,#20
+	eor	v24.16b,v1.16b,v2.16b
+	add	w5,w5,w9
+	eor	v25.16b,v5.16b,v6.16b
+	add	w6,w6,w10
+	eor	v26.16b,v9.16b,v10.16b
+	add	w7,w7,w11
+	eor	v27.16b,v13.16b,v14.16b
+	add	w8,w8,w12
+	eor	v28.16b,v17.16b,v18.16b
+	eor	w17,w17,w5
+	eor	v29.16b,v21.16b,v22.16b
+	eor	w19,w19,w6
+	ushr	v1.4s,v24.4s,#20
+	eor	w20,w20,w7
+	ushr	v5.4s,v25.4s,#20
+	eor	w21,w21,w8
+	ushr	v9.4s,v26.4s,#20
+	ror	w17,w17,#24
+	ushr	v13.4s,v27.4s,#20
+	ror	w19,w19,#24
+	ushr	v17.4s,v28.4s,#20
+	ror	w20,w20,#24
+	ushr	v21.4s,v29.4s,#20
+	ror	w21,w21,#24
+	sli	v1.4s,v24.4s,#12
+	add	w13,w13,w17
+	sli	v5.4s,v25.4s,#12
+	add	w14,w14,w19
+	sli	v9.4s,v26.4s,#12
+	add	w15,w15,w20
+	sli	v13.4s,v27.4s,#12
+	add	w16,w16,w21
+	sli	v17.4s,v28.4s,#12
+	eor	w9,w9,w13
+	sli	v21.4s,v29.4s,#12
+	eor	w10,w10,w14
+	add	v0.4s,v0.4s,v1.4s
+	eor	w11,w11,w15
+	add	v4.4s,v4.4s,v5.4s
+	eor	w12,w12,w16
+	add	v8.4s,v8.4s,v9.4s
+	ror	w9,w9,#25
+	add	v12.4s,v12.4s,v13.4s
+	ror	w10,w10,#25
+	add	v16.4s,v16.4s,v17.4s
+	ror	w11,w11,#25
+	add	v20.4s,v20.4s,v21.4s
+	ror	w12,w12,#25
+	eor	v24.16b,v3.16b,v0.16b
+	add	w5,w5,w10
+	eor	v25.16b,v7.16b,v4.16b
+	add	w6,w6,w11
+	eor	v26.16b,v11.16b,v8.16b
+	add	w7,w7,w12
+	eor	v27.16b,v15.16b,v12.16b
+	add	w8,w8,w9
+	eor	v28.16b,v19.16b,v16.16b
+	eor	w21,w21,w5
+	eor	v29.16b,v23.16b,v20.16b
+	eor	w17,w17,w6
+	ushr	v3.4s,v24.4s,#24
+	eor	w19,w19,w7
+	ushr	v7.4s,v25.4s,#24
+	eor	w20,w20,w8
+	ushr	v11.4s,v26.4s,#24
+	ror	w21,w21,#16
+	ushr	v15.4s,v27.4s,#24
+	ror	w17,w17,#16
+	ushr	v19.4s,v28.4s,#24
+	ror	w19,w19,#16
+	ushr	v23.4s,v29.4s,#24
+	ror	w20,w20,#16
+	sli	v3.4s,v24.4s,#8
+	add	w15,w15,w21
+	sli	v7.4s,v25.4s,#8
+	add	w16,w16,w17
+	sli	v11.4s,v26.4s,#8
+	add	w13,w13,w19
+	sli	v15.4s,v27.4s,#8
+	add	w14,w14,w20
+	sli	v19.4s,v28.4s,#8
+	eor	w10,w10,w15
+	sli	v23.4s,v29.4s,#8
+	eor	w11,w11,w16
+	add	v2.4s,v2.4s,v3.4s
+	eor	w12,w12,w13
+	add	v6.4s,v6.4s,v7.4s
+	eor	w9,w9,w14
+	add	v10.4s,v10.4s,v11.4s
+	ror	w10,w10,#20
+	add	v14.4s,v14.4s,v15.4s
+	ror	w11,w11,#20
+	add	v18.4s,v18.4s,v19.4s
+	ror	w12,w12,#20
+	add	v22.4s,v22.4s,v23.4s
+	ror	w9,w9,#20
+	eor	v24.16b,v1.16b,v2.16b
+	add	w5,w5,w10
+	eor	v25.16b,v5.16b,v6.16b
+	add	w6,w6,w11
+	eor	v26.16b,v9.16b,v10.16b
+	add	w7,w7,w12
+	eor	v27.16b,v13.16b,v14.16b
+	add	w8,w8,w9
+	eor	v28.16b,v17.16b,v18.16b
+	eor	w21,w21,w5
+	eor	v29.16b,v21.16b,v22.16b
+	eor	w17,w17,w6
+	ushr	v1.4s,v24.4s,#25
+	eor	w19,w19,w7
+	ushr	v5.4s,v25.4s,#25
+	eor	w20,w20,w8
+	ushr	v9.4s,v26.4s,#25
+	ror	w21,w21,#24
+	ushr	v13.4s,v27.4s,#25
+	ror	w17,w17,#24
+	ushr	v17.4s,v28.4s,#25
+	ror	w19,w19,#24
+	ushr	v21.4s,v29.4s,#25
+	ror	w20,w20,#24
+	sli	v1.4s,v24.4s,#7
+	add	w15,w15,w21
+	sli	v5.4s,v25.4s,#7
+	add	w16,w16,w17
+	sli	v9.4s,v26.4s,#7
+	add	w13,w13,w19
+	sli	v13.4s,v27.4s,#7
+	add	w14,w14,w20
+	sli	v17.4s,v28.4s,#7
+	eor	w10,w10,w15
+	sli	v21.4s,v29.4s,#7
+	eor	w11,w11,w16
+	ext	v2.16b,v2.16b,v2.16b,#8
+	eor	w12,w12,w13
+	ext	v6.16b,v6.16b,v6.16b,#8
+	eor	w9,w9,w14
+	ext	v10.16b,v10.16b,v10.16b,#8
+	ror	w10,w10,#25
+	ext	v14.16b,v14.16b,v14.16b,#8
+	ror	w11,w11,#25
+	ext	v18.16b,v18.16b,v18.16b,#8
+	ror	w12,w12,#25
+	ext	v22.16b,v22.16b,v22.16b,#8
+	ror	w9,w9,#25
+	ext	v3.16b,v3.16b,v3.16b,#12
+	ext	v7.16b,v7.16b,v7.16b,#12
+	ext	v11.16b,v11.16b,v11.16b,#12
+	ext	v15.16b,v15.16b,v15.16b,#12
+	ext	v19.16b,v19.16b,v19.16b,#12
+	ext	v23.16b,v23.16b,v23.16b,#12
+	ext	v1.16b,v1.16b,v1.16b,#4
+	ext	v5.16b,v5.16b,v5.16b,#4
+	ext	v9.16b,v9.16b,v9.16b,#4
+	ext	v13.16b,v13.16b,v13.16b,#4
+	ext	v17.16b,v17.16b,v17.16b,#4
+	ext	v21.16b,v21.16b,v21.16b,#4
+	add	v0.4s,v0.4s,v1.4s
+	add	w5,w5,w9
+	add	v4.4s,v4.4s,v5.4s
+	add	w6,w6,w10
+	add	v8.4s,v8.4s,v9.4s
+	add	w7,w7,w11
+	add	v12.4s,v12.4s,v13.4s
+	add	w8,w8,w12
+	add	v16.4s,v16.4s,v17.4s
+	eor	w17,w17,w5
+	add	v20.4s,v20.4s,v21.4s
+	eor	w19,w19,w6
+	eor	v3.16b,v3.16b,v0.16b
+	eor	w20,w20,w7
+	eor	v7.16b,v7.16b,v4.16b
+	eor	w21,w21,w8
+	eor	v11.16b,v11.16b,v8.16b
+	ror	w17,w17,#16
+	eor	v15.16b,v15.16b,v12.16b
+	ror	w19,w19,#16
+	eor	v19.16b,v19.16b,v16.16b
+	ror	w20,w20,#16
+	eor	v23.16b,v23.16b,v20.16b
+	ror	w21,w21,#16
+	rev32	v3.8h,v3.8h
+	add	w13,w13,w17
+	rev32	v7.8h,v7.8h
+	add	w14,w14,w19
+	rev32	v11.8h,v11.8h
+	add	w15,w15,w20
+	rev32	v15.8h,v15.8h
+	add	w16,w16,w21
+	rev32	v19.8h,v19.8h
+	eor	w9,w9,w13
+	rev32	v23.8h,v23.8h
+	eor	w10,w10,w14
+	add	v2.4s,v2.4s,v3.4s
+	eor	w11,w11,w15
+	add	v6.4s,v6.4s,v7.4s
+	eor	w12,w12,w16
+	add	v10.4s,v10.4s,v11.4s
+	ror	w9,w9,#20
+	add	v14.4s,v14.4s,v15.4s
+	ror	w10,w10,#20
+	add	v18.4s,v18.4s,v19.4s
+	ror	w11,w11,#20
+	add	v22.4s,v22.4s,v23.4s
+	ror	w12,w12,#20
+	eor	v24.16b,v1.16b,v2.16b
+	add	w5,w5,w9
+	eor	v25.16b,v5.16b,v6.16b
+	add	w6,w6,w10
+	eor	v26.16b,v9.16b,v10.16b
+	add	w7,w7,w11
+	eor	v27.16b,v13.16b,v14.16b
+	add	w8,w8,w12
+	eor	v28.16b,v17.16b,v18.16b
+	eor	w17,w17,w5
+	eor	v29.16b,v21.16b,v22.16b
+	eor	w19,w19,w6
+	ushr	v1.4s,v24.4s,#20
+	eor	w20,w20,w7
+	ushr	v5.4s,v25.4s,#20
+	eor	w21,w21,w8
+	ushr	v9.4s,v26.4s,#20
+	ror	w17,w17,#24
+	ushr	v13.4s,v27.4s,#20
+	ror	w19,w19,#24
+	ushr	v17.4s,v28.4s,#20
+	ror	w20,w20,#24
+	ushr	v21.4s,v29.4s,#20
+	ror	w21,w21,#24
+	sli	v1.4s,v24.4s,#12
+	add	w13,w13,w17
+	sli	v5.4s,v25.4s,#12
+	add	w14,w14,w19
+	sli	v9.4s,v26.4s,#12
+	add	w15,w15,w20
+	sli	v13.4s,v27.4s,#12
+	add	w16,w16,w21
+	sli	v17.4s,v28.4s,#12
+	eor	w9,w9,w13
+	sli	v21.4s,v29.4s,#12
+	eor	w10,w10,w14
+	add	v0.4s,v0.4s,v1.4s
+	eor	w11,w11,w15
+	add	v4.4s,v4.4s,v5.4s
+	eor	w12,w12,w16
+	add	v8.4s,v8.4s,v9.4s
+	ror	w9,w9,#25
+	add	v12.4s,v12.4s,v13.4s
+	ror	w10,w10,#25
+	add	v16.4s,v16.4s,v17.4s
+	ror	w11,w11,#25
+	add	v20.4s,v20.4s,v21.4s
+	ror	w12,w12,#25
+	eor	v24.16b,v3.16b,v0.16b
+	add	w5,w5,w10
+	eor	v25.16b,v7.16b,v4.16b
+	add	w6,w6,w11
+	eor	v26.16b,v11.16b,v8.16b
+	add	w7,w7,w12
+	eor	v27.16b,v15.16b,v12.16b
+	add	w8,w8,w9
+	eor	v28.16b,v19.16b,v16.16b
+	eor	w21,w21,w5
+	eor	v29.16b,v23.16b,v20.16b
+	eor	w17,w17,w6
+	ushr	v3.4s,v24.4s,#24
+	eor	w19,w19,w7
+	ushr	v7.4s,v25.4s,#24
+	eor	w20,w20,w8
+	ushr	v11.4s,v26.4s,#24
+	ror	w21,w21,#16
+	ushr	v15.4s,v27.4s,#24
+	ror	w17,w17,#16
+	ushr	v19.4s,v28.4s,#24
+	ror	w19,w19,#16
+	ushr	v23.4s,v29.4s,#24
+	ror	w20,w20,#16
+	sli	v3.4s,v24.4s,#8
+	add	w15,w15,w21
+	sli	v7.4s,v25.4s,#8
+	add	w16,w16,w17
+	sli	v11.4s,v26.4s,#8
+	add	w13,w13,w19
+	sli	v15.4s,v27.4s,#8
+	add	w14,w14,w20
+	sli	v19.4s,v28.4s,#8
+	eor	w10,w10,w15
+	sli	v23.4s,v29.4s,#8
+	eor	w11,w11,w16
+	add	v2.4s,v2.4s,v3.4s
+	eor	w12,w12,w13
+	add	v6.4s,v6.4s,v7.4s
+	eor	w9,w9,w14
+	add	v10.4s,v10.4s,v11.4s
+	ror	w10,w10,#20
+	add	v14.4s,v14.4s,v15.4s
+	ror	w11,w11,#20
+	add	v18.4s,v18.4s,v19.4s
+	ror	w12,w12,#20
+	add	v22.4s,v22.4s,v23.4s
+	ror	w9,w9,#20
+	eor	v24.16b,v1.16b,v2.16b
+	add	w5,w5,w10
+	eor	v25.16b,v5.16b,v6.16b
+	add	w6,w6,w11
+	eor	v26.16b,v9.16b,v10.16b
+	add	w7,w7,w12
+	eor	v27.16b,v13.16b,v14.16b
+	add	w8,w8,w9
+	eor	v28.16b,v17.16b,v18.16b
+	eor	w21,w21,w5
+	eor	v29.16b,v21.16b,v22.16b
+	eor	w17,w17,w6
+	ushr	v1.4s,v24.4s,#25
+	eor	w19,w19,w7
+	ushr	v5.4s,v25.4s,#25
+	eor	w20,w20,w8
+	ushr	v9.4s,v26.4s,#25
+	ror	w21,w21,#24
+	ushr	v13.4s,v27.4s,#25
+	ror	w17,w17,#24
+	ushr	v17.4s,v28.4s,#25
+	ror	w19,w19,#24
+	ushr	v21.4s,v29.4s,#25
+	ror	w20,w20,#24
+	sli	v1.4s,v24.4s,#7
+	add	w15,w15,w21
+	sli	v5.4s,v25.4s,#7
+	add	w16,w16,w17
+	sli	v9.4s,v26.4s,#7
+	add	w13,w13,w19
+	sli	v13.4s,v27.4s,#7
+	add	w14,w14,w20
+	sli	v17.4s,v28.4s,#7
+	eor	w10,w10,w15
+	sli	v21.4s,v29.4s,#7
+	eor	w11,w11,w16
+	ext	v2.16b,v2.16b,v2.16b,#8
+	eor	w12,w12,w13
+	ext	v6.16b,v6.16b,v6.16b,#8
+	eor	w9,w9,w14
+	ext	v10.16b,v10.16b,v10.16b,#8
+	ror	w10,w10,#25
+	ext	v14.16b,v14.16b,v14.16b,#8
+	ror	w11,w11,#25
+	ext	v18.16b,v18.16b,v18.16b,#8
+	ror	w12,w12,#25
+	ext	v22.16b,v22.16b,v22.16b,#8
+	ror	w9,w9,#25
+	ext	v3.16b,v3.16b,v3.16b,#4
+	ext	v7.16b,v7.16b,v7.16b,#4
+	ext	v11.16b,v11.16b,v11.16b,#4
+	ext	v15.16b,v15.16b,v15.16b,#4
+	ext	v19.16b,v19.16b,v19.16b,#4
+	ext	v23.16b,v23.16b,v23.16b,#4
+	ext	v1.16b,v1.16b,v1.16b,#12
+	ext	v5.16b,v5.16b,v5.16b,#12
+	ext	v9.16b,v9.16b,v9.16b,#12
+	ext	v13.16b,v13.16b,v13.16b,#12
+	ext	v17.16b,v17.16b,v17.16b,#12
+	ext	v21.16b,v21.16b,v21.16b,#12
+	cbnz	x4,.Loop_lower_neon
+
+	add	w5,w5,w22		// accumulate key block
+	ldp	q24,q25,[sp,#0]
+	add	x6,x6,x22,lsr#32
+	ldp	q26,q27,[sp,#32]
+	add	w7,w7,w23
+	ldp	q28,q29,[sp,#64]
+	add	x8,x8,x23,lsr#32
+	add	v0.4s,v0.4s,v24.4s
+	add	w9,w9,w24
+	add	v4.4s,v4.4s,v24.4s
+	add	x10,x10,x24,lsr#32
+	add	v8.4s,v8.4s,v24.4s
+	add	w11,w11,w25
+	add	v12.4s,v12.4s,v24.4s
+	add	x12,x12,x25,lsr#32
+	add	v16.4s,v16.4s,v24.4s
+	add	w13,w13,w26
+	add	v20.4s,v20.4s,v24.4s
+	add	x14,x14,x26,lsr#32
+	add	v2.4s,v2.4s,v26.4s
+	add	w15,w15,w27
+	add	v6.4s,v6.4s,v26.4s
+	add	x16,x16,x27,lsr#32
+	add	v10.4s,v10.4s,v26.4s
+	add	w17,w17,w28
+	add	v14.4s,v14.4s,v26.4s
+	add	x19,x19,x28,lsr#32
+	add	v18.4s,v18.4s,v26.4s
+	add	w20,w20,w30
+	add	v22.4s,v22.4s,v26.4s
+	add	x21,x21,x30,lsr#32
+	add	v19.4s,v19.4s,v31.4s			// +4
+	add	x5,x5,x6,lsl#32	// pack
+	add	v23.4s,v23.4s,v31.4s			// +4
+	add	x7,x7,x8,lsl#32
+	add	v3.4s,v3.4s,v27.4s
+	ldp	x6,x8,[x1,#0]		// load input
+	add	v7.4s,v7.4s,v28.4s
+	add	x9,x9,x10,lsl#32
+	add	v11.4s,v11.4s,v29.4s
+	add	x11,x11,x12,lsl#32
+	add	v15.4s,v15.4s,v30.4s
+	ldp	x10,x12,[x1,#16]
+	add	v19.4s,v19.4s,v27.4s
+	add	x13,x13,x14,lsl#32
+	add	v23.4s,v23.4s,v28.4s
+	add	x15,x15,x16,lsl#32
+	add	v1.4s,v1.4s,v25.4s
+	ldp	x14,x16,[x1,#32]
+	add	v5.4s,v5.4s,v25.4s
+	add	x17,x17,x19,lsl#32
+	add	v9.4s,v9.4s,v25.4s
+	add	x20,x20,x21,lsl#32
+	add	v13.4s,v13.4s,v25.4s
+	ldp	x19,x21,[x1,#48]
+	add	v17.4s,v17.4s,v25.4s
+	add	x1,x1,#64
+	add	v21.4s,v21.4s,v25.4s
+
+#ifdef	__ARMEB__
+	rev	x5,x5
+	rev	x7,x7
+	rev	x9,x9
+	rev	x11,x11
+	rev	x13,x13
+	rev	x15,x15
+	rev	x17,x17
+	rev	x20,x20
+#endif
+	ld1	{v24.16b,v25.16b,v26.16b,v27.16b},[x1],#64
+	eor	x5,x5,x6
+	eor	x7,x7,x8
+	eor	x9,x9,x10
+	eor	x11,x11,x12
+	eor	x13,x13,x14
+	eor	v0.16b,v0.16b,v24.16b
+	eor	x15,x15,x16
+	eor	v1.16b,v1.16b,v25.16b
+	eor	x17,x17,x19
+	eor	v2.16b,v2.16b,v26.16b
+	eor	x20,x20,x21
+	eor	v3.16b,v3.16b,v27.16b
+	ld1	{v24.16b,v25.16b,v26.16b,v27.16b},[x1],#64
+
+	stp	x5,x7,[x0,#0]		// store output
+	add	x28,x28,#7			// increment counter
+	stp	x9,x11,[x0,#16]
+	stp	x13,x15,[x0,#32]
+	stp	x17,x20,[x0,#48]
+	add	x0,x0,#64
+	st1	{v0.16b,v1.16b,v2.16b,v3.16b},[x0],#64
+
+	ld1	{v0.16b,v1.16b,v2.16b,v3.16b},[x1],#64
+	eor	v4.16b,v4.16b,v24.16b
+	eor	v5.16b,v5.16b,v25.16b
+	eor	v6.16b,v6.16b,v26.16b
+	eor	v7.16b,v7.16b,v27.16b
+	st1	{v4.16b,v5.16b,v6.16b,v7.16b},[x0],#64
+
+	ld1	{v4.16b,v5.16b,v6.16b,v7.16b},[x1],#64
+	eor	v8.16b,v8.16b,v0.16b
+	ldp	q24,q25,[sp,#0]
+	eor	v9.16b,v9.16b,v1.16b
+	ldp	q26,q27,[sp,#32]
+	eor	v10.16b,v10.16b,v2.16b
+	eor	v11.16b,v11.16b,v3.16b
+	st1	{v8.16b,v9.16b,v10.16b,v11.16b},[x0],#64
+
+	ld1	{v8.16b,v9.16b,v10.16b,v11.16b},[x1],#64
+	eor	v12.16b,v12.16b,v4.16b
+	eor	v13.16b,v13.16b,v5.16b
+	eor	v14.16b,v14.16b,v6.16b
+	eor	v15.16b,v15.16b,v7.16b
+	st1	{v12.16b,v13.16b,v14.16b,v15.16b},[x0],#64
+
+	ld1	{v12.16b,v13.16b,v14.16b,v15.16b},[x1],#64
+	eor	v16.16b,v16.16b,v8.16b
+	eor	v17.16b,v17.16b,v9.16b
+	eor	v18.16b,v18.16b,v10.16b
+	eor	v19.16b,v19.16b,v11.16b
+	st1	{v16.16b,v17.16b,v18.16b,v19.16b},[x0],#64
+
+	shl	v0.4s,v31.4s,#1			// 4 -> 8
+	eor	v20.16b,v20.16b,v12.16b
+	eor	v21.16b,v21.16b,v13.16b
+	eor	v22.16b,v22.16b,v14.16b
+	eor	v23.16b,v23.16b,v15.16b
+	st1	{v20.16b,v21.16b,v22.16b,v23.16b},[x0],#64
+
+	add	v27.4s,v27.4s,v0.4s			// += 8
+	add	v28.4s,v28.4s,v0.4s
+	add	v29.4s,v29.4s,v0.4s
+	add	v30.4s,v30.4s,v0.4s
+
+	b.hs	.Loop_outer_512_neon
+
+	adds	x2,x2,#512
+	ushr	v0.4s,v31.4s,#2			// 4 -> 1
+
+	ldp	d8,d9,[sp,#128+0]		// meet ABI requirements
+	ldp	d10,d11,[sp,#128+16]
+	ldp	d12,d13,[sp,#128+32]
+	ldp	d14,d15,[sp,#128+48]
+
+	stp	q24,q31,[sp,#0]		// wipe off-load area
+	stp	q24,q31,[sp,#32]
+	stp	q24,q31,[sp,#64]
+
+	b.eq	.Ldone_512_neon
+
+	cmp	x2,#192
+	sub	v27.4s,v27.4s,v0.4s			// -= 1
+	sub	v28.4s,v28.4s,v0.4s
+	sub	v29.4s,v29.4s,v0.4s
+	add	sp,sp,#128
+	b.hs	.Loop_outer_neon
+
+	eor	v25.16b,v25.16b,v25.16b
+	eor	v26.16b,v26.16b,v26.16b
+	eor	v27.16b,v27.16b,v27.16b
+	eor	v28.16b,v28.16b,v28.16b
+	eor	v29.16b,v29.16b,v29.16b
+	eor	v30.16b,v30.16b,v30.16b
+	b	.Loop_outer
+
+.Ldone_512_neon:
+	ldp	x19,x20,[x29,#16]
+	add	sp,sp,#128+64
+	ldp	x21,x22,[x29,#32]
+	ldp	x23,x24,[x29,#48]
+	ldp	x25,x26,[x29,#64]
+	ldp	x27,x28,[x29,#80]
+	ldp	x29,x30,[sp],#96
+.Labort_neon:
+	ret
+ENDPROC(chacha20_neon)
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 05/20] zinc: ChaCha20 x86_64 implementation
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (3 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 04/20] zinc: ChaCha20 ARM and ARM64 implementations Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 06/20] zinc: ChaCha20 MIPS32r2 implementation Jason A. Donenfeld
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Andy Polyakov, Thomas Gleixner,
	Ingo Molnar, x86

This provides SSSE3, AVX-2, AVX-512F, and AVX-512VL implementations for
ChaCha20. The AVX-512F implementation is disabled on Skylake, due to
throttling, and the VL ymm implementation is used instead. These come
from Andy Polyakov's implementation, with the following modifications
from Samuel Neves:

  - Some cosmetic changes, like renaming labels to .Lname, constants,
    and other Linux conventions.

  - CPU feature checking is done in C by the glue code, so that has been
    removed from the assembly.

  - Eliminate translating certain instructions, such as pshufb, palignr,
    vprotd, etc, to .byte directives. This is meant for compatibility
    with ancient toolchains, but presumably it is unnecessary here,
    since the build system already does checks on what GNU as can
    assemble.

  - When aligning the stack, the original code was saving %rsp to %r9.
    To keep objtool happy, we use instead the DRAP idiom to save %rsp
    to %r10:

      leaq    8(%rsp),%r10
      ... code here ...
      leaq    -8(%r10),%rsp

  - The original code assumes the stack comes aligned to 16 bytes. This
    is not necessarily the case, and to avoid crashes,
    `andq $-alignment, %rsp` was added in the prolog of a few functions.

  - The original hardcodes returns as .byte 0xf3,0xc3, aka "rep ret".
    We replace this by "ret". "rep ret" was meant to help with AMD K8
    chips, cf. http://repzret.org/p/repzret. It makes no sense to
    continue to use this kludge for code that won't even run on ancient
    AMD chips.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
---
 lib/zinc/Makefile                        |    4 +
 lib/zinc/chacha20/chacha20-x86_64-glue.h |  102 +
 lib/zinc/chacha20/chacha20-x86_64.S      | 2632 ++++++++++++++++++++++
 3 files changed, 2738 insertions(+)
 create mode 100644 lib/zinc/chacha20/chacha20-x86_64-glue.h
 create mode 100644 lib/zinc/chacha20/chacha20-x86_64.S

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 8d14cb13349a..32e4bd94ea0b 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -5,6 +5,10 @@ ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG
 
 ifeq ($(CONFIG_ZINC_CHACHA20),y)
 zinc-y += chacha20/chacha20.o
+ifeq ($(CONFIG_ZINC_ARCH_X86_64),y)
+zinc-y += chacha20/chacha20-x86_64.o
+CFLAGS_chacha20.o += -include $(srctree)/$(src)/chacha20/chacha20-x86_64-glue.h
+endif
 ifeq ($(CONFIG_ZINC_ARCH_ARM),y)
 zinc-y += chacha20/chacha20-arm.o
 CFLAGS_chacha20.o += -include $(srctree)/$(src)/chacha20/chacha20-arm-glue.h
diff --git a/lib/zinc/chacha20/chacha20-x86_64-glue.h b/lib/zinc/chacha20/chacha20-x86_64-glue.h
new file mode 100644
index 000000000000..e4f6c3162d3f
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-x86_64-glue.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/chacha20.h>
+#include <asm/fpu/api.h>
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+#include <asm/intel-family.h>
+
+#ifdef CONFIG_AS_SSSE3
+asmlinkage void hchacha20_ssse3(u8 *derived_key, const u8 *nonce,
+				const u8 *key);
+asmlinkage void chacha20_ssse3(u8 *out, const u8 *in, const size_t len,
+			       const u32 key[8], const u32 counter[4]);
+#endif
+#ifdef CONFIG_AS_AVX2
+asmlinkage void chacha20_avx2(u8 *out, const u8 *in, const size_t len,
+			      const u32 key[8], const u32 counter[4]);
+#endif
+#ifdef CONFIG_AS_AVX512
+asmlinkage void chacha20_avx512(u8 *out, const u8 *in, const size_t len,
+				const u32 key[8], const u32 counter[4]);
+asmlinkage void chacha20_avx512vl(u8 *out, const u8 *in, const size_t len,
+				  const u32 key[8], const u32 counter[4]);
+#endif
+
+static bool chacha20_use_ssse3 __ro_after_init;
+static bool chacha20_use_avx2 __ro_after_init;
+static bool chacha20_use_avx512 __ro_after_init;
+static bool chacha20_use_avx512vl __ro_after_init;
+
+void __init chacha20_fpu_init(void)
+{
+	chacha20_use_ssse3 = boot_cpu_has(X86_FEATURE_SSSE3);
+	chacha20_use_avx2 =
+		boot_cpu_has(X86_FEATURE_AVX) &&
+		boot_cpu_has(X86_FEATURE_AVX2) &&
+		cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
+	chacha20_use_avx512 =
+		boot_cpu_has(X86_FEATURE_AVX) &&
+		boot_cpu_has(X86_FEATURE_AVX2) &&
+		boot_cpu_has(X86_FEATURE_AVX512F) &&
+		cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
+				  XFEATURE_MASK_AVX512, NULL) &&
+		/* Skylake downclocks unacceptably much when using zmm. */
+		boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X;
+	chacha20_use_avx512vl =
+		boot_cpu_has(X86_FEATURE_AVX) &&
+		boot_cpu_has(X86_FEATURE_AVX2) &&
+		boot_cpu_has(X86_FEATURE_AVX512F) &&
+		boot_cpu_has(X86_FEATURE_AVX512VL) &&
+		cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
+				  XFEATURE_MASK_AVX512, NULL);
+}
+
+static inline bool chacha20_arch(u8 *dst, const u8 *src, const size_t len,
+				 const u32 key[8], const u32 counter[4],
+				 simd_context_t simd_context)
+{
+	if (simd_context != HAVE_FULL_SIMD)
+		return false;
+
+#ifdef CONFIG_AS_AVX512
+	if (chacha20_use_avx512) {
+		chacha20_avx512(dst, src, len, key, counter);
+		return true;
+	}
+	if (chacha20_use_avx512vl) {
+		chacha20_avx512vl(dst, src, len, key, counter);
+		return true;
+	}
+#endif
+#ifdef CONFIG_AS_AVX2
+	if (chacha20_use_avx2) {
+		chacha20_avx2(dst, src, len, key, counter);
+		return true;
+	}
+#endif
+#ifdef CONFIG_AS_SSSE3
+	if (chacha20_use_ssse3) {
+		chacha20_ssse3(dst, src, len, key, counter);
+		return true;
+	}
+#endif
+	return false;
+}
+
+static inline bool hchacha20_arch(u8 *derived_key, const u8 *nonce,
+				  const u8 *key, simd_context_t simd_context)
+{
+#if defined(CONFIG_AS_SSSE3)
+	if (simd_context == HAVE_FULL_SIMD && chacha20_use_ssse3) {
+		hchacha20_ssse3(derived_key, nonce, key);
+		return true;
+	}
+#endif
+	return false;
+}
+
+#define HAVE_CHACHA20_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/chacha20/chacha20-x86_64.S b/lib/zinc/chacha20/chacha20-x86_64.S
new file mode 100644
index 000000000000..3f503a319692
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-x86_64.S
@@ -0,0 +1,2632 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ *
+ * Copyright (C) 2017 Samuel Neves <sneves@dei.uc.pt>. All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#include <linux/linkage.h>
+
+.section .rodata.cst16.Lzero, "aM", @progbits, 16
+.align	16
+.Lzero:
+.long	0,0,0,0
+.section .rodata.cst16.Lone, "aM", @progbits, 16
+.align	16
+.Lone:
+.long	1,0,0,0
+.section .rodata.cst16.Linc, "aM", @progbits, 16
+.align	16
+.Linc:
+.long	0,1,2,3
+.section .rodata.cst16.Lfour, "aM", @progbits, 16
+.align	16
+.Lfour:
+.long	4,4,4,4
+.section .rodata.cst32.Lincy, "aM", @progbits, 32
+.align	32
+.Lincy:
+.long	0,2,4,6,1,3,5,7
+.section .rodata.cst32.Leight, "aM", @progbits, 32
+.align	32
+.Leight:
+.long	8,8,8,8,8,8,8,8
+.section .rodata.cst16.Lrot16, "aM", @progbits, 16
+.align	16
+.Lrot16:
+.byte	0x2,0x3,0x0,0x1, 0x6,0x7,0x4,0x5, 0xa,0xb,0x8,0x9, 0xe,0xf,0xc,0xd
+.section .rodata.cst16.Lrot24, "aM", @progbits, 16
+.align	16
+.Lrot24:
+.byte	0x3,0x0,0x1,0x2, 0x7,0x4,0x5,0x6, 0xb,0x8,0x9,0xa, 0xf,0xc,0xd,0xe
+.section .rodata.cst16.Lsigma, "aM", @progbits, 16
+.align	16
+.Lsigma:
+.byte	101,120,112,97,110,100,32,51,50,45,98,121,116,101,32,107,0
+.section .rodata.cst64.Lzeroz, "aM", @progbits, 64
+.align	64
+.Lzeroz:
+.long	0,0,0,0, 1,0,0,0, 2,0,0,0, 3,0,0,0
+.section .rodata.cst64.Lfourz, "aM", @progbits, 64
+.align	64
+.Lfourz:
+.long	4,0,0,0, 4,0,0,0, 4,0,0,0, 4,0,0,0
+.section .rodata.cst64.Lincz, "aM", @progbits, 64
+.align	64
+.Lincz:
+.long	0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
+.section .rodata.cst64.Lsixteen, "aM", @progbits, 64
+.align	64
+.Lsixteen:
+.long	16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16
+.section .rodata.cst32.Ltwoy, "aM", @progbits, 32
+.align	64
+.Ltwoy:
+.long	2,0,0,0, 2,0,0,0
+
+.text
+
+#ifdef CONFIG_AS_SSSE3
+.align	32
+ENTRY(hchacha20_ssse3)
+	movdqa	.Lsigma(%rip),%xmm0
+	movdqu	(%rdx),%xmm1
+	movdqu	16(%rdx),%xmm2
+	movdqu	(%rsi),%xmm3
+	movdqa	.Lrot16(%rip),%xmm6
+	movdqa	.Lrot24(%rip),%xmm7
+	movq	$10,%r8
+	.align	32
+.Loop_hssse3:
+	paddd	%xmm1,%xmm0
+	pxor	%xmm0,%xmm3
+	pshufb	%xmm6,%xmm3
+	paddd	%xmm3,%xmm2
+	pxor	%xmm2,%xmm1
+	movdqa	%xmm1,%xmm4
+	psrld	$20,%xmm1
+	pslld	$12,%xmm4
+	por	%xmm4,%xmm1
+	paddd	%xmm1,%xmm0
+	pxor	%xmm0,%xmm3
+	pshufb	%xmm7,%xmm3
+	paddd	%xmm3,%xmm2
+	pxor	%xmm2,%xmm1
+	movdqa	%xmm1,%xmm4
+	psrld	$25,%xmm1
+	pslld	$7,%xmm4
+	por	%xmm4,%xmm1
+	pshufd	$78,%xmm2,%xmm2
+	pshufd	$57,%xmm1,%xmm1
+	pshufd	$147,%xmm3,%xmm3
+	nop
+	paddd	%xmm1,%xmm0
+	pxor	%xmm0,%xmm3
+	pshufb	%xmm6,%xmm3
+	paddd	%xmm3,%xmm2
+	pxor	%xmm2,%xmm1
+	movdqa	%xmm1,%xmm4
+	psrld	$20,%xmm1
+	pslld	$12,%xmm4
+	por	%xmm4,%xmm1
+	paddd	%xmm1,%xmm0
+	pxor	%xmm0,%xmm3
+	pshufb	%xmm7,%xmm3
+	paddd	%xmm3,%xmm2
+	pxor	%xmm2,%xmm1
+	movdqa	%xmm1,%xmm4
+	psrld	$25,%xmm1
+	pslld	$7,%xmm4
+	por	%xmm4,%xmm1
+	pshufd	$78,%xmm2,%xmm2
+	pshufd	$147,%xmm1,%xmm1
+	pshufd	$57,%xmm3,%xmm3
+	decq	%r8
+	jnz	.Loop_hssse3
+	movdqu	%xmm0,0(%rdi)
+	movdqu	%xmm3,16(%rdi)
+	ret
+ENDPROC(hchacha20_ssse3)
+
+.align	32
+ENTRY(chacha20_ssse3)
+.Lchacha20_ssse3:
+	cmpq	$0,%rdx
+	je	.Lssse3_epilogue
+	leaq	8(%rsp),%r10
+
+	cmpq	$128,%rdx
+	ja	.Lchacha20_4x
+
+.Ldo_sse3_after_all:
+	subq	$64+8,%rsp
+	andq	$-32,%rsp
+	movdqa	.Lsigma(%rip),%xmm0
+	movdqu	(%rcx),%xmm1
+	movdqu	16(%rcx),%xmm2
+	movdqu	(%r8),%xmm3
+	movdqa	.Lrot16(%rip),%xmm6
+	movdqa	.Lrot24(%rip),%xmm7
+
+	movdqa	%xmm0,0(%rsp)
+	movdqa	%xmm1,16(%rsp)
+	movdqa	%xmm2,32(%rsp)
+	movdqa	%xmm3,48(%rsp)
+	movq	$10,%r8
+	jmp	.Loop_ssse3
+
+.align	32
+.Loop_outer_ssse3:
+	movdqa	.Lone(%rip),%xmm3
+	movdqa	0(%rsp),%xmm0
+	movdqa	16(%rsp),%xmm1
+	movdqa	32(%rsp),%xmm2
+	paddd	48(%rsp),%xmm3
+	movq	$10,%r8
+	movdqa	%xmm3,48(%rsp)
+	jmp	.Loop_ssse3
+
+.align	32
+.Loop_ssse3:
+	paddd	%xmm1,%xmm0
+	pxor	%xmm0,%xmm3
+	pshufb	%xmm6,%xmm3
+	paddd	%xmm3,%xmm2
+	pxor	%xmm2,%xmm1
+	movdqa	%xmm1,%xmm4
+	psrld	$20,%xmm1
+	pslld	$12,%xmm4
+	por	%xmm4,%xmm1
+	paddd	%xmm1,%xmm0
+	pxor	%xmm0,%xmm3
+	pshufb	%xmm7,%xmm3
+	paddd	%xmm3,%xmm2
+	pxor	%xmm2,%xmm1
+	movdqa	%xmm1,%xmm4
+	psrld	$25,%xmm1
+	pslld	$7,%xmm4
+	por	%xmm4,%xmm1
+	pshufd	$78,%xmm2,%xmm2
+	pshufd	$57,%xmm1,%xmm1
+	pshufd	$147,%xmm3,%xmm3
+	nop
+	paddd	%xmm1,%xmm0
+	pxor	%xmm0,%xmm3
+	pshufb	%xmm6,%xmm3
+	paddd	%xmm3,%xmm2
+	pxor	%xmm2,%xmm1
+	movdqa	%xmm1,%xmm4
+	psrld	$20,%xmm1
+	pslld	$12,%xmm4
+	por	%xmm4,%xmm1
+	paddd	%xmm1,%xmm0
+	pxor	%xmm0,%xmm3
+	pshufb	%xmm7,%xmm3
+	paddd	%xmm3,%xmm2
+	pxor	%xmm2,%xmm1
+	movdqa	%xmm1,%xmm4
+	psrld	$25,%xmm1
+	pslld	$7,%xmm4
+	por	%xmm4,%xmm1
+	pshufd	$78,%xmm2,%xmm2
+	pshufd	$147,%xmm1,%xmm1
+	pshufd	$57,%xmm3,%xmm3
+	decq	%r8
+	jnz	.Loop_ssse3
+	paddd	0(%rsp),%xmm0
+	paddd	16(%rsp),%xmm1
+	paddd	32(%rsp),%xmm2
+	paddd	48(%rsp),%xmm3
+
+	cmpq	$64,%rdx
+	jb	.Ltail_ssse3
+
+	movdqu	0(%rsi),%xmm4
+	movdqu	16(%rsi),%xmm5
+	pxor	%xmm4,%xmm0
+	movdqu	32(%rsi),%xmm4
+	pxor	%xmm5,%xmm1
+	movdqu	48(%rsi),%xmm5
+	leaq	64(%rsi),%rsi
+	pxor	%xmm4,%xmm2
+	pxor	%xmm5,%xmm3
+
+	movdqu	%xmm0,0(%rdi)
+	movdqu	%xmm1,16(%rdi)
+	movdqu	%xmm2,32(%rdi)
+	movdqu	%xmm3,48(%rdi)
+	leaq	64(%rdi),%rdi
+
+	subq	$64,%rdx
+	jnz	.Loop_outer_ssse3
+
+	jmp	.Ldone_ssse3
+
+.align	16
+.Ltail_ssse3:
+	movdqa	%xmm0,0(%rsp)
+	movdqa	%xmm1,16(%rsp)
+	movdqa	%xmm2,32(%rsp)
+	movdqa	%xmm3,48(%rsp)
+	xorq	%r8,%r8
+
+.Loop_tail_ssse3:
+	movzbl	(%rsi,%r8,1),%eax
+	movzbl	(%rsp,%r8,1),%ecx
+	leaq	1(%r8),%r8
+	xorl	%ecx,%eax
+	movb	%al,-1(%rdi,%r8,1)
+	decq	%rdx
+	jnz	.Loop_tail_ssse3
+
+.Ldone_ssse3:
+	leaq	-8(%r10),%rsp
+
+.Lssse3_epilogue:
+	ret
+
+.align	32
+.Lchacha20_4x:
+	leaq	8(%rsp),%r10
+
+.Lproceed4x:
+	subq	$0x140+8,%rsp
+	andq	$-32,%rsp
+	movdqa	.Lsigma(%rip),%xmm11
+	movdqu	(%rcx),%xmm15
+	movdqu	16(%rcx),%xmm7
+	movdqu	(%r8),%xmm3
+	leaq	256(%rsp),%rcx
+	leaq	.Lrot16(%rip),%r9
+	leaq	.Lrot24(%rip),%r11
+
+	pshufd	$0x00,%xmm11,%xmm8
+	pshufd	$0x55,%xmm11,%xmm9
+	movdqa	%xmm8,64(%rsp)
+	pshufd	$0xaa,%xmm11,%xmm10
+	movdqa	%xmm9,80(%rsp)
+	pshufd	$0xff,%xmm11,%xmm11
+	movdqa	%xmm10,96(%rsp)
+	movdqa	%xmm11,112(%rsp)
+
+	pshufd	$0x00,%xmm15,%xmm12
+	pshufd	$0x55,%xmm15,%xmm13
+	movdqa	%xmm12,128-256(%rcx)
+	pshufd	$0xaa,%xmm15,%xmm14
+	movdqa	%xmm13,144-256(%rcx)
+	pshufd	$0xff,%xmm15,%xmm15
+	movdqa	%xmm14,160-256(%rcx)
+	movdqa	%xmm15,176-256(%rcx)
+
+	pshufd	$0x00,%xmm7,%xmm4
+	pshufd	$0x55,%xmm7,%xmm5
+	movdqa	%xmm4,192-256(%rcx)
+	pshufd	$0xaa,%xmm7,%xmm6
+	movdqa	%xmm5,208-256(%rcx)
+	pshufd	$0xff,%xmm7,%xmm7
+	movdqa	%xmm6,224-256(%rcx)
+	movdqa	%xmm7,240-256(%rcx)
+
+	pshufd	$0x00,%xmm3,%xmm0
+	pshufd	$0x55,%xmm3,%xmm1
+	paddd	.Linc(%rip),%xmm0
+	pshufd	$0xaa,%xmm3,%xmm2
+	movdqa	%xmm1,272-256(%rcx)
+	pshufd	$0xff,%xmm3,%xmm3
+	movdqa	%xmm2,288-256(%rcx)
+	movdqa	%xmm3,304-256(%rcx)
+
+	jmp	.Loop_enter4x
+
+.align	32
+.Loop_outer4x:
+	movdqa	64(%rsp),%xmm8
+	movdqa	80(%rsp),%xmm9
+	movdqa	96(%rsp),%xmm10
+	movdqa	112(%rsp),%xmm11
+	movdqa	128-256(%rcx),%xmm12
+	movdqa	144-256(%rcx),%xmm13
+	movdqa	160-256(%rcx),%xmm14
+	movdqa	176-256(%rcx),%xmm15
+	movdqa	192-256(%rcx),%xmm4
+	movdqa	208-256(%rcx),%xmm5
+	movdqa	224-256(%rcx),%xmm6
+	movdqa	240-256(%rcx),%xmm7
+	movdqa	256-256(%rcx),%xmm0
+	movdqa	272-256(%rcx),%xmm1
+	movdqa	288-256(%rcx),%xmm2
+	movdqa	304-256(%rcx),%xmm3
+	paddd	.Lfour(%rip),%xmm0
+
+.Loop_enter4x:
+	movdqa	%xmm6,32(%rsp)
+	movdqa	%xmm7,48(%rsp)
+	movdqa	(%r9),%xmm7
+	movl	$10,%eax
+	movdqa	%xmm0,256-256(%rcx)
+	jmp	.Loop4x
+
+.align	32
+.Loop4x:
+	paddd	%xmm12,%xmm8
+	paddd	%xmm13,%xmm9
+	pxor	%xmm8,%xmm0
+	pxor	%xmm9,%xmm1
+	pshufb	%xmm7,%xmm0
+	pshufb	%xmm7,%xmm1
+	paddd	%xmm0,%xmm4
+	paddd	%xmm1,%xmm5
+	pxor	%xmm4,%xmm12
+	pxor	%xmm5,%xmm13
+	movdqa	%xmm12,%xmm6
+	pslld	$12,%xmm12
+	psrld	$20,%xmm6
+	movdqa	%xmm13,%xmm7
+	pslld	$12,%xmm13
+	por	%xmm6,%xmm12
+	psrld	$20,%xmm7
+	movdqa	(%r11),%xmm6
+	por	%xmm7,%xmm13
+	paddd	%xmm12,%xmm8
+	paddd	%xmm13,%xmm9
+	pxor	%xmm8,%xmm0
+	pxor	%xmm9,%xmm1
+	pshufb	%xmm6,%xmm0
+	pshufb	%xmm6,%xmm1
+	paddd	%xmm0,%xmm4
+	paddd	%xmm1,%xmm5
+	pxor	%xmm4,%xmm12
+	pxor	%xmm5,%xmm13
+	movdqa	%xmm12,%xmm7
+	pslld	$7,%xmm12
+	psrld	$25,%xmm7
+	movdqa	%xmm13,%xmm6
+	pslld	$7,%xmm13
+	por	%xmm7,%xmm12
+	psrld	$25,%xmm6
+	movdqa	(%r9),%xmm7
+	por	%xmm6,%xmm13
+	movdqa	%xmm4,0(%rsp)
+	movdqa	%xmm5,16(%rsp)
+	movdqa	32(%rsp),%xmm4
+	movdqa	48(%rsp),%xmm5
+	paddd	%xmm14,%xmm10
+	paddd	%xmm15,%xmm11
+	pxor	%xmm10,%xmm2
+	pxor	%xmm11,%xmm3
+	pshufb	%xmm7,%xmm2
+	pshufb	%xmm7,%xmm3
+	paddd	%xmm2,%xmm4
+	paddd	%xmm3,%xmm5
+	pxor	%xmm4,%xmm14
+	pxor	%xmm5,%xmm15
+	movdqa	%xmm14,%xmm6
+	pslld	$12,%xmm14
+	psrld	$20,%xmm6
+	movdqa	%xmm15,%xmm7
+	pslld	$12,%xmm15
+	por	%xmm6,%xmm14
+	psrld	$20,%xmm7
+	movdqa	(%r11),%xmm6
+	por	%xmm7,%xmm15
+	paddd	%xmm14,%xmm10
+	paddd	%xmm15,%xmm11
+	pxor	%xmm10,%xmm2
+	pxor	%xmm11,%xmm3
+	pshufb	%xmm6,%xmm2
+	pshufb	%xmm6,%xmm3
+	paddd	%xmm2,%xmm4
+	paddd	%xmm3,%xmm5
+	pxor	%xmm4,%xmm14
+	pxor	%xmm5,%xmm15
+	movdqa	%xmm14,%xmm7
+	pslld	$7,%xmm14
+	psrld	$25,%xmm7
+	movdqa	%xmm15,%xmm6
+	pslld	$7,%xmm15
+	por	%xmm7,%xmm14
+	psrld	$25,%xmm6
+	movdqa	(%r9),%xmm7
+	por	%xmm6,%xmm15
+	paddd	%xmm13,%xmm8
+	paddd	%xmm14,%xmm9
+	pxor	%xmm8,%xmm3
+	pxor	%xmm9,%xmm0
+	pshufb	%xmm7,%xmm3
+	pshufb	%xmm7,%xmm0
+	paddd	%xmm3,%xmm4
+	paddd	%xmm0,%xmm5
+	pxor	%xmm4,%xmm13
+	pxor	%xmm5,%xmm14
+	movdqa	%xmm13,%xmm6
+	pslld	$12,%xmm13
+	psrld	$20,%xmm6
+	movdqa	%xmm14,%xmm7
+	pslld	$12,%xmm14
+	por	%xmm6,%xmm13
+	psrld	$20,%xmm7
+	movdqa	(%r11),%xmm6
+	por	%xmm7,%xmm14
+	paddd	%xmm13,%xmm8
+	paddd	%xmm14,%xmm9
+	pxor	%xmm8,%xmm3
+	pxor	%xmm9,%xmm0
+	pshufb	%xmm6,%xmm3
+	pshufb	%xmm6,%xmm0
+	paddd	%xmm3,%xmm4
+	paddd	%xmm0,%xmm5
+	pxor	%xmm4,%xmm13
+	pxor	%xmm5,%xmm14
+	movdqa	%xmm13,%xmm7
+	pslld	$7,%xmm13
+	psrld	$25,%xmm7
+	movdqa	%xmm14,%xmm6
+	pslld	$7,%xmm14
+	por	%xmm7,%xmm13
+	psrld	$25,%xmm6
+	movdqa	(%r9),%xmm7
+	por	%xmm6,%xmm14
+	movdqa	%xmm4,32(%rsp)
+	movdqa	%xmm5,48(%rsp)
+	movdqa	0(%rsp),%xmm4
+	movdqa	16(%rsp),%xmm5
+	paddd	%xmm15,%xmm10
+	paddd	%xmm12,%xmm11
+	pxor	%xmm10,%xmm1
+	pxor	%xmm11,%xmm2
+	pshufb	%xmm7,%xmm1
+	pshufb	%xmm7,%xmm2
+	paddd	%xmm1,%xmm4
+	paddd	%xmm2,%xmm5
+	pxor	%xmm4,%xmm15
+	pxor	%xmm5,%xmm12
+	movdqa	%xmm15,%xmm6
+	pslld	$12,%xmm15
+	psrld	$20,%xmm6
+	movdqa	%xmm12,%xmm7
+	pslld	$12,%xmm12
+	por	%xmm6,%xmm15
+	psrld	$20,%xmm7
+	movdqa	(%r11),%xmm6
+	por	%xmm7,%xmm12
+	paddd	%xmm15,%xmm10
+	paddd	%xmm12,%xmm11
+	pxor	%xmm10,%xmm1
+	pxor	%xmm11,%xmm2
+	pshufb	%xmm6,%xmm1
+	pshufb	%xmm6,%xmm2
+	paddd	%xmm1,%xmm4
+	paddd	%xmm2,%xmm5
+	pxor	%xmm4,%xmm15
+	pxor	%xmm5,%xmm12
+	movdqa	%xmm15,%xmm7
+	pslld	$7,%xmm15
+	psrld	$25,%xmm7
+	movdqa	%xmm12,%xmm6
+	pslld	$7,%xmm12
+	por	%xmm7,%xmm15
+	psrld	$25,%xmm6
+	movdqa	(%r9),%xmm7
+	por	%xmm6,%xmm12
+	decl	%eax
+	jnz	.Loop4x
+
+	paddd	64(%rsp),%xmm8
+	paddd	80(%rsp),%xmm9
+	paddd	96(%rsp),%xmm10
+	paddd	112(%rsp),%xmm11
+
+	movdqa	%xmm8,%xmm6
+	punpckldq	%xmm9,%xmm8
+	movdqa	%xmm10,%xmm7
+	punpckldq	%xmm11,%xmm10
+	punpckhdq	%xmm9,%xmm6
+	punpckhdq	%xmm11,%xmm7
+	movdqa	%xmm8,%xmm9
+	punpcklqdq	%xmm10,%xmm8
+	movdqa	%xmm6,%xmm11
+	punpcklqdq	%xmm7,%xmm6
+	punpckhqdq	%xmm10,%xmm9
+	punpckhqdq	%xmm7,%xmm11
+	paddd	128-256(%rcx),%xmm12
+	paddd	144-256(%rcx),%xmm13
+	paddd	160-256(%rcx),%xmm14
+	paddd	176-256(%rcx),%xmm15
+
+	movdqa	%xmm8,0(%rsp)
+	movdqa	%xmm9,16(%rsp)
+	movdqa	32(%rsp),%xmm8
+	movdqa	48(%rsp),%xmm9
+
+	movdqa	%xmm12,%xmm10
+	punpckldq	%xmm13,%xmm12
+	movdqa	%xmm14,%xmm7
+	punpckldq	%xmm15,%xmm14
+	punpckhdq	%xmm13,%xmm10
+	punpckhdq	%xmm15,%xmm7
+	movdqa	%xmm12,%xmm13
+	punpcklqdq	%xmm14,%xmm12
+	movdqa	%xmm10,%xmm15
+	punpcklqdq	%xmm7,%xmm10
+	punpckhqdq	%xmm14,%xmm13
+	punpckhqdq	%xmm7,%xmm15
+	paddd	192-256(%rcx),%xmm4
+	paddd	208-256(%rcx),%xmm5
+	paddd	224-256(%rcx),%xmm8
+	paddd	240-256(%rcx),%xmm9
+
+	movdqa	%xmm6,32(%rsp)
+	movdqa	%xmm11,48(%rsp)
+
+	movdqa	%xmm4,%xmm14
+	punpckldq	%xmm5,%xmm4
+	movdqa	%xmm8,%xmm7
+	punpckldq	%xmm9,%xmm8
+	punpckhdq	%xmm5,%xmm14
+	punpckhdq	%xmm9,%xmm7
+	movdqa	%xmm4,%xmm5
+	punpcklqdq	%xmm8,%xmm4
+	movdqa	%xmm14,%xmm9
+	punpcklqdq	%xmm7,%xmm14
+	punpckhqdq	%xmm8,%xmm5
+	punpckhqdq	%xmm7,%xmm9
+	paddd	256-256(%rcx),%xmm0
+	paddd	272-256(%rcx),%xmm1
+	paddd	288-256(%rcx),%xmm2
+	paddd	304-256(%rcx),%xmm3
+
+	movdqa	%xmm0,%xmm8
+	punpckldq	%xmm1,%xmm0
+	movdqa	%xmm2,%xmm7
+	punpckldq	%xmm3,%xmm2
+	punpckhdq	%xmm1,%xmm8
+	punpckhdq	%xmm3,%xmm7
+	movdqa	%xmm0,%xmm1
+	punpcklqdq	%xmm2,%xmm0
+	movdqa	%xmm8,%xmm3
+	punpcklqdq	%xmm7,%xmm8
+	punpckhqdq	%xmm2,%xmm1
+	punpckhqdq	%xmm7,%xmm3
+	cmpq	$256,%rdx
+	jb	.Ltail4x
+
+	movdqu	0(%rsi),%xmm6
+	movdqu	16(%rsi),%xmm11
+	movdqu	32(%rsi),%xmm2
+	movdqu	48(%rsi),%xmm7
+	pxor	0(%rsp),%xmm6
+	pxor	%xmm12,%xmm11
+	pxor	%xmm4,%xmm2
+	pxor	%xmm0,%xmm7
+
+	movdqu	%xmm6,0(%rdi)
+	movdqu	64(%rsi),%xmm6
+	movdqu	%xmm11,16(%rdi)
+	movdqu	80(%rsi),%xmm11
+	movdqu	%xmm2,32(%rdi)
+	movdqu	96(%rsi),%xmm2
+	movdqu	%xmm7,48(%rdi)
+	movdqu	112(%rsi),%xmm7
+	leaq	128(%rsi),%rsi
+	pxor	16(%rsp),%xmm6
+	pxor	%xmm13,%xmm11
+	pxor	%xmm5,%xmm2
+	pxor	%xmm1,%xmm7
+
+	movdqu	%xmm6,64(%rdi)
+	movdqu	0(%rsi),%xmm6
+	movdqu	%xmm11,80(%rdi)
+	movdqu	16(%rsi),%xmm11
+	movdqu	%xmm2,96(%rdi)
+	movdqu	32(%rsi),%xmm2
+	movdqu	%xmm7,112(%rdi)
+	leaq	128(%rdi),%rdi
+	movdqu	48(%rsi),%xmm7
+	pxor	32(%rsp),%xmm6
+	pxor	%xmm10,%xmm11
+	pxor	%xmm14,%xmm2
+	pxor	%xmm8,%xmm7
+
+	movdqu	%xmm6,0(%rdi)
+	movdqu	64(%rsi),%xmm6
+	movdqu	%xmm11,16(%rdi)
+	movdqu	80(%rsi),%xmm11
+	movdqu	%xmm2,32(%rdi)
+	movdqu	96(%rsi),%xmm2
+	movdqu	%xmm7,48(%rdi)
+	movdqu	112(%rsi),%xmm7
+	leaq	128(%rsi),%rsi
+	pxor	48(%rsp),%xmm6
+	pxor	%xmm15,%xmm11
+	pxor	%xmm9,%xmm2
+	pxor	%xmm3,%xmm7
+	movdqu	%xmm6,64(%rdi)
+	movdqu	%xmm11,80(%rdi)
+	movdqu	%xmm2,96(%rdi)
+	movdqu	%xmm7,112(%rdi)
+	leaq	128(%rdi),%rdi
+
+	subq	$256,%rdx
+	jnz	.Loop_outer4x
+
+	jmp	.Ldone4x
+
+.Ltail4x:
+	cmpq	$192,%rdx
+	jae	.L192_or_more4x
+	cmpq	$128,%rdx
+	jae	.L128_or_more4x
+	cmpq	$64,%rdx
+	jae	.L64_or_more4x
+
+
+	xorq	%r9,%r9
+
+	movdqa	%xmm12,16(%rsp)
+	movdqa	%xmm4,32(%rsp)
+	movdqa	%xmm0,48(%rsp)
+	jmp	.Loop_tail4x
+
+.align	32
+.L64_or_more4x:
+	movdqu	0(%rsi),%xmm6
+	movdqu	16(%rsi),%xmm11
+	movdqu	32(%rsi),%xmm2
+	movdqu	48(%rsi),%xmm7
+	pxor	0(%rsp),%xmm6
+	pxor	%xmm12,%xmm11
+	pxor	%xmm4,%xmm2
+	pxor	%xmm0,%xmm7
+	movdqu	%xmm6,0(%rdi)
+	movdqu	%xmm11,16(%rdi)
+	movdqu	%xmm2,32(%rdi)
+	movdqu	%xmm7,48(%rdi)
+	je	.Ldone4x
+
+	movdqa	16(%rsp),%xmm6
+	leaq	64(%rsi),%rsi
+	xorq	%r9,%r9
+	movdqa	%xmm6,0(%rsp)
+	movdqa	%xmm13,16(%rsp)
+	leaq	64(%rdi),%rdi
+	movdqa	%xmm5,32(%rsp)
+	subq	$64,%rdx
+	movdqa	%xmm1,48(%rsp)
+	jmp	.Loop_tail4x
+
+.align	32
+.L128_or_more4x:
+	movdqu	0(%rsi),%xmm6
+	movdqu	16(%rsi),%xmm11
+	movdqu	32(%rsi),%xmm2
+	movdqu	48(%rsi),%xmm7
+	pxor	0(%rsp),%xmm6
+	pxor	%xmm12,%xmm11
+	pxor	%xmm4,%xmm2
+	pxor	%xmm0,%xmm7
+
+	movdqu	%xmm6,0(%rdi)
+	movdqu	64(%rsi),%xmm6
+	movdqu	%xmm11,16(%rdi)
+	movdqu	80(%rsi),%xmm11
+	movdqu	%xmm2,32(%rdi)
+	movdqu	96(%rsi),%xmm2
+	movdqu	%xmm7,48(%rdi)
+	movdqu	112(%rsi),%xmm7
+	pxor	16(%rsp),%xmm6
+	pxor	%xmm13,%xmm11
+	pxor	%xmm5,%xmm2
+	pxor	%xmm1,%xmm7
+	movdqu	%xmm6,64(%rdi)
+	movdqu	%xmm11,80(%rdi)
+	movdqu	%xmm2,96(%rdi)
+	movdqu	%xmm7,112(%rdi)
+	je	.Ldone4x
+
+	movdqa	32(%rsp),%xmm6
+	leaq	128(%rsi),%rsi
+	xorq	%r9,%r9
+	movdqa	%xmm6,0(%rsp)
+	movdqa	%xmm10,16(%rsp)
+	leaq	128(%rdi),%rdi
+	movdqa	%xmm14,32(%rsp)
+	subq	$128,%rdx
+	movdqa	%xmm8,48(%rsp)
+	jmp	.Loop_tail4x
+
+.align	32
+.L192_or_more4x:
+	movdqu	0(%rsi),%xmm6
+	movdqu	16(%rsi),%xmm11
+	movdqu	32(%rsi),%xmm2
+	movdqu	48(%rsi),%xmm7
+	pxor	0(%rsp),%xmm6
+	pxor	%xmm12,%xmm11
+	pxor	%xmm4,%xmm2
+	pxor	%xmm0,%xmm7
+
+	movdqu	%xmm6,0(%rdi)
+	movdqu	64(%rsi),%xmm6
+	movdqu	%xmm11,16(%rdi)
+	movdqu	80(%rsi),%xmm11
+	movdqu	%xmm2,32(%rdi)
+	movdqu	96(%rsi),%xmm2
+	movdqu	%xmm7,48(%rdi)
+	movdqu	112(%rsi),%xmm7
+	leaq	128(%rsi),%rsi
+	pxor	16(%rsp),%xmm6
+	pxor	%xmm13,%xmm11
+	pxor	%xmm5,%xmm2
+	pxor	%xmm1,%xmm7
+
+	movdqu	%xmm6,64(%rdi)
+	movdqu	0(%rsi),%xmm6
+	movdqu	%xmm11,80(%rdi)
+	movdqu	16(%rsi),%xmm11
+	movdqu	%xmm2,96(%rdi)
+	movdqu	32(%rsi),%xmm2
+	movdqu	%xmm7,112(%rdi)
+	leaq	128(%rdi),%rdi
+	movdqu	48(%rsi),%xmm7
+	pxor	32(%rsp),%xmm6
+	pxor	%xmm10,%xmm11
+	pxor	%xmm14,%xmm2
+	pxor	%xmm8,%xmm7
+	movdqu	%xmm6,0(%rdi)
+	movdqu	%xmm11,16(%rdi)
+	movdqu	%xmm2,32(%rdi)
+	movdqu	%xmm7,48(%rdi)
+	je	.Ldone4x
+
+	movdqa	48(%rsp),%xmm6
+	leaq	64(%rsi),%rsi
+	xorq	%r9,%r9
+	movdqa	%xmm6,0(%rsp)
+	movdqa	%xmm15,16(%rsp)
+	leaq	64(%rdi),%rdi
+	movdqa	%xmm9,32(%rsp)
+	subq	$192,%rdx
+	movdqa	%xmm3,48(%rsp)
+
+.Loop_tail4x:
+	movzbl	(%rsi,%r9,1),%eax
+	movzbl	(%rsp,%r9,1),%ecx
+	leaq	1(%r9),%r9
+	xorl	%ecx,%eax
+	movb	%al,-1(%rdi,%r9,1)
+	decq	%rdx
+	jnz	.Loop_tail4x
+
+.Ldone4x:
+	leaq	-8(%r10),%rsp
+
+.L4x_epilogue:
+	ret
+ENDPROC(chacha20_ssse3)
+#endif /* CONFIG_AS_SSSE3 */
+
+#ifdef CONFIG_AS_AVX2
+.align	32
+ENTRY(chacha20_avx2)
+.Lchacha20_avx2:
+	cmpq	$0,%rdx
+	je	.L8x_epilogue
+	leaq	8(%rsp),%r10
+
+	subq	$0x280+8,%rsp
+	andq	$-32,%rsp
+	vzeroupper
+
+	vbroadcasti128	.Lsigma(%rip),%ymm11
+	vbroadcasti128	(%rcx),%ymm3
+	vbroadcasti128	16(%rcx),%ymm15
+	vbroadcasti128	(%r8),%ymm7
+	leaq	256(%rsp),%rcx
+	leaq	512(%rsp),%rax
+	leaq	.Lrot16(%rip),%r9
+	leaq	.Lrot24(%rip),%r11
+
+	vpshufd	$0x00,%ymm11,%ymm8
+	vpshufd	$0x55,%ymm11,%ymm9
+	vmovdqa	%ymm8,128-256(%rcx)
+	vpshufd	$0xaa,%ymm11,%ymm10
+	vmovdqa	%ymm9,160-256(%rcx)
+	vpshufd	$0xff,%ymm11,%ymm11
+	vmovdqa	%ymm10,192-256(%rcx)
+	vmovdqa	%ymm11,224-256(%rcx)
+
+	vpshufd	$0x00,%ymm3,%ymm0
+	vpshufd	$0x55,%ymm3,%ymm1
+	vmovdqa	%ymm0,256-256(%rcx)
+	vpshufd	$0xaa,%ymm3,%ymm2
+	vmovdqa	%ymm1,288-256(%rcx)
+	vpshufd	$0xff,%ymm3,%ymm3
+	vmovdqa	%ymm2,320-256(%rcx)
+	vmovdqa	%ymm3,352-256(%rcx)
+
+	vpshufd	$0x00,%ymm15,%ymm12
+	vpshufd	$0x55,%ymm15,%ymm13
+	vmovdqa	%ymm12,384-512(%rax)
+	vpshufd	$0xaa,%ymm15,%ymm14
+	vmovdqa	%ymm13,416-512(%rax)
+	vpshufd	$0xff,%ymm15,%ymm15
+	vmovdqa	%ymm14,448-512(%rax)
+	vmovdqa	%ymm15,480-512(%rax)
+
+	vpshufd	$0x00,%ymm7,%ymm4
+	vpshufd	$0x55,%ymm7,%ymm5
+	vpaddd	.Lincy(%rip),%ymm4,%ymm4
+	vpshufd	$0xaa,%ymm7,%ymm6
+	vmovdqa	%ymm5,544-512(%rax)
+	vpshufd	$0xff,%ymm7,%ymm7
+	vmovdqa	%ymm6,576-512(%rax)
+	vmovdqa	%ymm7,608-512(%rax)
+
+	jmp	.Loop_enter8x
+
+.align	32
+.Loop_outer8x:
+	vmovdqa	128-256(%rcx),%ymm8
+	vmovdqa	160-256(%rcx),%ymm9
+	vmovdqa	192-256(%rcx),%ymm10
+	vmovdqa	224-256(%rcx),%ymm11
+	vmovdqa	256-256(%rcx),%ymm0
+	vmovdqa	288-256(%rcx),%ymm1
+	vmovdqa	320-256(%rcx),%ymm2
+	vmovdqa	352-256(%rcx),%ymm3
+	vmovdqa	384-512(%rax),%ymm12
+	vmovdqa	416-512(%rax),%ymm13
+	vmovdqa	448-512(%rax),%ymm14
+	vmovdqa	480-512(%rax),%ymm15
+	vmovdqa	512-512(%rax),%ymm4
+	vmovdqa	544-512(%rax),%ymm5
+	vmovdqa	576-512(%rax),%ymm6
+	vmovdqa	608-512(%rax),%ymm7
+	vpaddd	.Leight(%rip),%ymm4,%ymm4
+
+.Loop_enter8x:
+	vmovdqa	%ymm14,64(%rsp)
+	vmovdqa	%ymm15,96(%rsp)
+	vbroadcasti128	(%r9),%ymm15
+	vmovdqa	%ymm4,512-512(%rax)
+	movl	$10,%eax
+	jmp	.Loop8x
+
+.align	32
+.Loop8x:
+	vpaddd	%ymm0,%ymm8,%ymm8
+	vpxor	%ymm4,%ymm8,%ymm4
+	vpshufb	%ymm15,%ymm4,%ymm4
+	vpaddd	%ymm1,%ymm9,%ymm9
+	vpxor	%ymm5,%ymm9,%ymm5
+	vpshufb	%ymm15,%ymm5,%ymm5
+	vpaddd	%ymm4,%ymm12,%ymm12
+	vpxor	%ymm0,%ymm12,%ymm0
+	vpslld	$12,%ymm0,%ymm14
+	vpsrld	$20,%ymm0,%ymm0
+	vpor	%ymm0,%ymm14,%ymm0
+	vbroadcasti128	(%r11),%ymm14
+	vpaddd	%ymm5,%ymm13,%ymm13
+	vpxor	%ymm1,%ymm13,%ymm1
+	vpslld	$12,%ymm1,%ymm15
+	vpsrld	$20,%ymm1,%ymm1
+	vpor	%ymm1,%ymm15,%ymm1
+	vpaddd	%ymm0,%ymm8,%ymm8
+	vpxor	%ymm4,%ymm8,%ymm4
+	vpshufb	%ymm14,%ymm4,%ymm4
+	vpaddd	%ymm1,%ymm9,%ymm9
+	vpxor	%ymm5,%ymm9,%ymm5
+	vpshufb	%ymm14,%ymm5,%ymm5
+	vpaddd	%ymm4,%ymm12,%ymm12
+	vpxor	%ymm0,%ymm12,%ymm0
+	vpslld	$7,%ymm0,%ymm15
+	vpsrld	$25,%ymm0,%ymm0
+	vpor	%ymm0,%ymm15,%ymm0
+	vbroadcasti128	(%r9),%ymm15
+	vpaddd	%ymm5,%ymm13,%ymm13
+	vpxor	%ymm1,%ymm13,%ymm1
+	vpslld	$7,%ymm1,%ymm14
+	vpsrld	$25,%ymm1,%ymm1
+	vpor	%ymm1,%ymm14,%ymm1
+	vmovdqa	%ymm12,0(%rsp)
+	vmovdqa	%ymm13,32(%rsp)
+	vmovdqa	64(%rsp),%ymm12
+	vmovdqa	96(%rsp),%ymm13
+	vpaddd	%ymm2,%ymm10,%ymm10
+	vpxor	%ymm6,%ymm10,%ymm6
+	vpshufb	%ymm15,%ymm6,%ymm6
+	vpaddd	%ymm3,%ymm11,%ymm11
+	vpxor	%ymm7,%ymm11,%ymm7
+	vpshufb	%ymm15,%ymm7,%ymm7
+	vpaddd	%ymm6,%ymm12,%ymm12
+	vpxor	%ymm2,%ymm12,%ymm2
+	vpslld	$12,%ymm2,%ymm14
+	vpsrld	$20,%ymm2,%ymm2
+	vpor	%ymm2,%ymm14,%ymm2
+	vbroadcasti128	(%r11),%ymm14
+	vpaddd	%ymm7,%ymm13,%ymm13
+	vpxor	%ymm3,%ymm13,%ymm3
+	vpslld	$12,%ymm3,%ymm15
+	vpsrld	$20,%ymm3,%ymm3
+	vpor	%ymm3,%ymm15,%ymm3
+	vpaddd	%ymm2,%ymm10,%ymm10
+	vpxor	%ymm6,%ymm10,%ymm6
+	vpshufb	%ymm14,%ymm6,%ymm6
+	vpaddd	%ymm3,%ymm11,%ymm11
+	vpxor	%ymm7,%ymm11,%ymm7
+	vpshufb	%ymm14,%ymm7,%ymm7
+	vpaddd	%ymm6,%ymm12,%ymm12
+	vpxor	%ymm2,%ymm12,%ymm2
+	vpslld	$7,%ymm2,%ymm15
+	vpsrld	$25,%ymm2,%ymm2
+	vpor	%ymm2,%ymm15,%ymm2
+	vbroadcasti128	(%r9),%ymm15
+	vpaddd	%ymm7,%ymm13,%ymm13
+	vpxor	%ymm3,%ymm13,%ymm3
+	vpslld	$7,%ymm3,%ymm14
+	vpsrld	$25,%ymm3,%ymm3
+	vpor	%ymm3,%ymm14,%ymm3
+	vpaddd	%ymm1,%ymm8,%ymm8
+	vpxor	%ymm7,%ymm8,%ymm7
+	vpshufb	%ymm15,%ymm7,%ymm7
+	vpaddd	%ymm2,%ymm9,%ymm9
+	vpxor	%ymm4,%ymm9,%ymm4
+	vpshufb	%ymm15,%ymm4,%ymm4
+	vpaddd	%ymm7,%ymm12,%ymm12
+	vpxor	%ymm1,%ymm12,%ymm1
+	vpslld	$12,%ymm1,%ymm14
+	vpsrld	$20,%ymm1,%ymm1
+	vpor	%ymm1,%ymm14,%ymm1
+	vbroadcasti128	(%r11),%ymm14
+	vpaddd	%ymm4,%ymm13,%ymm13
+	vpxor	%ymm2,%ymm13,%ymm2
+	vpslld	$12,%ymm2,%ymm15
+	vpsrld	$20,%ymm2,%ymm2
+	vpor	%ymm2,%ymm15,%ymm2
+	vpaddd	%ymm1,%ymm8,%ymm8
+	vpxor	%ymm7,%ymm8,%ymm7
+	vpshufb	%ymm14,%ymm7,%ymm7
+	vpaddd	%ymm2,%ymm9,%ymm9
+	vpxor	%ymm4,%ymm9,%ymm4
+	vpshufb	%ymm14,%ymm4,%ymm4
+	vpaddd	%ymm7,%ymm12,%ymm12
+	vpxor	%ymm1,%ymm12,%ymm1
+	vpslld	$7,%ymm1,%ymm15
+	vpsrld	$25,%ymm1,%ymm1
+	vpor	%ymm1,%ymm15,%ymm1
+	vbroadcasti128	(%r9),%ymm15
+	vpaddd	%ymm4,%ymm13,%ymm13
+	vpxor	%ymm2,%ymm13,%ymm2
+	vpslld	$7,%ymm2,%ymm14
+	vpsrld	$25,%ymm2,%ymm2
+	vpor	%ymm2,%ymm14,%ymm2
+	vmovdqa	%ymm12,64(%rsp)
+	vmovdqa	%ymm13,96(%rsp)
+	vmovdqa	0(%rsp),%ymm12
+	vmovdqa	32(%rsp),%ymm13
+	vpaddd	%ymm3,%ymm10,%ymm10
+	vpxor	%ymm5,%ymm10,%ymm5
+	vpshufb	%ymm15,%ymm5,%ymm5
+	vpaddd	%ymm0,%ymm11,%ymm11
+	vpxor	%ymm6,%ymm11,%ymm6
+	vpshufb	%ymm15,%ymm6,%ymm6
+	vpaddd	%ymm5,%ymm12,%ymm12
+	vpxor	%ymm3,%ymm12,%ymm3
+	vpslld	$12,%ymm3,%ymm14
+	vpsrld	$20,%ymm3,%ymm3
+	vpor	%ymm3,%ymm14,%ymm3
+	vbroadcasti128	(%r11),%ymm14
+	vpaddd	%ymm6,%ymm13,%ymm13
+	vpxor	%ymm0,%ymm13,%ymm0
+	vpslld	$12,%ymm0,%ymm15
+	vpsrld	$20,%ymm0,%ymm0
+	vpor	%ymm0,%ymm15,%ymm0
+	vpaddd	%ymm3,%ymm10,%ymm10
+	vpxor	%ymm5,%ymm10,%ymm5
+	vpshufb	%ymm14,%ymm5,%ymm5
+	vpaddd	%ymm0,%ymm11,%ymm11
+	vpxor	%ymm6,%ymm11,%ymm6
+	vpshufb	%ymm14,%ymm6,%ymm6
+	vpaddd	%ymm5,%ymm12,%ymm12
+	vpxor	%ymm3,%ymm12,%ymm3
+	vpslld	$7,%ymm3,%ymm15
+	vpsrld	$25,%ymm3,%ymm3
+	vpor	%ymm3,%ymm15,%ymm3
+	vbroadcasti128	(%r9),%ymm15
+	vpaddd	%ymm6,%ymm13,%ymm13
+	vpxor	%ymm0,%ymm13,%ymm0
+	vpslld	$7,%ymm0,%ymm14
+	vpsrld	$25,%ymm0,%ymm0
+	vpor	%ymm0,%ymm14,%ymm0
+	decl	%eax
+	jnz	.Loop8x
+
+	leaq	512(%rsp),%rax
+	vpaddd	128-256(%rcx),%ymm8,%ymm8
+	vpaddd	160-256(%rcx),%ymm9,%ymm9
+	vpaddd	192-256(%rcx),%ymm10,%ymm10
+	vpaddd	224-256(%rcx),%ymm11,%ymm11
+
+	vpunpckldq	%ymm9,%ymm8,%ymm14
+	vpunpckldq	%ymm11,%ymm10,%ymm15
+	vpunpckhdq	%ymm9,%ymm8,%ymm8
+	vpunpckhdq	%ymm11,%ymm10,%ymm10
+	vpunpcklqdq	%ymm15,%ymm14,%ymm9
+	vpunpckhqdq	%ymm15,%ymm14,%ymm14
+	vpunpcklqdq	%ymm10,%ymm8,%ymm11
+	vpunpckhqdq	%ymm10,%ymm8,%ymm8
+	vpaddd	256-256(%rcx),%ymm0,%ymm0
+	vpaddd	288-256(%rcx),%ymm1,%ymm1
+	vpaddd	320-256(%rcx),%ymm2,%ymm2
+	vpaddd	352-256(%rcx),%ymm3,%ymm3
+
+	vpunpckldq	%ymm1,%ymm0,%ymm10
+	vpunpckldq	%ymm3,%ymm2,%ymm15
+	vpunpckhdq	%ymm1,%ymm0,%ymm0
+	vpunpckhdq	%ymm3,%ymm2,%ymm2
+	vpunpcklqdq	%ymm15,%ymm10,%ymm1
+	vpunpckhqdq	%ymm15,%ymm10,%ymm10
+	vpunpcklqdq	%ymm2,%ymm0,%ymm3
+	vpunpckhqdq	%ymm2,%ymm0,%ymm0
+	vperm2i128	$0x20,%ymm1,%ymm9,%ymm15
+	vperm2i128	$0x31,%ymm1,%ymm9,%ymm1
+	vperm2i128	$0x20,%ymm10,%ymm14,%ymm9
+	vperm2i128	$0x31,%ymm10,%ymm14,%ymm10
+	vperm2i128	$0x20,%ymm3,%ymm11,%ymm14
+	vperm2i128	$0x31,%ymm3,%ymm11,%ymm3
+	vperm2i128	$0x20,%ymm0,%ymm8,%ymm11
+	vperm2i128	$0x31,%ymm0,%ymm8,%ymm0
+	vmovdqa	%ymm15,0(%rsp)
+	vmovdqa	%ymm9,32(%rsp)
+	vmovdqa	64(%rsp),%ymm15
+	vmovdqa	96(%rsp),%ymm9
+
+	vpaddd	384-512(%rax),%ymm12,%ymm12
+	vpaddd	416-512(%rax),%ymm13,%ymm13
+	vpaddd	448-512(%rax),%ymm15,%ymm15
+	vpaddd	480-512(%rax),%ymm9,%ymm9
+
+	vpunpckldq	%ymm13,%ymm12,%ymm2
+	vpunpckldq	%ymm9,%ymm15,%ymm8
+	vpunpckhdq	%ymm13,%ymm12,%ymm12
+	vpunpckhdq	%ymm9,%ymm15,%ymm15
+	vpunpcklqdq	%ymm8,%ymm2,%ymm13
+	vpunpckhqdq	%ymm8,%ymm2,%ymm2
+	vpunpcklqdq	%ymm15,%ymm12,%ymm9
+	vpunpckhqdq	%ymm15,%ymm12,%ymm12
+	vpaddd	512-512(%rax),%ymm4,%ymm4
+	vpaddd	544-512(%rax),%ymm5,%ymm5
+	vpaddd	576-512(%rax),%ymm6,%ymm6
+	vpaddd	608-512(%rax),%ymm7,%ymm7
+
+	vpunpckldq	%ymm5,%ymm4,%ymm15
+	vpunpckldq	%ymm7,%ymm6,%ymm8
+	vpunpckhdq	%ymm5,%ymm4,%ymm4
+	vpunpckhdq	%ymm7,%ymm6,%ymm6
+	vpunpcklqdq	%ymm8,%ymm15,%ymm5
+	vpunpckhqdq	%ymm8,%ymm15,%ymm15
+	vpunpcklqdq	%ymm6,%ymm4,%ymm7
+	vpunpckhqdq	%ymm6,%ymm4,%ymm4
+	vperm2i128	$0x20,%ymm5,%ymm13,%ymm8
+	vperm2i128	$0x31,%ymm5,%ymm13,%ymm5
+	vperm2i128	$0x20,%ymm15,%ymm2,%ymm13
+	vperm2i128	$0x31,%ymm15,%ymm2,%ymm15
+	vperm2i128	$0x20,%ymm7,%ymm9,%ymm2
+	vperm2i128	$0x31,%ymm7,%ymm9,%ymm7
+	vperm2i128	$0x20,%ymm4,%ymm12,%ymm9
+	vperm2i128	$0x31,%ymm4,%ymm12,%ymm4
+	vmovdqa	0(%rsp),%ymm6
+	vmovdqa	32(%rsp),%ymm12
+
+	cmpq	$512,%rdx
+	jb	.Ltail8x
+
+	vpxor	0(%rsi),%ymm6,%ymm6
+	vpxor	32(%rsi),%ymm8,%ymm8
+	vpxor	64(%rsi),%ymm1,%ymm1
+	vpxor	96(%rsi),%ymm5,%ymm5
+	leaq	128(%rsi),%rsi
+	vmovdqu	%ymm6,0(%rdi)
+	vmovdqu	%ymm8,32(%rdi)
+	vmovdqu	%ymm1,64(%rdi)
+	vmovdqu	%ymm5,96(%rdi)
+	leaq	128(%rdi),%rdi
+
+	vpxor	0(%rsi),%ymm12,%ymm12
+	vpxor	32(%rsi),%ymm13,%ymm13
+	vpxor	64(%rsi),%ymm10,%ymm10
+	vpxor	96(%rsi),%ymm15,%ymm15
+	leaq	128(%rsi),%rsi
+	vmovdqu	%ymm12,0(%rdi)
+	vmovdqu	%ymm13,32(%rdi)
+	vmovdqu	%ymm10,64(%rdi)
+	vmovdqu	%ymm15,96(%rdi)
+	leaq	128(%rdi),%rdi
+
+	vpxor	0(%rsi),%ymm14,%ymm14
+	vpxor	32(%rsi),%ymm2,%ymm2
+	vpxor	64(%rsi),%ymm3,%ymm3
+	vpxor	96(%rsi),%ymm7,%ymm7
+	leaq	128(%rsi),%rsi
+	vmovdqu	%ymm14,0(%rdi)
+	vmovdqu	%ymm2,32(%rdi)
+	vmovdqu	%ymm3,64(%rdi)
+	vmovdqu	%ymm7,96(%rdi)
+	leaq	128(%rdi),%rdi
+
+	vpxor	0(%rsi),%ymm11,%ymm11
+	vpxor	32(%rsi),%ymm9,%ymm9
+	vpxor	64(%rsi),%ymm0,%ymm0
+	vpxor	96(%rsi),%ymm4,%ymm4
+	leaq	128(%rsi),%rsi
+	vmovdqu	%ymm11,0(%rdi)
+	vmovdqu	%ymm9,32(%rdi)
+	vmovdqu	%ymm0,64(%rdi)
+	vmovdqu	%ymm4,96(%rdi)
+	leaq	128(%rdi),%rdi
+
+	subq	$512,%rdx
+	jnz	.Loop_outer8x
+
+	jmp	.Ldone8x
+
+.Ltail8x:
+	cmpq	$448,%rdx
+	jae	.L448_or_more8x
+	cmpq	$384,%rdx
+	jae	.L384_or_more8x
+	cmpq	$320,%rdx
+	jae	.L320_or_more8x
+	cmpq	$256,%rdx
+	jae	.L256_or_more8x
+	cmpq	$192,%rdx
+	jae	.L192_or_more8x
+	cmpq	$128,%rdx
+	jae	.L128_or_more8x
+	cmpq	$64,%rdx
+	jae	.L64_or_more8x
+
+	xorq	%r9,%r9
+	vmovdqa	%ymm6,0(%rsp)
+	vmovdqa	%ymm8,32(%rsp)
+	jmp	.Loop_tail8x
+
+.align	32
+.L64_or_more8x:
+	vpxor	0(%rsi),%ymm6,%ymm6
+	vpxor	32(%rsi),%ymm8,%ymm8
+	vmovdqu	%ymm6,0(%rdi)
+	vmovdqu	%ymm8,32(%rdi)
+	je	.Ldone8x
+
+	leaq	64(%rsi),%rsi
+	xorq	%r9,%r9
+	vmovdqa	%ymm1,0(%rsp)
+	leaq	64(%rdi),%rdi
+	subq	$64,%rdx
+	vmovdqa	%ymm5,32(%rsp)
+	jmp	.Loop_tail8x
+
+.align	32
+.L128_or_more8x:
+	vpxor	0(%rsi),%ymm6,%ymm6
+	vpxor	32(%rsi),%ymm8,%ymm8
+	vpxor	64(%rsi),%ymm1,%ymm1
+	vpxor	96(%rsi),%ymm5,%ymm5
+	vmovdqu	%ymm6,0(%rdi)
+	vmovdqu	%ymm8,32(%rdi)
+	vmovdqu	%ymm1,64(%rdi)
+	vmovdqu	%ymm5,96(%rdi)
+	je	.Ldone8x
+
+	leaq	128(%rsi),%rsi
+	xorq	%r9,%r9
+	vmovdqa	%ymm12,0(%rsp)
+	leaq	128(%rdi),%rdi
+	subq	$128,%rdx
+	vmovdqa	%ymm13,32(%rsp)
+	jmp	.Loop_tail8x
+
+.align	32
+.L192_or_more8x:
+	vpxor	0(%rsi),%ymm6,%ymm6
+	vpxor	32(%rsi),%ymm8,%ymm8
+	vpxor	64(%rsi),%ymm1,%ymm1
+	vpxor	96(%rsi),%ymm5,%ymm5
+	vpxor	128(%rsi),%ymm12,%ymm12
+	vpxor	160(%rsi),%ymm13,%ymm13
+	vmovdqu	%ymm6,0(%rdi)
+	vmovdqu	%ymm8,32(%rdi)
+	vmovdqu	%ymm1,64(%rdi)
+	vmovdqu	%ymm5,96(%rdi)
+	vmovdqu	%ymm12,128(%rdi)
+	vmovdqu	%ymm13,160(%rdi)
+	je	.Ldone8x
+
+	leaq	192(%rsi),%rsi
+	xorq	%r9,%r9
+	vmovdqa	%ymm10,0(%rsp)
+	leaq	192(%rdi),%rdi
+	subq	$192,%rdx
+	vmovdqa	%ymm15,32(%rsp)
+	jmp	.Loop_tail8x
+
+.align	32
+.L256_or_more8x:
+	vpxor	0(%rsi),%ymm6,%ymm6
+	vpxor	32(%rsi),%ymm8,%ymm8
+	vpxor	64(%rsi),%ymm1,%ymm1
+	vpxor	96(%rsi),%ymm5,%ymm5
+	vpxor	128(%rsi),%ymm12,%ymm12
+	vpxor	160(%rsi),%ymm13,%ymm13
+	vpxor	192(%rsi),%ymm10,%ymm10
+	vpxor	224(%rsi),%ymm15,%ymm15
+	vmovdqu	%ymm6,0(%rdi)
+	vmovdqu	%ymm8,32(%rdi)
+	vmovdqu	%ymm1,64(%rdi)
+	vmovdqu	%ymm5,96(%rdi)
+	vmovdqu	%ymm12,128(%rdi)
+	vmovdqu	%ymm13,160(%rdi)
+	vmovdqu	%ymm10,192(%rdi)
+	vmovdqu	%ymm15,224(%rdi)
+	je	.Ldone8x
+
+	leaq	256(%rsi),%rsi
+	xorq	%r9,%r9
+	vmovdqa	%ymm14,0(%rsp)
+	leaq	256(%rdi),%rdi
+	subq	$256,%rdx
+	vmovdqa	%ymm2,32(%rsp)
+	jmp	.Loop_tail8x
+
+.align	32
+.L320_or_more8x:
+	vpxor	0(%rsi),%ymm6,%ymm6
+	vpxor	32(%rsi),%ymm8,%ymm8
+	vpxor	64(%rsi),%ymm1,%ymm1
+	vpxor	96(%rsi),%ymm5,%ymm5
+	vpxor	128(%rsi),%ymm12,%ymm12
+	vpxor	160(%rsi),%ymm13,%ymm13
+	vpxor	192(%rsi),%ymm10,%ymm10
+	vpxor	224(%rsi),%ymm15,%ymm15
+	vpxor	256(%rsi),%ymm14,%ymm14
+	vpxor	288(%rsi),%ymm2,%ymm2
+	vmovdqu	%ymm6,0(%rdi)
+	vmovdqu	%ymm8,32(%rdi)
+	vmovdqu	%ymm1,64(%rdi)
+	vmovdqu	%ymm5,96(%rdi)
+	vmovdqu	%ymm12,128(%rdi)
+	vmovdqu	%ymm13,160(%rdi)
+	vmovdqu	%ymm10,192(%rdi)
+	vmovdqu	%ymm15,224(%rdi)
+	vmovdqu	%ymm14,256(%rdi)
+	vmovdqu	%ymm2,288(%rdi)
+	je	.Ldone8x
+
+	leaq	320(%rsi),%rsi
+	xorq	%r9,%r9
+	vmovdqa	%ymm3,0(%rsp)
+	leaq	320(%rdi),%rdi
+	subq	$320,%rdx
+	vmovdqa	%ymm7,32(%rsp)
+	jmp	.Loop_tail8x
+
+.align	32
+.L384_or_more8x:
+	vpxor	0(%rsi),%ymm6,%ymm6
+	vpxor	32(%rsi),%ymm8,%ymm8
+	vpxor	64(%rsi),%ymm1,%ymm1
+	vpxor	96(%rsi),%ymm5,%ymm5
+	vpxor	128(%rsi),%ymm12,%ymm12
+	vpxor	160(%rsi),%ymm13,%ymm13
+	vpxor	192(%rsi),%ymm10,%ymm10
+	vpxor	224(%rsi),%ymm15,%ymm15
+	vpxor	256(%rsi),%ymm14,%ymm14
+	vpxor	288(%rsi),%ymm2,%ymm2
+	vpxor	320(%rsi),%ymm3,%ymm3
+	vpxor	352(%rsi),%ymm7,%ymm7
+	vmovdqu	%ymm6,0(%rdi)
+	vmovdqu	%ymm8,32(%rdi)
+	vmovdqu	%ymm1,64(%rdi)
+	vmovdqu	%ymm5,96(%rdi)
+	vmovdqu	%ymm12,128(%rdi)
+	vmovdqu	%ymm13,160(%rdi)
+	vmovdqu	%ymm10,192(%rdi)
+	vmovdqu	%ymm15,224(%rdi)
+	vmovdqu	%ymm14,256(%rdi)
+	vmovdqu	%ymm2,288(%rdi)
+	vmovdqu	%ymm3,320(%rdi)
+	vmovdqu	%ymm7,352(%rdi)
+	je	.Ldone8x
+
+	leaq	384(%rsi),%rsi
+	xorq	%r9,%r9
+	vmovdqa	%ymm11,0(%rsp)
+	leaq	384(%rdi),%rdi
+	subq	$384,%rdx
+	vmovdqa	%ymm9,32(%rsp)
+	jmp	.Loop_tail8x
+
+.align	32
+.L448_or_more8x:
+	vpxor	0(%rsi),%ymm6,%ymm6
+	vpxor	32(%rsi),%ymm8,%ymm8
+	vpxor	64(%rsi),%ymm1,%ymm1
+	vpxor	96(%rsi),%ymm5,%ymm5
+	vpxor	128(%rsi),%ymm12,%ymm12
+	vpxor	160(%rsi),%ymm13,%ymm13
+	vpxor	192(%rsi),%ymm10,%ymm10
+	vpxor	224(%rsi),%ymm15,%ymm15
+	vpxor	256(%rsi),%ymm14,%ymm14
+	vpxor	288(%rsi),%ymm2,%ymm2
+	vpxor	320(%rsi),%ymm3,%ymm3
+	vpxor	352(%rsi),%ymm7,%ymm7
+	vpxor	384(%rsi),%ymm11,%ymm11
+	vpxor	416(%rsi),%ymm9,%ymm9
+	vmovdqu	%ymm6,0(%rdi)
+	vmovdqu	%ymm8,32(%rdi)
+	vmovdqu	%ymm1,64(%rdi)
+	vmovdqu	%ymm5,96(%rdi)
+	vmovdqu	%ymm12,128(%rdi)
+	vmovdqu	%ymm13,160(%rdi)
+	vmovdqu	%ymm10,192(%rdi)
+	vmovdqu	%ymm15,224(%rdi)
+	vmovdqu	%ymm14,256(%rdi)
+	vmovdqu	%ymm2,288(%rdi)
+	vmovdqu	%ymm3,320(%rdi)
+	vmovdqu	%ymm7,352(%rdi)
+	vmovdqu	%ymm11,384(%rdi)
+	vmovdqu	%ymm9,416(%rdi)
+	je	.Ldone8x
+
+	leaq	448(%rsi),%rsi
+	xorq	%r9,%r9
+	vmovdqa	%ymm0,0(%rsp)
+	leaq	448(%rdi),%rdi
+	subq	$448,%rdx
+	vmovdqa	%ymm4,32(%rsp)
+
+.Loop_tail8x:
+	movzbl	(%rsi,%r9,1),%eax
+	movzbl	(%rsp,%r9,1),%ecx
+	leaq	1(%r9),%r9
+	xorl	%ecx,%eax
+	movb	%al,-1(%rdi,%r9,1)
+	decq	%rdx
+	jnz	.Loop_tail8x
+
+.Ldone8x:
+	vzeroall
+	leaq	-8(%r10),%rsp
+
+.L8x_epilogue:
+	ret
+ENDPROC(chacha20_avx2)
+#endif /* CONFIG_AS_AVX2 */
+
+#ifdef CONFIG_AS_AVX512
+.align	32
+ENTRY(chacha20_avx512)
+.Lchacha20_avx512:
+	cmpq	$0,%rdx
+	je	.Lavx512_epilogue
+	leaq	8(%rsp),%r10
+
+	cmpq	$512,%rdx
+	ja	.Lchacha20_16x
+
+	subq	$64+8,%rsp
+	andq	$-64,%rsp
+	vbroadcasti32x4	.Lsigma(%rip),%zmm0
+	vbroadcasti32x4	(%rcx),%zmm1
+	vbroadcasti32x4	16(%rcx),%zmm2
+	vbroadcasti32x4	(%r8),%zmm3
+
+	vmovdqa32	%zmm0,%zmm16
+	vmovdqa32	%zmm1,%zmm17
+	vmovdqa32	%zmm2,%zmm18
+	vpaddd	.Lzeroz(%rip),%zmm3,%zmm3
+	vmovdqa32	.Lfourz(%rip),%zmm20
+	movq	$10,%r8
+	vmovdqa32	%zmm3,%zmm19
+	jmp	.Loop_avx512
+
+.align	16
+.Loop_outer_avx512:
+	vmovdqa32	%zmm16,%zmm0
+	vmovdqa32	%zmm17,%zmm1
+	vmovdqa32	%zmm18,%zmm2
+	vpaddd	%zmm20,%zmm19,%zmm3
+	movq	$10,%r8
+	vmovdqa32	%zmm3,%zmm19
+	jmp	.Loop_avx512
+
+.align	32
+.Loop_avx512:
+	vpaddd	%zmm1,%zmm0,%zmm0
+	vpxord	%zmm0,%zmm3,%zmm3
+	vprold	$16,%zmm3,%zmm3
+	vpaddd	%zmm3,%zmm2,%zmm2
+	vpxord	%zmm2,%zmm1,%zmm1
+	vprold	$12,%zmm1,%zmm1
+	vpaddd	%zmm1,%zmm0,%zmm0
+	vpxord	%zmm0,%zmm3,%zmm3
+	vprold	$8,%zmm3,%zmm3
+	vpaddd	%zmm3,%zmm2,%zmm2
+	vpxord	%zmm2,%zmm1,%zmm1
+	vprold	$7,%zmm1,%zmm1
+	vpshufd	$78,%zmm2,%zmm2
+	vpshufd	$57,%zmm1,%zmm1
+	vpshufd	$147,%zmm3,%zmm3
+	vpaddd	%zmm1,%zmm0,%zmm0
+	vpxord	%zmm0,%zmm3,%zmm3
+	vprold	$16,%zmm3,%zmm3
+	vpaddd	%zmm3,%zmm2,%zmm2
+	vpxord	%zmm2,%zmm1,%zmm1
+	vprold	$12,%zmm1,%zmm1
+	vpaddd	%zmm1,%zmm0,%zmm0
+	vpxord	%zmm0,%zmm3,%zmm3
+	vprold	$8,%zmm3,%zmm3
+	vpaddd	%zmm3,%zmm2,%zmm2
+	vpxord	%zmm2,%zmm1,%zmm1
+	vprold	$7,%zmm1,%zmm1
+	vpshufd	$78,%zmm2,%zmm2
+	vpshufd	$147,%zmm1,%zmm1
+	vpshufd	$57,%zmm3,%zmm3
+	decq	%r8
+	jnz	.Loop_avx512
+	vpaddd	%zmm16,%zmm0,%zmm0
+	vpaddd	%zmm17,%zmm1,%zmm1
+	vpaddd	%zmm18,%zmm2,%zmm2
+	vpaddd	%zmm19,%zmm3,%zmm3
+
+	subq	$64,%rdx
+	jb	.Ltail64_avx512
+
+	vpxor	0(%rsi),%xmm0,%xmm4
+	vpxor	16(%rsi),%xmm1,%xmm5
+	vpxor	32(%rsi),%xmm2,%xmm6
+	vpxor	48(%rsi),%xmm3,%xmm7
+	leaq	64(%rsi),%rsi
+
+	vmovdqu	%xmm4,0(%rdi)
+	vmovdqu	%xmm5,16(%rdi)
+	vmovdqu	%xmm6,32(%rdi)
+	vmovdqu	%xmm7,48(%rdi)
+	leaq	64(%rdi),%rdi
+
+	jz	.Ldone_avx512
+
+	vextracti32x4	$1,%zmm0,%xmm4
+	vextracti32x4	$1,%zmm1,%xmm5
+	vextracti32x4	$1,%zmm2,%xmm6
+	vextracti32x4	$1,%zmm3,%xmm7
+
+	subq	$64,%rdx
+	jb	.Ltail_avx512
+
+	vpxor	0(%rsi),%xmm4,%xmm4
+	vpxor	16(%rsi),%xmm5,%xmm5
+	vpxor	32(%rsi),%xmm6,%xmm6
+	vpxor	48(%rsi),%xmm7,%xmm7
+	leaq	64(%rsi),%rsi
+
+	vmovdqu	%xmm4,0(%rdi)
+	vmovdqu	%xmm5,16(%rdi)
+	vmovdqu	%xmm6,32(%rdi)
+	vmovdqu	%xmm7,48(%rdi)
+	leaq	64(%rdi),%rdi
+
+	jz	.Ldone_avx512
+
+	vextracti32x4	$2,%zmm0,%xmm4
+	vextracti32x4	$2,%zmm1,%xmm5
+	vextracti32x4	$2,%zmm2,%xmm6
+	vextracti32x4	$2,%zmm3,%xmm7
+
+	subq	$64,%rdx
+	jb	.Ltail_avx512
+
+	vpxor	0(%rsi),%xmm4,%xmm4
+	vpxor	16(%rsi),%xmm5,%xmm5
+	vpxor	32(%rsi),%xmm6,%xmm6
+	vpxor	48(%rsi),%xmm7,%xmm7
+	leaq	64(%rsi),%rsi
+
+	vmovdqu	%xmm4,0(%rdi)
+	vmovdqu	%xmm5,16(%rdi)
+	vmovdqu	%xmm6,32(%rdi)
+	vmovdqu	%xmm7,48(%rdi)
+	leaq	64(%rdi),%rdi
+
+	jz	.Ldone_avx512
+
+	vextracti32x4	$3,%zmm0,%xmm4
+	vextracti32x4	$3,%zmm1,%xmm5
+	vextracti32x4	$3,%zmm2,%xmm6
+	vextracti32x4	$3,%zmm3,%xmm7
+
+	subq	$64,%rdx
+	jb	.Ltail_avx512
+
+	vpxor	0(%rsi),%xmm4,%xmm4
+	vpxor	16(%rsi),%xmm5,%xmm5
+	vpxor	32(%rsi),%xmm6,%xmm6
+	vpxor	48(%rsi),%xmm7,%xmm7
+	leaq	64(%rsi),%rsi
+
+	vmovdqu	%xmm4,0(%rdi)
+	vmovdqu	%xmm5,16(%rdi)
+	vmovdqu	%xmm6,32(%rdi)
+	vmovdqu	%xmm7,48(%rdi)
+	leaq	64(%rdi),%rdi
+
+	jnz	.Loop_outer_avx512
+
+	jmp	.Ldone_avx512
+
+.align	16
+.Ltail64_avx512:
+	vmovdqa	%xmm0,0(%rsp)
+	vmovdqa	%xmm1,16(%rsp)
+	vmovdqa	%xmm2,32(%rsp)
+	vmovdqa	%xmm3,48(%rsp)
+	addq	$64,%rdx
+	jmp	.Loop_tail_avx512
+
+.align	16
+.Ltail_avx512:
+	vmovdqa	%xmm4,0(%rsp)
+	vmovdqa	%xmm5,16(%rsp)
+	vmovdqa	%xmm6,32(%rsp)
+	vmovdqa	%xmm7,48(%rsp)
+	addq	$64,%rdx
+
+.Loop_tail_avx512:
+	movzbl	(%rsi,%r8,1),%eax
+	movzbl	(%rsp,%r8,1),%ecx
+	leaq	1(%r8),%r8
+	xorl	%ecx,%eax
+	movb	%al,-1(%rdi,%r8,1)
+	decq	%rdx
+	jnz	.Loop_tail_avx512
+
+	vmovdqa32	%zmm16,0(%rsp)
+
+.Ldone_avx512:
+	vzeroall
+	leaq	-8(%r10),%rsp
+
+.Lavx512_epilogue:
+	ret
+
+.align	32
+.Lchacha20_16x:
+	leaq	8(%rsp),%r10
+
+	subq	$64+8,%rsp
+	andq	$-64,%rsp
+	vzeroupper
+
+	leaq	.Lsigma(%rip),%r9
+	vbroadcasti32x4	(%r9),%zmm3
+	vbroadcasti32x4	(%rcx),%zmm7
+	vbroadcasti32x4	16(%rcx),%zmm11
+	vbroadcasti32x4	(%r8),%zmm15
+
+	vpshufd	$0x00,%zmm3,%zmm0
+	vpshufd	$0x55,%zmm3,%zmm1
+	vpshufd	$0xaa,%zmm3,%zmm2
+	vpshufd	$0xff,%zmm3,%zmm3
+	vmovdqa64	%zmm0,%zmm16
+	vmovdqa64	%zmm1,%zmm17
+	vmovdqa64	%zmm2,%zmm18
+	vmovdqa64	%zmm3,%zmm19
+
+	vpshufd	$0x00,%zmm7,%zmm4
+	vpshufd	$0x55,%zmm7,%zmm5
+	vpshufd	$0xaa,%zmm7,%zmm6
+	vpshufd	$0xff,%zmm7,%zmm7
+	vmovdqa64	%zmm4,%zmm20
+	vmovdqa64	%zmm5,%zmm21
+	vmovdqa64	%zmm6,%zmm22
+	vmovdqa64	%zmm7,%zmm23
+
+	vpshufd	$0x00,%zmm11,%zmm8
+	vpshufd	$0x55,%zmm11,%zmm9
+	vpshufd	$0xaa,%zmm11,%zmm10
+	vpshufd	$0xff,%zmm11,%zmm11
+	vmovdqa64	%zmm8,%zmm24
+	vmovdqa64	%zmm9,%zmm25
+	vmovdqa64	%zmm10,%zmm26
+	vmovdqa64	%zmm11,%zmm27
+
+	vpshufd	$0x00,%zmm15,%zmm12
+	vpshufd	$0x55,%zmm15,%zmm13
+	vpshufd	$0xaa,%zmm15,%zmm14
+	vpshufd	$0xff,%zmm15,%zmm15
+	vpaddd	.Lincz(%rip),%zmm12,%zmm12
+	vmovdqa64	%zmm12,%zmm28
+	vmovdqa64	%zmm13,%zmm29
+	vmovdqa64	%zmm14,%zmm30
+	vmovdqa64	%zmm15,%zmm31
+
+	movl	$10,%eax
+	jmp	.Loop16x
+
+.align	32
+.Loop_outer16x:
+	vpbroadcastd	0(%r9),%zmm0
+	vpbroadcastd	4(%r9),%zmm1
+	vpbroadcastd	8(%r9),%zmm2
+	vpbroadcastd	12(%r9),%zmm3
+	vpaddd	.Lsixteen(%rip),%zmm28,%zmm28
+	vmovdqa64	%zmm20,%zmm4
+	vmovdqa64	%zmm21,%zmm5
+	vmovdqa64	%zmm22,%zmm6
+	vmovdqa64	%zmm23,%zmm7
+	vmovdqa64	%zmm24,%zmm8
+	vmovdqa64	%zmm25,%zmm9
+	vmovdqa64	%zmm26,%zmm10
+	vmovdqa64	%zmm27,%zmm11
+	vmovdqa64	%zmm28,%zmm12
+	vmovdqa64	%zmm29,%zmm13
+	vmovdqa64	%zmm30,%zmm14
+	vmovdqa64	%zmm31,%zmm15
+
+	vmovdqa64	%zmm0,%zmm16
+	vmovdqa64	%zmm1,%zmm17
+	vmovdqa64	%zmm2,%zmm18
+	vmovdqa64	%zmm3,%zmm19
+
+	movl	$10,%eax
+	jmp	.Loop16x
+
+.align	32
+.Loop16x:
+	vpaddd	%zmm4,%zmm0,%zmm0
+	vpaddd	%zmm5,%zmm1,%zmm1
+	vpaddd	%zmm6,%zmm2,%zmm2
+	vpaddd	%zmm7,%zmm3,%zmm3
+	vpxord	%zmm0,%zmm12,%zmm12
+	vpxord	%zmm1,%zmm13,%zmm13
+	vpxord	%zmm2,%zmm14,%zmm14
+	vpxord	%zmm3,%zmm15,%zmm15
+	vprold	$16,%zmm12,%zmm12
+	vprold	$16,%zmm13,%zmm13
+	vprold	$16,%zmm14,%zmm14
+	vprold	$16,%zmm15,%zmm15
+	vpaddd	%zmm12,%zmm8,%zmm8
+	vpaddd	%zmm13,%zmm9,%zmm9
+	vpaddd	%zmm14,%zmm10,%zmm10
+	vpaddd	%zmm15,%zmm11,%zmm11
+	vpxord	%zmm8,%zmm4,%zmm4
+	vpxord	%zmm9,%zmm5,%zmm5
+	vpxord	%zmm10,%zmm6,%zmm6
+	vpxord	%zmm11,%zmm7,%zmm7
+	vprold	$12,%zmm4,%zmm4
+	vprold	$12,%zmm5,%zmm5
+	vprold	$12,%zmm6,%zmm6
+	vprold	$12,%zmm7,%zmm7
+	vpaddd	%zmm4,%zmm0,%zmm0
+	vpaddd	%zmm5,%zmm1,%zmm1
+	vpaddd	%zmm6,%zmm2,%zmm2
+	vpaddd	%zmm7,%zmm3,%zmm3
+	vpxord	%zmm0,%zmm12,%zmm12
+	vpxord	%zmm1,%zmm13,%zmm13
+	vpxord	%zmm2,%zmm14,%zmm14
+	vpxord	%zmm3,%zmm15,%zmm15
+	vprold	$8,%zmm12,%zmm12
+	vprold	$8,%zmm13,%zmm13
+	vprold	$8,%zmm14,%zmm14
+	vprold	$8,%zmm15,%zmm15
+	vpaddd	%zmm12,%zmm8,%zmm8
+	vpaddd	%zmm13,%zmm9,%zmm9
+	vpaddd	%zmm14,%zmm10,%zmm10
+	vpaddd	%zmm15,%zmm11,%zmm11
+	vpxord	%zmm8,%zmm4,%zmm4
+	vpxord	%zmm9,%zmm5,%zmm5
+	vpxord	%zmm10,%zmm6,%zmm6
+	vpxord	%zmm11,%zmm7,%zmm7
+	vprold	$7,%zmm4,%zmm4
+	vprold	$7,%zmm5,%zmm5
+	vprold	$7,%zmm6,%zmm6
+	vprold	$7,%zmm7,%zmm7
+	vpaddd	%zmm5,%zmm0,%zmm0
+	vpaddd	%zmm6,%zmm1,%zmm1
+	vpaddd	%zmm7,%zmm2,%zmm2
+	vpaddd	%zmm4,%zmm3,%zmm3
+	vpxord	%zmm0,%zmm15,%zmm15
+	vpxord	%zmm1,%zmm12,%zmm12
+	vpxord	%zmm2,%zmm13,%zmm13
+	vpxord	%zmm3,%zmm14,%zmm14
+	vprold	$16,%zmm15,%zmm15
+	vprold	$16,%zmm12,%zmm12
+	vprold	$16,%zmm13,%zmm13
+	vprold	$16,%zmm14,%zmm14
+	vpaddd	%zmm15,%zmm10,%zmm10
+	vpaddd	%zmm12,%zmm11,%zmm11
+	vpaddd	%zmm13,%zmm8,%zmm8
+	vpaddd	%zmm14,%zmm9,%zmm9
+	vpxord	%zmm10,%zmm5,%zmm5
+	vpxord	%zmm11,%zmm6,%zmm6
+	vpxord	%zmm8,%zmm7,%zmm7
+	vpxord	%zmm9,%zmm4,%zmm4
+	vprold	$12,%zmm5,%zmm5
+	vprold	$12,%zmm6,%zmm6
+	vprold	$12,%zmm7,%zmm7
+	vprold	$12,%zmm4,%zmm4
+	vpaddd	%zmm5,%zmm0,%zmm0
+	vpaddd	%zmm6,%zmm1,%zmm1
+	vpaddd	%zmm7,%zmm2,%zmm2
+	vpaddd	%zmm4,%zmm3,%zmm3
+	vpxord	%zmm0,%zmm15,%zmm15
+	vpxord	%zmm1,%zmm12,%zmm12
+	vpxord	%zmm2,%zmm13,%zmm13
+	vpxord	%zmm3,%zmm14,%zmm14
+	vprold	$8,%zmm15,%zmm15
+	vprold	$8,%zmm12,%zmm12
+	vprold	$8,%zmm13,%zmm13
+	vprold	$8,%zmm14,%zmm14
+	vpaddd	%zmm15,%zmm10,%zmm10
+	vpaddd	%zmm12,%zmm11,%zmm11
+	vpaddd	%zmm13,%zmm8,%zmm8
+	vpaddd	%zmm14,%zmm9,%zmm9
+	vpxord	%zmm10,%zmm5,%zmm5
+	vpxord	%zmm11,%zmm6,%zmm6
+	vpxord	%zmm8,%zmm7,%zmm7
+	vpxord	%zmm9,%zmm4,%zmm4
+	vprold	$7,%zmm5,%zmm5
+	vprold	$7,%zmm6,%zmm6
+	vprold	$7,%zmm7,%zmm7
+	vprold	$7,%zmm4,%zmm4
+	decl	%eax
+	jnz	.Loop16x
+
+	vpaddd	%zmm16,%zmm0,%zmm0
+	vpaddd	%zmm17,%zmm1,%zmm1
+	vpaddd	%zmm18,%zmm2,%zmm2
+	vpaddd	%zmm19,%zmm3,%zmm3
+
+	vpunpckldq	%zmm1,%zmm0,%zmm18
+	vpunpckldq	%zmm3,%zmm2,%zmm19
+	vpunpckhdq	%zmm1,%zmm0,%zmm0
+	vpunpckhdq	%zmm3,%zmm2,%zmm2
+	vpunpcklqdq	%zmm19,%zmm18,%zmm1
+	vpunpckhqdq	%zmm19,%zmm18,%zmm18
+	vpunpcklqdq	%zmm2,%zmm0,%zmm3
+	vpunpckhqdq	%zmm2,%zmm0,%zmm0
+	vpaddd	%zmm20,%zmm4,%zmm4
+	vpaddd	%zmm21,%zmm5,%zmm5
+	vpaddd	%zmm22,%zmm6,%zmm6
+	vpaddd	%zmm23,%zmm7,%zmm7
+
+	vpunpckldq	%zmm5,%zmm4,%zmm2
+	vpunpckldq	%zmm7,%zmm6,%zmm19
+	vpunpckhdq	%zmm5,%zmm4,%zmm4
+	vpunpckhdq	%zmm7,%zmm6,%zmm6
+	vpunpcklqdq	%zmm19,%zmm2,%zmm5
+	vpunpckhqdq	%zmm19,%zmm2,%zmm2
+	vpunpcklqdq	%zmm6,%zmm4,%zmm7
+	vpunpckhqdq	%zmm6,%zmm4,%zmm4
+	vshufi32x4	$0x44,%zmm5,%zmm1,%zmm19
+	vshufi32x4	$0xee,%zmm5,%zmm1,%zmm5
+	vshufi32x4	$0x44,%zmm2,%zmm18,%zmm1
+	vshufi32x4	$0xee,%zmm2,%zmm18,%zmm2
+	vshufi32x4	$0x44,%zmm7,%zmm3,%zmm18
+	vshufi32x4	$0xee,%zmm7,%zmm3,%zmm7
+	vshufi32x4	$0x44,%zmm4,%zmm0,%zmm3
+	vshufi32x4	$0xee,%zmm4,%zmm0,%zmm4
+	vpaddd	%zmm24,%zmm8,%zmm8
+	vpaddd	%zmm25,%zmm9,%zmm9
+	vpaddd	%zmm26,%zmm10,%zmm10
+	vpaddd	%zmm27,%zmm11,%zmm11
+
+	vpunpckldq	%zmm9,%zmm8,%zmm6
+	vpunpckldq	%zmm11,%zmm10,%zmm0
+	vpunpckhdq	%zmm9,%zmm8,%zmm8
+	vpunpckhdq	%zmm11,%zmm10,%zmm10
+	vpunpcklqdq	%zmm0,%zmm6,%zmm9
+	vpunpckhqdq	%zmm0,%zmm6,%zmm6
+	vpunpcklqdq	%zmm10,%zmm8,%zmm11
+	vpunpckhqdq	%zmm10,%zmm8,%zmm8
+	vpaddd	%zmm28,%zmm12,%zmm12
+	vpaddd	%zmm29,%zmm13,%zmm13
+	vpaddd	%zmm30,%zmm14,%zmm14
+	vpaddd	%zmm31,%zmm15,%zmm15
+
+	vpunpckldq	%zmm13,%zmm12,%zmm10
+	vpunpckldq	%zmm15,%zmm14,%zmm0
+	vpunpckhdq	%zmm13,%zmm12,%zmm12
+	vpunpckhdq	%zmm15,%zmm14,%zmm14
+	vpunpcklqdq	%zmm0,%zmm10,%zmm13
+	vpunpckhqdq	%zmm0,%zmm10,%zmm10
+	vpunpcklqdq	%zmm14,%zmm12,%zmm15
+	vpunpckhqdq	%zmm14,%zmm12,%zmm12
+	vshufi32x4	$0x44,%zmm13,%zmm9,%zmm0
+	vshufi32x4	$0xee,%zmm13,%zmm9,%zmm13
+	vshufi32x4	$0x44,%zmm10,%zmm6,%zmm9
+	vshufi32x4	$0xee,%zmm10,%zmm6,%zmm10
+	vshufi32x4	$0x44,%zmm15,%zmm11,%zmm6
+	vshufi32x4	$0xee,%zmm15,%zmm11,%zmm15
+	vshufi32x4	$0x44,%zmm12,%zmm8,%zmm11
+	vshufi32x4	$0xee,%zmm12,%zmm8,%zmm12
+	vshufi32x4	$0x88,%zmm0,%zmm19,%zmm16
+	vshufi32x4	$0xdd,%zmm0,%zmm19,%zmm19
+	vshufi32x4	$0x88,%zmm13,%zmm5,%zmm0
+	vshufi32x4	$0xdd,%zmm13,%zmm5,%zmm13
+	vshufi32x4	$0x88,%zmm9,%zmm1,%zmm17
+	vshufi32x4	$0xdd,%zmm9,%zmm1,%zmm1
+	vshufi32x4	$0x88,%zmm10,%zmm2,%zmm9
+	vshufi32x4	$0xdd,%zmm10,%zmm2,%zmm10
+	vshufi32x4	$0x88,%zmm6,%zmm18,%zmm14
+	vshufi32x4	$0xdd,%zmm6,%zmm18,%zmm18
+	vshufi32x4	$0x88,%zmm15,%zmm7,%zmm6
+	vshufi32x4	$0xdd,%zmm15,%zmm7,%zmm15
+	vshufi32x4	$0x88,%zmm11,%zmm3,%zmm8
+	vshufi32x4	$0xdd,%zmm11,%zmm3,%zmm3
+	vshufi32x4	$0x88,%zmm12,%zmm4,%zmm11
+	vshufi32x4	$0xdd,%zmm12,%zmm4,%zmm12
+	cmpq	$1024,%rdx
+	jb	.Ltail16x
+
+	vpxord	0(%rsi),%zmm16,%zmm16
+	vpxord	64(%rsi),%zmm17,%zmm17
+	vpxord	128(%rsi),%zmm14,%zmm14
+	vpxord	192(%rsi),%zmm8,%zmm8
+	vmovdqu32	%zmm16,0(%rdi)
+	vmovdqu32	%zmm17,64(%rdi)
+	vmovdqu32	%zmm14,128(%rdi)
+	vmovdqu32	%zmm8,192(%rdi)
+
+	vpxord	256(%rsi),%zmm19,%zmm19
+	vpxord	320(%rsi),%zmm1,%zmm1
+	vpxord	384(%rsi),%zmm18,%zmm18
+	vpxord	448(%rsi),%zmm3,%zmm3
+	vmovdqu32	%zmm19,256(%rdi)
+	vmovdqu32	%zmm1,320(%rdi)
+	vmovdqu32	%zmm18,384(%rdi)
+	vmovdqu32	%zmm3,448(%rdi)
+
+	vpxord	512(%rsi),%zmm0,%zmm0
+	vpxord	576(%rsi),%zmm9,%zmm9
+	vpxord	640(%rsi),%zmm6,%zmm6
+	vpxord	704(%rsi),%zmm11,%zmm11
+	vmovdqu32	%zmm0,512(%rdi)
+	vmovdqu32	%zmm9,576(%rdi)
+	vmovdqu32	%zmm6,640(%rdi)
+	vmovdqu32	%zmm11,704(%rdi)
+
+	vpxord	768(%rsi),%zmm13,%zmm13
+	vpxord	832(%rsi),%zmm10,%zmm10
+	vpxord	896(%rsi),%zmm15,%zmm15
+	vpxord	960(%rsi),%zmm12,%zmm12
+	leaq	1024(%rsi),%rsi
+	vmovdqu32	%zmm13,768(%rdi)
+	vmovdqu32	%zmm10,832(%rdi)
+	vmovdqu32	%zmm15,896(%rdi)
+	vmovdqu32	%zmm12,960(%rdi)
+	leaq	1024(%rdi),%rdi
+
+	subq	$1024,%rdx
+	jnz	.Loop_outer16x
+
+	jmp	.Ldone16x
+
+.align	32
+.Ltail16x:
+	xorq	%r9,%r9
+	subq	%rsi,%rdi
+	cmpq	$64,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm16,%zmm16
+	vmovdqu32	%zmm16,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm17,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$128,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm17,%zmm17
+	vmovdqu32	%zmm17,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm14,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$192,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm14,%zmm14
+	vmovdqu32	%zmm14,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm8,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$256,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm8,%zmm8
+	vmovdqu32	%zmm8,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm19,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$320,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm19,%zmm19
+	vmovdqu32	%zmm19,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm1,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$384,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm1,%zmm1
+	vmovdqu32	%zmm1,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm18,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$448,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm18,%zmm18
+	vmovdqu32	%zmm18,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm3,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$512,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm3,%zmm3
+	vmovdqu32	%zmm3,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm0,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$576,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm0,%zmm0
+	vmovdqu32	%zmm0,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm9,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$640,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm9,%zmm9
+	vmovdqu32	%zmm9,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm6,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$704,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm6,%zmm6
+	vmovdqu32	%zmm6,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm11,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$768,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm11,%zmm11
+	vmovdqu32	%zmm11,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm13,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$832,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm13,%zmm13
+	vmovdqu32	%zmm13,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm10,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$896,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm10,%zmm10
+	vmovdqu32	%zmm10,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm15,%zmm16
+	leaq	64(%rsi),%rsi
+
+	cmpq	$960,%rdx
+	jb	.Less_than_64_16x
+	vpxord	(%rsi),%zmm15,%zmm15
+	vmovdqu32	%zmm15,(%rdi,%rsi,1)
+	je	.Ldone16x
+	vmovdqa32	%zmm12,%zmm16
+	leaq	64(%rsi),%rsi
+
+.Less_than_64_16x:
+	vmovdqa32	%zmm16,0(%rsp)
+	leaq	(%rdi,%rsi,1),%rdi
+	andq	$63,%rdx
+
+.Loop_tail16x:
+	movzbl	(%rsi,%r9,1),%eax
+	movzbl	(%rsp,%r9,1),%ecx
+	leaq	1(%r9),%r9
+	xorl	%ecx,%eax
+	movb	%al,-1(%rdi,%r9,1)
+	decq	%rdx
+	jnz	.Loop_tail16x
+
+	vpxord	%zmm16,%zmm16,%zmm16
+	vmovdqa32	%zmm16,0(%rsp)
+
+.Ldone16x:
+	vzeroall
+	leaq	-8(%r10),%rsp
+
+.L16x_epilogue:
+	ret
+ENDPROC(chacha20_avx512)
+
+.align	32
+ENTRY(chacha20_avx512vl)
+	cmpq	$0,%rdx
+	je	.Lavx512vl_epilogue
+
+	leaq	8(%rsp),%r10
+
+	cmpq	$128,%rdx
+	ja	.Lchacha20_8xvl
+
+	subq	$64+8,%rsp
+	andq	$-64,%rsp
+	vbroadcasti128	.Lsigma(%rip),%ymm0
+	vbroadcasti128	(%rcx),%ymm1
+	vbroadcasti128	16(%rcx),%ymm2
+	vbroadcasti128	(%r8),%ymm3
+
+	vmovdqa32	%ymm0,%ymm16
+	vmovdqa32	%ymm1,%ymm17
+	vmovdqa32	%ymm2,%ymm18
+	vpaddd	.Lzeroz(%rip),%ymm3,%ymm3
+	vmovdqa32	.Ltwoy(%rip),%ymm20
+	movq	$10,%r8
+	vmovdqa32	%ymm3,%ymm19
+	jmp	.Loop_avx512vl
+
+.align	16
+.Loop_outer_avx512vl:
+	vmovdqa32	%ymm18,%ymm2
+	vpaddd	%ymm20,%ymm19,%ymm3
+	movq	$10,%r8
+	vmovdqa32	%ymm3,%ymm19
+	jmp	.Loop_avx512vl
+
+.align	32
+.Loop_avx512vl:
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxor	%ymm0,%ymm3,%ymm3
+	vprold	$16,%ymm3,%ymm3
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxor	%ymm2,%ymm1,%ymm1
+	vprold	$12,%ymm1,%ymm1
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxor	%ymm0,%ymm3,%ymm3
+	vprold	$8,%ymm3,%ymm3
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxor	%ymm2,%ymm1,%ymm1
+	vprold	$7,%ymm1,%ymm1
+	vpshufd	$78,%ymm2,%ymm2
+	vpshufd	$57,%ymm1,%ymm1
+	vpshufd	$147,%ymm3,%ymm3
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxor	%ymm0,%ymm3,%ymm3
+	vprold	$16,%ymm3,%ymm3
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxor	%ymm2,%ymm1,%ymm1
+	vprold	$12,%ymm1,%ymm1
+	vpaddd	%ymm1,%ymm0,%ymm0
+	vpxor	%ymm0,%ymm3,%ymm3
+	vprold	$8,%ymm3,%ymm3
+	vpaddd	%ymm3,%ymm2,%ymm2
+	vpxor	%ymm2,%ymm1,%ymm1
+	vprold	$7,%ymm1,%ymm1
+	vpshufd	$78,%ymm2,%ymm2
+	vpshufd	$147,%ymm1,%ymm1
+	vpshufd	$57,%ymm3,%ymm3
+	decq	%r8
+	jnz	.Loop_avx512vl
+	vpaddd	%ymm16,%ymm0,%ymm0
+	vpaddd	%ymm17,%ymm1,%ymm1
+	vpaddd	%ymm18,%ymm2,%ymm2
+	vpaddd	%ymm19,%ymm3,%ymm3
+
+	subq	$64,%rdx
+	jb	.Ltail64_avx512vl
+
+	vpxor	0(%rsi),%xmm0,%xmm4
+	vpxor	16(%rsi),%xmm1,%xmm5
+	vpxor	32(%rsi),%xmm2,%xmm6
+	vpxor	48(%rsi),%xmm3,%xmm7
+	leaq	64(%rsi),%rsi
+
+	vmovdqu	%xmm4,0(%rdi)
+	vmovdqu	%xmm5,16(%rdi)
+	vmovdqu	%xmm6,32(%rdi)
+	vmovdqu	%xmm7,48(%rdi)
+	leaq	64(%rdi),%rdi
+
+	jz	.Ldone_avx512vl
+
+	vextracti128	$1,%ymm0,%xmm4
+	vextracti128	$1,%ymm1,%xmm5
+	vextracti128	$1,%ymm2,%xmm6
+	vextracti128	$1,%ymm3,%xmm7
+
+	subq	$64,%rdx
+	jb	.Ltail_avx512vl
+
+	vpxor	0(%rsi),%xmm4,%xmm4
+	vpxor	16(%rsi),%xmm5,%xmm5
+	vpxor	32(%rsi),%xmm6,%xmm6
+	vpxor	48(%rsi),%xmm7,%xmm7
+	leaq	64(%rsi),%rsi
+
+	vmovdqu	%xmm4,0(%rdi)
+	vmovdqu	%xmm5,16(%rdi)
+	vmovdqu	%xmm6,32(%rdi)
+	vmovdqu	%xmm7,48(%rdi)
+	leaq	64(%rdi),%rdi
+
+	vmovdqa32	%ymm16,%ymm0
+	vmovdqa32	%ymm17,%ymm1
+	jnz	.Loop_outer_avx512vl
+
+	jmp	.Ldone_avx512vl
+
+.align	16
+.Ltail64_avx512vl:
+	vmovdqa	%xmm0,0(%rsp)
+	vmovdqa	%xmm1,16(%rsp)
+	vmovdqa	%xmm2,32(%rsp)
+	vmovdqa	%xmm3,48(%rsp)
+	addq	$64,%rdx
+	jmp	.Loop_tail_avx512vl
+
+.align	16
+.Ltail_avx512vl:
+	vmovdqa	%xmm4,0(%rsp)
+	vmovdqa	%xmm5,16(%rsp)
+	vmovdqa	%xmm6,32(%rsp)
+	vmovdqa	%xmm7,48(%rsp)
+	addq	$64,%rdx
+
+.Loop_tail_avx512vl:
+	movzbl	(%rsi,%r8,1),%eax
+	movzbl	(%rsp,%r8,1),%ecx
+	leaq	1(%r8),%r8
+	xorl	%ecx,%eax
+	movb	%al,-1(%rdi,%r8,1)
+	decq	%rdx
+	jnz	.Loop_tail_avx512vl
+
+	vmovdqa32	%ymm16,0(%rsp)
+	vmovdqa32	%ymm16,32(%rsp)
+
+.Ldone_avx512vl:
+	vzeroall
+	leaq	-8(%r10),%rsp
+.Lavx512vl_epilogue:
+	ret
+
+.align	32
+.Lchacha20_8xvl:
+	leaq	8(%rsp),%r10
+	subq	$64+8,%rsp
+	andq	$-64,%rsp
+	vzeroupper
+
+	leaq	.Lsigma(%rip),%r9
+	vbroadcasti128	(%r9),%ymm3
+	vbroadcasti128	(%rcx),%ymm7
+	vbroadcasti128	16(%rcx),%ymm11
+	vbroadcasti128	(%r8),%ymm15
+
+	vpshufd	$0x00,%ymm3,%ymm0
+	vpshufd	$0x55,%ymm3,%ymm1
+	vpshufd	$0xaa,%ymm3,%ymm2
+	vpshufd	$0xff,%ymm3,%ymm3
+	vmovdqa64	%ymm0,%ymm16
+	vmovdqa64	%ymm1,%ymm17
+	vmovdqa64	%ymm2,%ymm18
+	vmovdqa64	%ymm3,%ymm19
+
+	vpshufd	$0x00,%ymm7,%ymm4
+	vpshufd	$0x55,%ymm7,%ymm5
+	vpshufd	$0xaa,%ymm7,%ymm6
+	vpshufd	$0xff,%ymm7,%ymm7
+	vmovdqa64	%ymm4,%ymm20
+	vmovdqa64	%ymm5,%ymm21
+	vmovdqa64	%ymm6,%ymm22
+	vmovdqa64	%ymm7,%ymm23
+
+	vpshufd	$0x00,%ymm11,%ymm8
+	vpshufd	$0x55,%ymm11,%ymm9
+	vpshufd	$0xaa,%ymm11,%ymm10
+	vpshufd	$0xff,%ymm11,%ymm11
+	vmovdqa64	%ymm8,%ymm24
+	vmovdqa64	%ymm9,%ymm25
+	vmovdqa64	%ymm10,%ymm26
+	vmovdqa64	%ymm11,%ymm27
+
+	vpshufd	$0x00,%ymm15,%ymm12
+	vpshufd	$0x55,%ymm15,%ymm13
+	vpshufd	$0xaa,%ymm15,%ymm14
+	vpshufd	$0xff,%ymm15,%ymm15
+	vpaddd	.Lincy(%rip),%ymm12,%ymm12
+	vmovdqa64	%ymm12,%ymm28
+	vmovdqa64	%ymm13,%ymm29
+	vmovdqa64	%ymm14,%ymm30
+	vmovdqa64	%ymm15,%ymm31
+
+	movl	$10,%eax
+	jmp	.Loop8xvl
+
+.align	32
+.Loop_outer8xvl:
+
+
+	vpbroadcastd	8(%r9),%ymm2
+	vpbroadcastd	12(%r9),%ymm3
+	vpaddd	.Leight(%rip),%ymm28,%ymm28
+	vmovdqa64	%ymm20,%ymm4
+	vmovdqa64	%ymm21,%ymm5
+	vmovdqa64	%ymm22,%ymm6
+	vmovdqa64	%ymm23,%ymm7
+	vmovdqa64	%ymm24,%ymm8
+	vmovdqa64	%ymm25,%ymm9
+	vmovdqa64	%ymm26,%ymm10
+	vmovdqa64	%ymm27,%ymm11
+	vmovdqa64	%ymm28,%ymm12
+	vmovdqa64	%ymm29,%ymm13
+	vmovdqa64	%ymm30,%ymm14
+	vmovdqa64	%ymm31,%ymm15
+
+	vmovdqa64	%ymm0,%ymm16
+	vmovdqa64	%ymm1,%ymm17
+	vmovdqa64	%ymm2,%ymm18
+	vmovdqa64	%ymm3,%ymm19
+
+	movl	$10,%eax
+	jmp	.Loop8xvl
+
+.align	32
+.Loop8xvl:
+	vpaddd	%ymm4,%ymm0,%ymm0
+	vpaddd	%ymm5,%ymm1,%ymm1
+	vpaddd	%ymm6,%ymm2,%ymm2
+	vpaddd	%ymm7,%ymm3,%ymm3
+	vpxor	%ymm0,%ymm12,%ymm12
+	vpxor	%ymm1,%ymm13,%ymm13
+	vpxor	%ymm2,%ymm14,%ymm14
+	vpxor	%ymm3,%ymm15,%ymm15
+	vprold	$16,%ymm12,%ymm12
+	vprold	$16,%ymm13,%ymm13
+	vprold	$16,%ymm14,%ymm14
+	vprold	$16,%ymm15,%ymm15
+	vpaddd	%ymm12,%ymm8,%ymm8
+	vpaddd	%ymm13,%ymm9,%ymm9
+	vpaddd	%ymm14,%ymm10,%ymm10
+	vpaddd	%ymm15,%ymm11,%ymm11
+	vpxor	%ymm8,%ymm4,%ymm4
+	vpxor	%ymm9,%ymm5,%ymm5
+	vpxor	%ymm10,%ymm6,%ymm6
+	vpxor	%ymm11,%ymm7,%ymm7
+	vprold	$12,%ymm4,%ymm4
+	vprold	$12,%ymm5,%ymm5
+	vprold	$12,%ymm6,%ymm6
+	vprold	$12,%ymm7,%ymm7
+	vpaddd	%ymm4,%ymm0,%ymm0
+	vpaddd	%ymm5,%ymm1,%ymm1
+	vpaddd	%ymm6,%ymm2,%ymm2
+	vpaddd	%ymm7,%ymm3,%ymm3
+	vpxor	%ymm0,%ymm12,%ymm12
+	vpxor	%ymm1,%ymm13,%ymm13
+	vpxor	%ymm2,%ymm14,%ymm14
+	vpxor	%ymm3,%ymm15,%ymm15
+	vprold	$8,%ymm12,%ymm12
+	vprold	$8,%ymm13,%ymm13
+	vprold	$8,%ymm14,%ymm14
+	vprold	$8,%ymm15,%ymm15
+	vpaddd	%ymm12,%ymm8,%ymm8
+	vpaddd	%ymm13,%ymm9,%ymm9
+	vpaddd	%ymm14,%ymm10,%ymm10
+	vpaddd	%ymm15,%ymm11,%ymm11
+	vpxor	%ymm8,%ymm4,%ymm4
+	vpxor	%ymm9,%ymm5,%ymm5
+	vpxor	%ymm10,%ymm6,%ymm6
+	vpxor	%ymm11,%ymm7,%ymm7
+	vprold	$7,%ymm4,%ymm4
+	vprold	$7,%ymm5,%ymm5
+	vprold	$7,%ymm6,%ymm6
+	vprold	$7,%ymm7,%ymm7
+	vpaddd	%ymm5,%ymm0,%ymm0
+	vpaddd	%ymm6,%ymm1,%ymm1
+	vpaddd	%ymm7,%ymm2,%ymm2
+	vpaddd	%ymm4,%ymm3,%ymm3
+	vpxor	%ymm0,%ymm15,%ymm15
+	vpxor	%ymm1,%ymm12,%ymm12
+	vpxor	%ymm2,%ymm13,%ymm13
+	vpxor	%ymm3,%ymm14,%ymm14
+	vprold	$16,%ymm15,%ymm15
+	vprold	$16,%ymm12,%ymm12
+	vprold	$16,%ymm13,%ymm13
+	vprold	$16,%ymm14,%ymm14
+	vpaddd	%ymm15,%ymm10,%ymm10
+	vpaddd	%ymm12,%ymm11,%ymm11
+	vpaddd	%ymm13,%ymm8,%ymm8
+	vpaddd	%ymm14,%ymm9,%ymm9
+	vpxor	%ymm10,%ymm5,%ymm5
+	vpxor	%ymm11,%ymm6,%ymm6
+	vpxor	%ymm8,%ymm7,%ymm7
+	vpxor	%ymm9,%ymm4,%ymm4
+	vprold	$12,%ymm5,%ymm5
+	vprold	$12,%ymm6,%ymm6
+	vprold	$12,%ymm7,%ymm7
+	vprold	$12,%ymm4,%ymm4
+	vpaddd	%ymm5,%ymm0,%ymm0
+	vpaddd	%ymm6,%ymm1,%ymm1
+	vpaddd	%ymm7,%ymm2,%ymm2
+	vpaddd	%ymm4,%ymm3,%ymm3
+	vpxor	%ymm0,%ymm15,%ymm15
+	vpxor	%ymm1,%ymm12,%ymm12
+	vpxor	%ymm2,%ymm13,%ymm13
+	vpxor	%ymm3,%ymm14,%ymm14
+	vprold	$8,%ymm15,%ymm15
+	vprold	$8,%ymm12,%ymm12
+	vprold	$8,%ymm13,%ymm13
+	vprold	$8,%ymm14,%ymm14
+	vpaddd	%ymm15,%ymm10,%ymm10
+	vpaddd	%ymm12,%ymm11,%ymm11
+	vpaddd	%ymm13,%ymm8,%ymm8
+	vpaddd	%ymm14,%ymm9,%ymm9
+	vpxor	%ymm10,%ymm5,%ymm5
+	vpxor	%ymm11,%ymm6,%ymm6
+	vpxor	%ymm8,%ymm7,%ymm7
+	vpxor	%ymm9,%ymm4,%ymm4
+	vprold	$7,%ymm5,%ymm5
+	vprold	$7,%ymm6,%ymm6
+	vprold	$7,%ymm7,%ymm7
+	vprold	$7,%ymm4,%ymm4
+	decl	%eax
+	jnz	.Loop8xvl
+
+	vpaddd	%ymm16,%ymm0,%ymm0
+	vpaddd	%ymm17,%ymm1,%ymm1
+	vpaddd	%ymm18,%ymm2,%ymm2
+	vpaddd	%ymm19,%ymm3,%ymm3
+
+	vpunpckldq	%ymm1,%ymm0,%ymm18
+	vpunpckldq	%ymm3,%ymm2,%ymm19
+	vpunpckhdq	%ymm1,%ymm0,%ymm0
+	vpunpckhdq	%ymm3,%ymm2,%ymm2
+	vpunpcklqdq	%ymm19,%ymm18,%ymm1
+	vpunpckhqdq	%ymm19,%ymm18,%ymm18
+	vpunpcklqdq	%ymm2,%ymm0,%ymm3
+	vpunpckhqdq	%ymm2,%ymm0,%ymm0
+	vpaddd	%ymm20,%ymm4,%ymm4
+	vpaddd	%ymm21,%ymm5,%ymm5
+	vpaddd	%ymm22,%ymm6,%ymm6
+	vpaddd	%ymm23,%ymm7,%ymm7
+
+	vpunpckldq	%ymm5,%ymm4,%ymm2
+	vpunpckldq	%ymm7,%ymm6,%ymm19
+	vpunpckhdq	%ymm5,%ymm4,%ymm4
+	vpunpckhdq	%ymm7,%ymm6,%ymm6
+	vpunpcklqdq	%ymm19,%ymm2,%ymm5
+	vpunpckhqdq	%ymm19,%ymm2,%ymm2
+	vpunpcklqdq	%ymm6,%ymm4,%ymm7
+	vpunpckhqdq	%ymm6,%ymm4,%ymm4
+	vshufi32x4	$0,%ymm5,%ymm1,%ymm19
+	vshufi32x4	$3,%ymm5,%ymm1,%ymm5
+	vshufi32x4	$0,%ymm2,%ymm18,%ymm1
+	vshufi32x4	$3,%ymm2,%ymm18,%ymm2
+	vshufi32x4	$0,%ymm7,%ymm3,%ymm18
+	vshufi32x4	$3,%ymm7,%ymm3,%ymm7
+	vshufi32x4	$0,%ymm4,%ymm0,%ymm3
+	vshufi32x4	$3,%ymm4,%ymm0,%ymm4
+	vpaddd	%ymm24,%ymm8,%ymm8
+	vpaddd	%ymm25,%ymm9,%ymm9
+	vpaddd	%ymm26,%ymm10,%ymm10
+	vpaddd	%ymm27,%ymm11,%ymm11
+
+	vpunpckldq	%ymm9,%ymm8,%ymm6
+	vpunpckldq	%ymm11,%ymm10,%ymm0
+	vpunpckhdq	%ymm9,%ymm8,%ymm8
+	vpunpckhdq	%ymm11,%ymm10,%ymm10
+	vpunpcklqdq	%ymm0,%ymm6,%ymm9
+	vpunpckhqdq	%ymm0,%ymm6,%ymm6
+	vpunpcklqdq	%ymm10,%ymm8,%ymm11
+	vpunpckhqdq	%ymm10,%ymm8,%ymm8
+	vpaddd	%ymm28,%ymm12,%ymm12
+	vpaddd	%ymm29,%ymm13,%ymm13
+	vpaddd	%ymm30,%ymm14,%ymm14
+	vpaddd	%ymm31,%ymm15,%ymm15
+
+	vpunpckldq	%ymm13,%ymm12,%ymm10
+	vpunpckldq	%ymm15,%ymm14,%ymm0
+	vpunpckhdq	%ymm13,%ymm12,%ymm12
+	vpunpckhdq	%ymm15,%ymm14,%ymm14
+	vpunpcklqdq	%ymm0,%ymm10,%ymm13
+	vpunpckhqdq	%ymm0,%ymm10,%ymm10
+	vpunpcklqdq	%ymm14,%ymm12,%ymm15
+	vpunpckhqdq	%ymm14,%ymm12,%ymm12
+	vperm2i128	$0x20,%ymm13,%ymm9,%ymm0
+	vperm2i128	$0x31,%ymm13,%ymm9,%ymm13
+	vperm2i128	$0x20,%ymm10,%ymm6,%ymm9
+	vperm2i128	$0x31,%ymm10,%ymm6,%ymm10
+	vperm2i128	$0x20,%ymm15,%ymm11,%ymm6
+	vperm2i128	$0x31,%ymm15,%ymm11,%ymm15
+	vperm2i128	$0x20,%ymm12,%ymm8,%ymm11
+	vperm2i128	$0x31,%ymm12,%ymm8,%ymm12
+	cmpq	$512,%rdx
+	jb	.Ltail8xvl
+
+	movl	$0x80,%eax
+	vpxord	0(%rsi),%ymm19,%ymm19
+	vpxor	32(%rsi),%ymm0,%ymm0
+	vpxor	64(%rsi),%ymm5,%ymm5
+	vpxor	96(%rsi),%ymm13,%ymm13
+	leaq	(%rsi,%rax,1),%rsi
+	vmovdqu32	%ymm19,0(%rdi)
+	vmovdqu	%ymm0,32(%rdi)
+	vmovdqu	%ymm5,64(%rdi)
+	vmovdqu	%ymm13,96(%rdi)
+	leaq	(%rdi,%rax,1),%rdi
+
+	vpxor	0(%rsi),%ymm1,%ymm1
+	vpxor	32(%rsi),%ymm9,%ymm9
+	vpxor	64(%rsi),%ymm2,%ymm2
+	vpxor	96(%rsi),%ymm10,%ymm10
+	leaq	(%rsi,%rax,1),%rsi
+	vmovdqu	%ymm1,0(%rdi)
+	vmovdqu	%ymm9,32(%rdi)
+	vmovdqu	%ymm2,64(%rdi)
+	vmovdqu	%ymm10,96(%rdi)
+	leaq	(%rdi,%rax,1),%rdi
+
+	vpxord	0(%rsi),%ymm18,%ymm18
+	vpxor	32(%rsi),%ymm6,%ymm6
+	vpxor	64(%rsi),%ymm7,%ymm7
+	vpxor	96(%rsi),%ymm15,%ymm15
+	leaq	(%rsi,%rax,1),%rsi
+	vmovdqu32	%ymm18,0(%rdi)
+	vmovdqu	%ymm6,32(%rdi)
+	vmovdqu	%ymm7,64(%rdi)
+	vmovdqu	%ymm15,96(%rdi)
+	leaq	(%rdi,%rax,1),%rdi
+
+	vpxor	0(%rsi),%ymm3,%ymm3
+	vpxor	32(%rsi),%ymm11,%ymm11
+	vpxor	64(%rsi),%ymm4,%ymm4
+	vpxor	96(%rsi),%ymm12,%ymm12
+	leaq	(%rsi,%rax,1),%rsi
+	vmovdqu	%ymm3,0(%rdi)
+	vmovdqu	%ymm11,32(%rdi)
+	vmovdqu	%ymm4,64(%rdi)
+	vmovdqu	%ymm12,96(%rdi)
+	leaq	(%rdi,%rax,1),%rdi
+
+	vpbroadcastd	0(%r9),%ymm0
+	vpbroadcastd	4(%r9),%ymm1
+
+	subq	$512,%rdx
+	jnz	.Loop_outer8xvl
+
+	jmp	.Ldone8xvl
+
+.align	32
+.Ltail8xvl:
+	vmovdqa64	%ymm19,%ymm8
+	xorq	%r9,%r9
+	subq	%rsi,%rdi
+	cmpq	$64,%rdx
+	jb	.Less_than_64_8xvl
+	vpxor	0(%rsi),%ymm8,%ymm8
+	vpxor	32(%rsi),%ymm0,%ymm0
+	vmovdqu	%ymm8,0(%rdi,%rsi,1)
+	vmovdqu	%ymm0,32(%rdi,%rsi,1)
+	je	.Ldone8xvl
+	vmovdqa	%ymm5,%ymm8
+	vmovdqa	%ymm13,%ymm0
+	leaq	64(%rsi),%rsi
+
+	cmpq	$128,%rdx
+	jb	.Less_than_64_8xvl
+	vpxor	0(%rsi),%ymm5,%ymm5
+	vpxor	32(%rsi),%ymm13,%ymm13
+	vmovdqu	%ymm5,0(%rdi,%rsi,1)
+	vmovdqu	%ymm13,32(%rdi,%rsi,1)
+	je	.Ldone8xvl
+	vmovdqa	%ymm1,%ymm8
+	vmovdqa	%ymm9,%ymm0
+	leaq	64(%rsi),%rsi
+
+	cmpq	$192,%rdx
+	jb	.Less_than_64_8xvl
+	vpxor	0(%rsi),%ymm1,%ymm1
+	vpxor	32(%rsi),%ymm9,%ymm9
+	vmovdqu	%ymm1,0(%rdi,%rsi,1)
+	vmovdqu	%ymm9,32(%rdi,%rsi,1)
+	je	.Ldone8xvl
+	vmovdqa	%ymm2,%ymm8
+	vmovdqa	%ymm10,%ymm0
+	leaq	64(%rsi),%rsi
+
+	cmpq	$256,%rdx
+	jb	.Less_than_64_8xvl
+	vpxor	0(%rsi),%ymm2,%ymm2
+	vpxor	32(%rsi),%ymm10,%ymm10
+	vmovdqu	%ymm2,0(%rdi,%rsi,1)
+	vmovdqu	%ymm10,32(%rdi,%rsi,1)
+	je	.Ldone8xvl
+	vmovdqa32	%ymm18,%ymm8
+	vmovdqa	%ymm6,%ymm0
+	leaq	64(%rsi),%rsi
+
+	cmpq	$320,%rdx
+	jb	.Less_than_64_8xvl
+	vpxord	0(%rsi),%ymm18,%ymm18
+	vpxor	32(%rsi),%ymm6,%ymm6
+	vmovdqu32	%ymm18,0(%rdi,%rsi,1)
+	vmovdqu	%ymm6,32(%rdi,%rsi,1)
+	je	.Ldone8xvl
+	vmovdqa	%ymm7,%ymm8
+	vmovdqa	%ymm15,%ymm0
+	leaq	64(%rsi),%rsi
+
+	cmpq	$384,%rdx
+	jb	.Less_than_64_8xvl
+	vpxor	0(%rsi),%ymm7,%ymm7
+	vpxor	32(%rsi),%ymm15,%ymm15
+	vmovdqu	%ymm7,0(%rdi,%rsi,1)
+	vmovdqu	%ymm15,32(%rdi,%rsi,1)
+	je	.Ldone8xvl
+	vmovdqa	%ymm3,%ymm8
+	vmovdqa	%ymm11,%ymm0
+	leaq	64(%rsi),%rsi
+
+	cmpq	$448,%rdx
+	jb	.Less_than_64_8xvl
+	vpxor	0(%rsi),%ymm3,%ymm3
+	vpxor	32(%rsi),%ymm11,%ymm11
+	vmovdqu	%ymm3,0(%rdi,%rsi,1)
+	vmovdqu	%ymm11,32(%rdi,%rsi,1)
+	je	.Ldone8xvl
+	vmovdqa	%ymm4,%ymm8
+	vmovdqa	%ymm12,%ymm0
+	leaq	64(%rsi),%rsi
+
+.Less_than_64_8xvl:
+	vmovdqa	%ymm8,0(%rsp)
+	vmovdqa	%ymm0,32(%rsp)
+	leaq	(%rdi,%rsi,1),%rdi
+	andq	$63,%rdx
+
+.Loop_tail8xvl:
+	movzbl	(%rsi,%r9,1),%eax
+	movzbl	(%rsp,%r9,1),%ecx
+	leaq	1(%r9),%r9
+	xorl	%ecx,%eax
+	movb	%al,-1(%rdi,%r9,1)
+	decq	%rdx
+	jnz	.Loop_tail8xvl
+
+	vpxor	%ymm8,%ymm8,%ymm8
+	vmovdqa	%ymm8,0(%rsp)
+	vmovdqa	%ymm8,32(%rsp)
+
+.Ldone8xvl:
+	vzeroall
+	leaq	-8(%r10),%rsp
+.L8xvl_epilogue:
+	ret
+ENDPROC(chacha20_avx512vl)
+
+#endif /* CONFIG_AS_AVX512 */
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 06/20] zinc: ChaCha20 MIPS32r2 implementation
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (4 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 05/20] zinc: ChaCha20 x86_64 implementation Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 07/20] zinc: Poly1305 generic C implementations and selftest Jason A. Donenfeld
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, René van Dorst, Samuel Neves,
	Andy Lutomirski, Jean-Philippe Aumasson, Ralf Baechle,
	Paul Burton, James Hogan, linux-mips

This MIPS32r2 implementation comes from René van Dorst and me and
results in a nice speedup on the usual OpenWRT targets.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
---
 lib/zinc/Makefile                      |   4 +
 lib/zinc/chacha20/chacha20-mips-glue.h |  28 ++
 lib/zinc/chacha20/chacha20-mips.S      | 474 +++++++++++++++++++++++++
 3 files changed, 506 insertions(+)
 create mode 100644 lib/zinc/chacha20/chacha20-mips-glue.h
 create mode 100644 lib/zinc/chacha20/chacha20-mips.S

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 32e4bd94ea0b..9f6a5e65d729 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -17,6 +17,10 @@ ifeq ($(CONFIG_ZINC_ARCH_ARM64),y)
 zinc-y += chacha20/chacha20-arm64.o
 CFLAGS_chacha20.o += -include $(srctree)/$(src)/chacha20/chacha20-arm-glue.h
 endif
+ifeq ($(CONFIG_ZINC_ARCH_MIPS)$(CONFIG_CPU_MIPS32_R2),yy)
+zinc-y += chacha20/chacha20-mips.o
+CFLAGS_chacha20.o += -include $(srctree)/$(src)/chacha20/chacha20-mips-glue.h
+endif
 endif
 
 zinc-y += main.o
diff --git a/lib/zinc/chacha20/chacha20-mips-glue.h b/lib/zinc/chacha20/chacha20-mips-glue.h
new file mode 100644
index 000000000000..5b2c8cec36c8
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-mips-glue.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/chacha20.h>
+
+asmlinkage void chacha20_mips(u8 *out, const u8 *in, const size_t len,
+			      const u32 key[8], const u32 counter[4]);
+void __init chacha20_fpu_init(void)
+{
+}
+
+static inline bool chacha20_arch(u8 *dst, const u8 *src, const size_t len,
+				 const u32 key[8], const u32 counter[4],
+				 simd_context_t simd_context)
+{
+	chacha20_mips(dst, src, len, key, counter);
+	return true;
+}
+
+static inline bool hchacha20_arch(u8 *derived_key, const u8 *nonce,
+				  const u8 *key, simd_context_t simd_context)
+{
+	return false;
+}
+
+#define HAVE_CHACHA20_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/chacha20/chacha20-mips.S b/lib/zinc/chacha20/chacha20-mips.S
new file mode 100644
index 000000000000..77da2c2fb240
--- /dev/null
+++ b/lib/zinc/chacha20/chacha20-mips.S
@@ -0,0 +1,474 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2016-2018 René van Dorst <opensource@vdorst.com>. All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#define MASK_U32	0x3c
+#define MASK_BYTES	0x03
+#define CHACHA20_BLOCK_SIZE 64
+#define STACK_SIZE	4*16
+
+#define X0  $t0
+#define X1  $t1
+#define X2  $t2
+#define X3  $t3
+#define X4  $t4
+#define X5  $t5
+#define X6  $t6
+#define X7  $t7
+#define X8  $v1
+#define X9  $fp
+#define X10 $s7
+#define X11 $s6
+#define X12 $s5
+#define X13 $s4
+#define X14 $s3
+#define X15 $s2
+/* Use regs which are overwritten on exit for Tx so we don't leak clear data. */
+#define T0  $s1
+#define T1  $s0
+#define T(n) T ## n
+#define X(n) X ## n
+
+/* Input arguments */
+#define OUT		$a0
+#define IN		$a1
+#define BYTES		$a2
+/* KEY and NONCE argument must be u32 aligned */
+#define KEY		$a3
+/* NONCE pointer is given via stack */
+#define NONCE		$t9
+
+/* Output argument */
+/* NONCE[0] is kept in a register and not in memory.
+ * We don't want to touch original value in memory.
+ * Must be incremented every loop iteration.
+ */
+#define NONCE_0		$v0
+
+/* SAVED_X and SAVED_CA are set in the jump table.
+ * Use regs which are overwritten on exit else we don't leak clear data.
+ * They are used to handling the last bytes which are not multiple of 4.
+ */
+#define SAVED_X		X15
+#define SAVED_CA	$ra
+
+#define PTR_LAST_ROUND	$t8
+
+/* ChaCha20 constants and stack location */
+#define CONSTANT_OFS_SP	48
+#define UNALIGNED_OFS_SP 40
+
+#define CONSTANT_1	0x61707865
+#define CONSTANT_2	0x3320646e
+#define CONSTANT_3	0x79622d32
+#define CONSTANT_4	0x6b206574
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define MSB 0
+#define LSB 3
+#define ROTx rotl
+#define ROTR(n) rotr n, 24
+#define	CPU_TO_LE32(n) \
+	wsbh	n; \
+	rotr	n, 16;
+#else
+#define MSB 3
+#define LSB 0
+#define ROTx rotr
+#define CPU_TO_LE32(n)
+#define ROTR(n)
+#endif
+
+#define STORE_UNALIGNED(x, a, s, o) \
+.Lchacha20_mips_xor_unaligned_ ## x ## _b: ; \
+	.if ((s != NONCE) || (o != 0)); \
+		lw	T0, o(s); \
+	.endif; \
+	lwl	T1, x-4+MSB ## (IN); \
+	lwr	T1, x-4+LSB ## (IN); \
+	.if ((s == NONCE) && (o == 0)); \
+		addu	X ## a, NONCE_0; \
+	.else; \
+		addu	X ## a, T0; \
+	.endif; \
+	CPU_TO_LE32(X ## a); \
+	xor	X ## a, T1; \
+	swl	X ## a, x-4+MSB ## (OUT); \
+	swr	X ## a, x-4+LSB ## (OUT);
+
+#define STORE_ALIGNED(x, a, s, o) \
+.Lchacha20_mips_xor_aligned_ ## x ## _b: ; \
+	.if ((s != NONCE) || (o != 0)); \
+		lw	T0, o(s); \
+	.endif; \
+	lw	T1, x-4 ## (IN); \
+	.if ((s == NONCE) && (o == 0)); \
+		addu	X ## a, NONCE_0; \
+	.else; \
+		addu	X ## a, T0; \
+	.endif; \
+	CPU_TO_LE32(X ## a); \
+	xor	X ## a, T1; \
+	sw	X ## a, x-4 ## (OUT);
+
+/* Jump table macro.
+ * Used for setup and handling the last bytes, which are not multiple of 4.
+ * X15 is free to store Xn
+ * Every jumptable entry must be equal in size.
+ */
+#define JMPTBL_ALIGNED(x, a, s, o) \
+.Lchacha20_mips_jmptbl_aligned_ ## a: ; \
+	.if ((s == NONCE) && (o == 0)); \
+		move	SAVED_CA, NONCE_0; \
+	.else; \
+		lw	SAVED_CA, o(s);\
+	.endif; \
+	b	.Lchacha20_mips_xor_aligned_ ## x ## _b; \
+	move	SAVED_X, X ## a;
+
+#define JMPTBL_UNALIGNED(x, a, s, o) \
+.Lchacha20_mips_jmptbl_unaligned_ ## a: ; \
+	.if ((s == NONCE) && (o == 0)); \
+		move	SAVED_CA, NONCE_0; \
+	.else; \
+		lw	SAVED_CA, o(s);\
+	.endif; \
+	b	.Lchacha20_mips_xor_unaligned_ ## x ## _b; \
+	move	SAVED_X, X ## a;
+
+#define AXR(A, B, C, D,  K, L, M, N,  V, W, Y, Z,  S) \
+	addu	X(A), X(K); \
+	addu	X(B), X(L); \
+	addu	X(C), X(M); \
+	addu	X(D), X(N); \
+	xor	X(V), X(A); \
+	xor	X(W), X(B); \
+	xor	X(Y), X(C); \
+	xor	X(Z), X(D); \
+	rotl	X(V), S;    \
+	rotl	X(W), S;    \
+	rotl	X(Y), S;    \
+	rotl	X(Z), S;
+
+.text
+.set reorder
+.set noat
+.globl chacha20_mips
+.ent   chacha20_mips
+chacha20_mips:
+	.frame $sp, STACK_SIZE, $ra
+	/* This is in the fifth argument */
+	lw	NONCE, 16($sp)
+
+	/* Return bytes = 0. */
+	.set noreorder
+	beqz	BYTES, .Lchacha20_mips_end
+	addiu	$sp, -STACK_SIZE
+	.set reorder
+
+	/* Calculate PTR_LAST_ROUND */
+	addiu	PTR_LAST_ROUND, BYTES, -1
+	ins	PTR_LAST_ROUND, $zero, 0, 6
+	addu	PTR_LAST_ROUND, OUT
+
+	/* Save s0-s7, fp, ra. */
+	sw	$ra,  0($sp)
+	sw	$fp,  4($sp)
+	sw	$s0,  8($sp)
+	sw	$s1, 12($sp)
+	sw	$s2, 16($sp)
+	sw	$s3, 20($sp)
+	sw	$s4, 24($sp)
+	sw	$s5, 28($sp)
+	sw	$s6, 32($sp)
+	sw	$s7, 36($sp)
+
+	lw	NONCE_0, 0(NONCE)
+	/* Test IN or OUT is unaligned.
+	 * UNALIGNED (T1) = ( IN | OUT ) & 0x00000003
+	 */
+	or	T1, IN, OUT
+	andi	T1, 0x3
+
+	/* Load constant */
+	lui	X0, %hi(CONSTANT_1)
+	lui	X1, %hi(CONSTANT_2)
+	lui	X2, %hi(CONSTANT_3)
+	lui	X3, %hi(CONSTANT_4)
+	ori	X0, %lo(CONSTANT_1)
+	ori	X1, %lo(CONSTANT_2)
+	ori	X2, %lo(CONSTANT_3)
+	ori	X3, %lo(CONSTANT_4)
+
+	/* Store constant on stack. */
+	sw	X0,  0+CONSTANT_OFS_SP($sp)
+	sw	X1,  4+CONSTANT_OFS_SP($sp)
+	sw	X2,  8+CONSTANT_OFS_SP($sp)
+	sw	X3, 12+CONSTANT_OFS_SP($sp)
+
+	sw	T1, UNALIGNED_OFS_SP($sp)
+
+	.set	noreorder
+	b	.Lchacha20_rounds_start
+	andi	BYTES, (CHACHA20_BLOCK_SIZE-1)
+	.set	reorder
+
+.align 4
+.Loop_chacha20_rounds:
+	addiu	IN,  CHACHA20_BLOCK_SIZE
+	addiu	OUT, CHACHA20_BLOCK_SIZE
+	addiu	NONCE_0, 1
+
+	lw	X0,  0+CONSTANT_OFS_SP($sp)
+	lw	X1,  4+CONSTANT_OFS_SP($sp)
+	lw	X2,  8+CONSTANT_OFS_SP($sp)
+	lw	X3, 12+CONSTANT_OFS_SP($sp)
+	lw	T1,   UNALIGNED_OFS_SP($sp)
+
+.Lchacha20_rounds_start:
+	lw	X4,   0(KEY)
+	lw	X5,   4(KEY)
+	lw	X6,   8(KEY)
+	lw	X7,  12(KEY)
+	lw	X8,  16(KEY)
+	lw	X9,  20(KEY)
+	lw	X10, 24(KEY)
+	lw	X11, 28(KEY)
+
+	move	X12, NONCE_0
+	lw	X13,  4(NONCE)
+	lw	X14,  8(NONCE)
+	lw	X15, 12(NONCE)
+
+	li	$at, 9
+.Loop_chacha20_xor_rounds:
+	AXR( 0, 1, 2, 3,  4, 5, 6, 7, 12,13,14,15, 16);
+	AXR( 8, 9,10,11, 12,13,14,15,  4, 5, 6, 7, 12);
+	AXR( 0, 1, 2, 3,  4, 5, 6, 7, 12,13,14,15,  8);
+	AXR( 8, 9,10,11, 12,13,14,15,  4, 5, 6, 7,  7);
+	AXR( 0, 1, 2, 3,  5, 6, 7, 4, 15,12,13,14, 16);
+	AXR(10,11, 8, 9, 15,12,13,14,  5, 6, 7, 4, 12);
+	AXR( 0, 1, 2, 3,  5, 6, 7, 4, 15,12,13,14,  8);
+	AXR(10,11, 8, 9, 15,12,13,14,  5, 6, 7, 4,  7);
+	.set noreorder
+	bnez	$at, .Loop_chacha20_xor_rounds
+	addiu	$at, -1
+
+	/* Unaligned? Jump */
+	bnez	T1, .Loop_chacha20_unaligned
+	andi	$at, BYTES, MASK_U32
+
+	/* Last round? No jump */
+	bne	OUT, PTR_LAST_ROUND, .Lchacha20_mips_xor_aligned_64_b
+	/* Load upper half of jump table addr */
+	lui	T0, %hi(.Lchacha20_mips_jmptbl_aligned_0)
+
+	/* Full block? Jump */
+	beqz	BYTES, .Lchacha20_mips_xor_aligned_64_b
+	/* Calculate lower half jump table addr and offset */
+	ins	T0, $at, 2, 6
+
+	subu	T0, $at
+	addiu	T0, %lo(.Lchacha20_mips_jmptbl_aligned_0)
+
+	jr	T0
+	/* Delay slot */
+	nop
+
+	.set	reorder
+
+.Loop_chacha20_unaligned:
+	.set noreorder
+
+	/* Last round? no jump */
+	bne	OUT, PTR_LAST_ROUND, .Lchacha20_mips_xor_unaligned_64_b
+	/* Load upper half of jump table addr */
+	lui	T0, %hi(.Lchacha20_mips_jmptbl_unaligned_0)
+
+	/* Full block? Jump */
+	beqz	BYTES, .Lchacha20_mips_xor_unaligned_64_b
+
+	/* Calculate lower half jump table addr and offset */
+	ins     T0, $at, 2, 6
+	subu	T0, $at
+	addiu	T0, %lo(.Lchacha20_mips_jmptbl_unaligned_0)
+
+	jr	T0
+	/* Delay slot */
+	nop
+
+	.set	reorder
+
+/* Aligned code path
+ */
+.align 4
+	STORE_ALIGNED(64, 15, NONCE,12)
+	STORE_ALIGNED(60, 14, NONCE, 8)
+	STORE_ALIGNED(56, 13, NONCE, 4)
+	STORE_ALIGNED(52, 12, NONCE, 0)
+	STORE_ALIGNED(48, 11, KEY, 28)
+	STORE_ALIGNED(44, 10, KEY, 24)
+	STORE_ALIGNED(40,  9, KEY, 20)
+	STORE_ALIGNED(36,  8, KEY, 16)
+	STORE_ALIGNED(32,  7, KEY, 12)
+	STORE_ALIGNED(28,  6, KEY,  8)
+	STORE_ALIGNED(24,  5, KEY,  4)
+	STORE_ALIGNED(20,  4, KEY,  0)
+	STORE_ALIGNED(16,  3, $sp, 12+CONSTANT_OFS_SP)
+	STORE_ALIGNED(12,  2, $sp,  8+CONSTANT_OFS_SP)
+	STORE_ALIGNED( 8,  1, $sp,  4+CONSTANT_OFS_SP)
+.Lchacha20_mips_xor_aligned_4_b:
+	/* STORE_ALIGNED( 4,  0, $sp, 0+CONSTANT_OFS_SP) */
+	lw	T0, 0+CONSTANT_OFS_SP($sp)
+	lw	T1, 0(IN)
+	addu	X0, T0
+	CPU_TO_LE32(X0)
+	xor	X0, T1
+	.set noreorder
+	bne	OUT, PTR_LAST_ROUND, .Loop_chacha20_rounds
+	sw	X0, 0(OUT)
+	.set reorder
+
+	.set noreorder
+	bne	$at, BYTES, .Lchacha20_mips_xor_bytes
+	/* Empty delayslot, Increase NONCE_0, return NONCE_0 value */
+	addiu	NONCE_0, 1
+	.set noreorder
+
+.Lchacha20_mips_xor_done:
+	/* Restore used registers */
+	lw	$ra,  0($sp)
+	lw	$fp,  4($sp)
+	lw	$s0,  8($sp)
+	lw	$s1, 12($sp)
+	lw	$s2, 16($sp)
+	lw	$s3, 20($sp)
+	lw	$s4, 24($sp)
+	lw	$s5, 28($sp)
+	lw	$s6, 32($sp)
+	lw	$s7, 36($sp)
+.Lchacha20_mips_end:
+	.set noreorder
+	jr	$ra
+	addiu	$sp, STACK_SIZE
+	.set reorder
+
+	.set noreorder
+	/* Start jump table */
+	JMPTBL_ALIGNED( 0,  0, $sp,  0+CONSTANT_OFS_SP)
+	JMPTBL_ALIGNED( 4,  1, $sp,  4+CONSTANT_OFS_SP)
+	JMPTBL_ALIGNED( 8,  2, $sp,  8+CONSTANT_OFS_SP)
+	JMPTBL_ALIGNED(12,  3, $sp, 12+CONSTANT_OFS_SP)
+	JMPTBL_ALIGNED(16,  4, KEY,  0)
+	JMPTBL_ALIGNED(20,  5, KEY,  4)
+	JMPTBL_ALIGNED(24,  6, KEY,  8)
+	JMPTBL_ALIGNED(28,  7, KEY, 12)
+	JMPTBL_ALIGNED(32,  8, KEY, 16)
+	JMPTBL_ALIGNED(36,  9, KEY, 20)
+	JMPTBL_ALIGNED(40, 10, KEY, 24)
+	JMPTBL_ALIGNED(44, 11, KEY, 28)
+	JMPTBL_ALIGNED(48, 12, NONCE, 0)
+	JMPTBL_ALIGNED(52, 13, NONCE, 4)
+	JMPTBL_ALIGNED(56, 14, NONCE, 8)
+	JMPTBL_ALIGNED(60, 15, NONCE,12)
+	/* End jump table */
+	.set reorder
+
+/* Unaligned code path
+ */
+	STORE_UNALIGNED(64, 15, NONCE,12)
+	STORE_UNALIGNED(60, 14, NONCE, 8)
+	STORE_UNALIGNED(56, 13, NONCE, 4)
+	STORE_UNALIGNED(52, 12, NONCE, 0)
+	STORE_UNALIGNED(48, 11, KEY, 28)
+	STORE_UNALIGNED(44, 10, KEY, 24)
+	STORE_UNALIGNED(40,  9, KEY, 20)
+	STORE_UNALIGNED(36,  8, KEY, 16)
+	STORE_UNALIGNED(32,  7, KEY, 12)
+	STORE_UNALIGNED(28,  6, KEY,  8)
+	STORE_UNALIGNED(24,  5, KEY,  4)
+	STORE_UNALIGNED(20,  4, KEY,  0)
+	STORE_UNALIGNED(16,  3, $sp, 12+CONSTANT_OFS_SP)
+	STORE_UNALIGNED(12,  2, $sp,  8+CONSTANT_OFS_SP)
+	STORE_UNALIGNED( 8,  1, $sp,  4+CONSTANT_OFS_SP)
+.Lchacha20_mips_xor_unaligned_4_b:
+	/* STORE_UNALIGNED( 4,  0, $sp, 0+CONSTANT_OFS_SP) */
+	lw	T0, 0+CONSTANT_OFS_SP($sp)
+	lwl	T1, 0+MSB(IN)
+	lwr	T1, 0+LSB(IN)
+	addu	X0, T0
+	CPU_TO_LE32(X0)
+	xor	X0, T1
+	swl	X0, 0+MSB(OUT)
+	.set noreorder
+	bne	OUT, PTR_LAST_ROUND, .Loop_chacha20_rounds
+	swr	X0, 0+LSB(OUT)
+	.set reorder
+
+	/* Fall through to byte handling */
+	.set noreorder
+	beq	$at, BYTES, .Lchacha20_mips_xor_done
+	/* Empty delayslot, increase NONCE_0, return NONCE_0 value */
+.Lchacha20_mips_xor_unaligned_0_b:
+.Lchacha20_mips_xor_aligned_0_b:
+	addiu	NONCE_0, 1
+	.set reorder
+
+.Lchacha20_mips_xor_bytes:
+	addu	OUT, $at
+	addu	IN, $at
+	addu	SAVED_X, SAVED_CA
+	/* First byte */
+	lbu	T1, 0(IN)
+	andi	$at, BYTES, 2
+	CPU_TO_LE32(SAVED_X)
+	ROTR(SAVED_X)
+	xor	T1, SAVED_X
+	.set noreorder
+	beqz	$at, .Lchacha20_mips_xor_done
+	sb	T1, 0(OUT)
+	.set reorder
+	/* Second byte */
+	lbu	T1, 1(IN)
+	andi	$at, BYTES, 1
+	ROTx	SAVED_X, 8
+	xor	T1, SAVED_X
+	.set noreorder
+	beqz	$at, .Lchacha20_mips_xor_done
+	sb	T1, 1(OUT)
+	.set reorder
+	/* Third byte */
+	lbu	T1, 2(IN)
+	ROTx	SAVED_X, 8
+	xor	T1, SAVED_X
+	.set noreorder
+	b	.Lchacha20_mips_xor_done
+	sb	T1, 2(OUT)
+	.set reorder
+.set noreorder
+
+.Lchacha20_mips_jmptbl_unaligned:
+	/* Start jump table */
+	JMPTBL_UNALIGNED( 0,  0, $sp,  0+CONSTANT_OFS_SP)
+	JMPTBL_UNALIGNED( 4,  1, $sp,  4+CONSTANT_OFS_SP)
+	JMPTBL_UNALIGNED( 8,  2, $sp,  8+CONSTANT_OFS_SP)
+	JMPTBL_UNALIGNED(12,  3, $sp, 12+CONSTANT_OFS_SP)
+	JMPTBL_UNALIGNED(16,  4, KEY,  0)
+	JMPTBL_UNALIGNED(20,  5, KEY,  4)
+	JMPTBL_UNALIGNED(24,  6, KEY,  8)
+	JMPTBL_UNALIGNED(28,  7, KEY, 12)
+	JMPTBL_UNALIGNED(32,  8, KEY, 16)
+	JMPTBL_UNALIGNED(36,  9, KEY, 20)
+	JMPTBL_UNALIGNED(40, 10, KEY, 24)
+	JMPTBL_UNALIGNED(44, 11, KEY, 28)
+	JMPTBL_UNALIGNED(48, 12, NONCE, 0)
+	JMPTBL_UNALIGNED(52, 13, NONCE, 4)
+	JMPTBL_UNALIGNED(56, 14, NONCE, 8)
+	JMPTBL_UNALIGNED(60, 15, NONCE,12)
+	/* End jump table */
+.set reorder
+
+.end chacha20_mips
+.set at
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 07/20] zinc: Poly1305 generic C implementations and selftest
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (5 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 06/20] zinc: ChaCha20 MIPS32r2 implementation Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations Jason A. Donenfeld
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson

These two C implementations -- a 32x32 one and a 64x64 one, depending on
the platform -- come from Andrew Moon's public domain poly1305-donna
portable code, modified for usage in the kernel and for usage with
accelerated primitives.

Information: https://cr.yp.to/mac.html

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
 include/zinc/poly1305.h              |  38 ++
 lib/zinc/Kconfig                     |   4 +
 lib/zinc/Makefile                    |   4 +
 lib/zinc/main.c                      |   5 +
 lib/zinc/poly1305/poly1305-donna32.h | 205 +++++++
 lib/zinc/poly1305/poly1305-donna64.h | 182 ++++++
 lib/zinc/poly1305/poly1305.c         | 131 ++++
 lib/zinc/selftest/poly1305.h         | 876 +++++++++++++++++++++++++++
 8 files changed, 1445 insertions(+)
 create mode 100644 include/zinc/poly1305.h
 create mode 100644 lib/zinc/poly1305/poly1305-donna32.h
 create mode 100644 lib/zinc/poly1305/poly1305-donna64.h
 create mode 100644 lib/zinc/poly1305/poly1305.c
 create mode 100644 lib/zinc/selftest/poly1305.h

diff --git a/include/zinc/poly1305.h b/include/zinc/poly1305.h
new file mode 100644
index 000000000000..338430c8477a
--- /dev/null
+++ b/include/zinc/poly1305.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_POLY1305_H
+#define _ZINC_POLY1305_H
+
+#include <linux/simd.h>
+#include <linux/types.h>
+
+enum poly1305_lengths {
+	POLY1305_BLOCK_SIZE = 16,
+	POLY1305_KEY_SIZE = 32,
+	POLY1305_MAC_SIZE = 16
+};
+
+struct poly1305_ctx {
+	u8 opaque[24 * sizeof(u64)];
+	u32 nonce[4];
+	u8 data[POLY1305_BLOCK_SIZE];
+	size_t num;
+} __aligned(8);
+
+void poly1305_fpu_init(void);
+
+void poly1305_init(struct poly1305_ctx *ctx, const u8 key[POLY1305_KEY_SIZE],
+		   simd_context_t simd_context);
+void poly1305_update(struct poly1305_ctx *ctx, const u8 *input, size_t len,
+		     simd_context_t simd_context);
+void poly1305_final(struct poly1305_ctx *ctx, u8 mac[POLY1305_MAC_SIZE],
+		    simd_context_t simd_context);
+
+#ifdef DEBUG
+bool poly1305_selftest(void);
+#endif
+
+#endif /* _ZINC_POLY1305_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index e7d396d61607..bc8c61334362 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -6,6 +6,10 @@ config ZINC_CHACHA20
 	select ZINC
 	select CRYPTO_ALGAPI
 
+config ZINC_POLY1305
+	bool
+	select ZINC
+
 config ZINC_DEBUG
 	bool "Zinc cryptography library debugging and self-tests"
 	depends on ZINC
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 9f6a5e65d729..d1e3892e06d9 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -23,6 +23,10 @@ CFLAGS_chacha20.o += -include $(srctree)/$(src)/chacha20/chacha20-mips-glue.h
 endif
 endif
 
+ifeq ($(CONFIG_ZINC_POLY1305),y)
+zinc-y += poly1305/poly1305.o
+endif
+
 zinc-y += main.o
 
 obj-$(CONFIG_ZINC) := zinc.o
diff --git a/lib/zinc/main.c b/lib/zinc/main.c
index 7e8e84b706b7..d871dd406a5c 100644
--- a/lib/zinc/main.c
+++ b/lib/zinc/main.c
@@ -4,6 +4,7 @@
  */
 
 #include <zinc/chacha20.h>
+#include <zinc/poly1305.h>
 
 #include <linux/init.h>
 #include <linux/module.h>
@@ -21,6 +22,10 @@ static int __init mod_init(void)
 {
 #ifdef CONFIG_ZINC_CHACHA20
 	chacha20_fpu_init();
+#endif
+#ifdef CONFIG_ZINC_POLY1305
+	poly1305_fpu_init();
+	selftest(poly1305);
 #endif
 	return 0;
 }
diff --git a/lib/zinc/poly1305/poly1305-donna32.h b/lib/zinc/poly1305/poly1305-donna32.h
new file mode 100644
index 000000000000..dc32123210f9
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-donna32.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is based in part on Andrew Moon's poly1305-donna, which is in the
+ * public domain.
+ */
+
+struct poly1305_internal {
+	u32 h[5];
+	u32 r[5];
+	u32 s[4];
+};
+
+static void poly1305_init_generic(void *ctx, const u8 key[16])
+{
+	struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+
+	/* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
+	st->r[0] = (get_unaligned_le32(&key[0])) & 0x3ffffff;
+	st->r[1] = (get_unaligned_le32(&key[3]) >> 2) & 0x3ffff03;
+	st->r[2] = (get_unaligned_le32(&key[6]) >> 4) & 0x3ffc0ff;
+	st->r[3] = (get_unaligned_le32(&key[9]) >> 6) & 0x3f03fff;
+	st->r[4] = (get_unaligned_le32(&key[12]) >> 8) & 0x00fffff;
+
+	/* s = 5*r */
+	st->s[0] = st->r[1] * 5;
+	st->s[1] = st->r[2] * 5;
+	st->s[2] = st->r[3] * 5;
+	st->s[3] = st->r[4] * 5;
+
+	/* h = 0 */
+	st->h[0] = 0;
+	st->h[1] = 0;
+	st->h[2] = 0;
+	st->h[3] = 0;
+	st->h[4] = 0;
+}
+
+static void poly1305_blocks_generic(void *ctx, const u8 *input, size_t len,
+				    const u32 padbit)
+{
+	struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+	const u32 hibit = padbit << 24;
+	u32 r0, r1, r2, r3, r4;
+	u32 s1, s2, s3, s4;
+	u32 h0, h1, h2, h3, h4;
+	u64 d0, d1, d2, d3, d4;
+	u32 c;
+
+	r0 = st->r[0];
+	r1 = st->r[1];
+	r2 = st->r[2];
+	r3 = st->r[3];
+	r4 = st->r[4];
+
+	s1 = st->s[0];
+	s2 = st->s[1];
+	s3 = st->s[2];
+	s4 = st->s[3];
+
+	h0 = st->h[0];
+	h1 = st->h[1];
+	h2 = st->h[2];
+	h3 = st->h[3];
+	h4 = st->h[4];
+
+	while (len >= POLY1305_BLOCK_SIZE) {
+		/* h += m[i] */
+		h0 += (get_unaligned_le32(&input[0])) & 0x3ffffff;
+		h1 += (get_unaligned_le32(&input[3]) >> 2) & 0x3ffffff;
+		h2 += (get_unaligned_le32(&input[6]) >> 4) & 0x3ffffff;
+		h3 += (get_unaligned_le32(&input[9]) >> 6) & 0x3ffffff;
+		h4 += (get_unaligned_le32(&input[12]) >> 8) | hibit;
+
+		/* h *= r */
+		d0 = ((u64)h0 * r0) + ((u64)h1 * s4) +
+		     ((u64)h2 * s3) + ((u64)h3 * s2) +
+		     ((u64)h4 * s1);
+		d1 = ((u64)h0 * r1) + ((u64)h1 * r0) +
+		     ((u64)h2 * s4) + ((u64)h3 * s3) +
+		     ((u64)h4 * s2);
+		d2 = ((u64)h0 * r2) + ((u64)h1 * r1) +
+		     ((u64)h2 * r0) + ((u64)h3 * s4) +
+		     ((u64)h4 * s3);
+		d3 = ((u64)h0 * r3) + ((u64)h1 * r2) +
+		     ((u64)h2 * r1) + ((u64)h3 * r0) +
+		     ((u64)h4 * s4);
+		d4 = ((u64)h0 * r4) + ((u64)h1 * r3) +
+		     ((u64)h2 * r2) + ((u64)h3 * r1) +
+		     ((u64)h4 * r0);
+
+		/* (partial) h %= p */
+		c = (u32)(d0 >> 26);
+		h0 = (u32)d0 & 0x3ffffff;
+		d1 += c;
+		c = (u32)(d1 >> 26);
+		h1 = (u32)d1 & 0x3ffffff;
+		d2 += c;
+		c = (u32)(d2 >> 26);
+		h2 = (u32)d2 & 0x3ffffff;
+		d3 += c;
+		c = (u32)(d3 >> 26);
+		h3 = (u32)d3 & 0x3ffffff;
+		d4 += c;
+		c = (u32)(d4 >> 26);
+		h4 = (u32)d4 & 0x3ffffff;
+		h0 += c * 5;
+		c = (h0 >> 26);
+		h0 = h0 & 0x3ffffff;
+		h1 += c;
+
+		input += POLY1305_BLOCK_SIZE;
+		len -= POLY1305_BLOCK_SIZE;
+	}
+
+	st->h[0] = h0;
+	st->h[1] = h1;
+	st->h[2] = h2;
+	st->h[3] = h3;
+	st->h[4] = h4;
+}
+
+static void poly1305_emit_generic(void *ctx, u8 mac[16], const u32 nonce[4])
+{
+	struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+	u32 h0, h1, h2, h3, h4, c;
+	u32 g0, g1, g2, g3, g4;
+	u64 f;
+	u32 mask;
+
+	/* fully carry h */
+	h0 = st->h[0];
+	h1 = st->h[1];
+	h2 = st->h[2];
+	h3 = st->h[3];
+	h4 = st->h[4];
+
+	c = h1 >> 26;
+	h1 = h1 & 0x3ffffff;
+	h2 += c;
+	c = h2 >> 26;
+	h2 = h2 & 0x3ffffff;
+	h3 += c;
+	c = h3 >> 26;
+	h3 = h3 & 0x3ffffff;
+	h4 += c;
+	c = h4 >> 26;
+	h4 = h4 & 0x3ffffff;
+	h0 += c * 5;
+	c = h0 >> 26;
+	h0 = h0 & 0x3ffffff;
+	h1 += c;
+
+	/* compute h + -p */
+	g0 = h0 + 5;
+	c = g0 >> 26;
+	g0 &= 0x3ffffff;
+	g1 = h1 + c;
+	c = g1 >> 26;
+	g1 &= 0x3ffffff;
+	g2 = h2 + c;
+	c = g2 >> 26;
+	g2 &= 0x3ffffff;
+	g3 = h3 + c;
+	c = g3 >> 26;
+	g3 &= 0x3ffffff;
+	g4 = h4 + c - (1UL << 26);
+
+	/* select h if h < p, or h + -p if h >= p */
+	mask = (g4 >> ((sizeof(u32) * 8) - 1)) - 1;
+	g0 &= mask;
+	g1 &= mask;
+	g2 &= mask;
+	g3 &= mask;
+	g4 &= mask;
+	mask = ~mask;
+
+	h0 = (h0 & mask) | g0;
+	h1 = (h1 & mask) | g1;
+	h2 = (h2 & mask) | g2;
+	h3 = (h3 & mask) | g3;
+	h4 = (h4 & mask) | g4;
+
+	/* h = h % (2^128) */
+	h0 = ((h0) | (h1 << 26)) & 0xffffffff;
+	h1 = ((h1 >> 6) | (h2 << 20)) & 0xffffffff;
+	h2 = ((h2 >> 12) | (h3 << 14)) & 0xffffffff;
+	h3 = ((h3 >> 18) | (h4 << 8)) & 0xffffffff;
+
+	/* mac = (h + nonce) % (2^128) */
+	f = (u64)h0 + nonce[0];
+	h0 = (u32)f;
+	f = (u64)h1 + nonce[1] + (f >> 32);
+	h1 = (u32)f;
+	f = (u64)h2 + nonce[2] + (f >> 32);
+	h2 = (u32)f;
+	f = (u64)h3 + nonce[3] + (f >> 32);
+	h3 = (u32)f;
+
+	put_unaligned_le32(h0, &mac[0]);
+	put_unaligned_le32(h1, &mac[4]);
+	put_unaligned_le32(h2, &mac[8]);
+	put_unaligned_le32(h3, &mac[12]);
+}
diff --git a/lib/zinc/poly1305/poly1305-donna64.h b/lib/zinc/poly1305/poly1305-donna64.h
new file mode 100644
index 000000000000..de7ab1246024
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-donna64.h
@@ -0,0 +1,182 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is based in part on Andrew Moon's poly1305-donna, which is in the
+ * public domain.
+ */
+
+typedef __uint128_t u128;
+
+struct poly1305_internal {
+	u64 r[3];
+	u64 h[3];
+	u64 s[2];
+};
+
+static void poly1305_init_generic(void *ctx, const u8 key[16])
+{
+	struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+	u64 t0, t1;
+
+	/* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
+	t0 = get_unaligned_le64(&key[0]);
+	t1 = get_unaligned_le64(&key[8]);
+
+	st->r[0] = t0 & 0xffc0fffffff;
+	st->r[1] = ((t0 >> 44) | (t1 << 20)) & 0xfffffc0ffff;
+	st->r[2] = ((t1 >> 24)) & 0x00ffffffc0f;
+
+	/* s = 20*r */
+	st->s[0] = st->r[1] * 20;
+	st->s[1] = st->r[2] * 20;
+
+	/* h = 0 */
+	st->h[0] = 0;
+	st->h[1] = 0;
+	st->h[2] = 0;
+}
+
+static void poly1305_blocks_generic(void *ctx, const u8 *input, size_t len,
+				    const u32 padbit)
+{
+	struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+	const u64 hibit = ((u64)padbit) << 40;
+	u64 r0, r1, r2;
+	u64 s1, s2;
+	u64 h0, h1, h2;
+	u64 c;
+	u128 d0, d1, d2, d;
+
+	r0 = st->r[0];
+	r1 = st->r[1];
+	r2 = st->r[2];
+
+	h0 = st->h[0];
+	h1 = st->h[1];
+	h2 = st->h[2];
+
+	s1 = st->s[0];
+	s2 = st->s[1];
+
+	while (len >= POLY1305_BLOCK_SIZE) {
+		u64 t0, t1;
+
+		/* h += m[i] */
+		t0 = get_unaligned_le64(&input[0]);
+		t1 = get_unaligned_le64(&input[8]);
+
+		h0 += t0 & 0xfffffffffff;
+		h1 += ((t0 >> 44) | (t1 << 20)) & 0xfffffffffff;
+		h2 += (((t1 >> 24)) & 0x3ffffffffff) | hibit;
+
+		/* h *= r */
+		d0 = (u128)h0 * r0;
+		d = (u128)h1 * s2;
+		d0 += d;
+		d = (u128)h2 * s1;
+		d0 += d;
+		d1 = (u128)h0 * r1;
+		d = (u128)h1 * r0;
+		d1 += d;
+		d = (u128)h2 * s2;
+		d1 += d;
+		d2 = (u128)h0 * r2;
+		d = (u128)h1 * r1;
+		d2 += d;
+		d = (u128)h2 * r0;
+		d2 += d;
+
+		/* (partial) h %= p */
+		c = (u64)(d0 >> 44);
+		h0 = (u64)d0 & 0xfffffffffff;
+		d1 += c;
+		c = (u64)(d1 >> 44);
+		h1 = (u64)d1 & 0xfffffffffff;
+		d2 += c;
+		c = (u64)(d2 >> 42);
+		h2 = (u64)d2 & 0x3ffffffffff;
+		h0 += c * 5;
+		c = h0 >> 44;
+		h0 = h0 & 0xfffffffffff;
+		h1 += c;
+
+		input += POLY1305_BLOCK_SIZE;
+		len -= POLY1305_BLOCK_SIZE;
+	}
+
+	st->h[0] = h0;
+	st->h[1] = h1;
+	st->h[2] = h2;
+}
+
+static void poly1305_emit_generic(void *ctx, u8 mac[16], const u32 nonce[4])
+{
+	struct poly1305_internal *st = (struct poly1305_internal *)ctx;
+	u64 h0, h1, h2, c;
+	u64 g0, g1, g2;
+	u64 t0, t1;
+
+	/* fully carry h */
+	h0 = st->h[0];
+	h1 = st->h[1];
+	h2 = st->h[2];
+
+	c = h1 >> 44;
+	h1 &= 0xfffffffffff;
+	h2 += c;
+	c = h2 >> 42;
+	h2 &= 0x3ffffffffff;
+	h0 += c * 5;
+	c = h0 >> 44;
+	h0 &= 0xfffffffffff;
+	h1 += c;
+	c = h1 >> 44;
+	h1 &= 0xfffffffffff;
+	h2 += c;
+	c = h2 >> 42;
+	h2 &= 0x3ffffffffff;
+	h0 += c * 5;
+	c = h0 >> 44;
+	h0 &= 0xfffffffffff;
+	h1 += c;
+
+	/* compute h + -p */
+	g0 = h0 + 5;
+	c  = g0 >> 44;
+	g0 &= 0xfffffffffff;
+	g1 = h1 + c;
+	c  = g1 >> 44;
+	g1 &= 0xfffffffffff;
+	g2 = h2 + c - (1ULL << 42);
+
+	/* select h if h < p, or h + -p if h >= p */
+	c = (g2 >> ((sizeof(u64) * 8) - 1)) - 1;
+	g0 &= c;
+	g1 &= c;
+	g2 &= c;
+	c  = ~c;
+	h0 = (h0 & c) | g0;
+	h1 = (h1 & c) | g1;
+	h2 = (h2 & c) | g2;
+
+	/* h = (h + nonce) */
+	t0 = ((u64)nonce[1] << 32) | nonce[0];
+	t1 = ((u64)nonce[3] << 32) | nonce[2];
+
+	h0 += t0 & 0xfffffffffff;
+	c = h0 >> 44;
+	h0 &= 0xfffffffffff;
+	h1 += (((t0 >> 44) | (t1 << 20)) & 0xfffffffffff) + c;
+	c = h1 >> 44;
+	h1 &= 0xfffffffffff;
+	h2 += (((t1 >> 24)) & 0x3ffffffffff) + c;
+	h2 &= 0x3ffffffffff;
+
+	/* mac = h % (2^128) */
+	h0 = h0 | (h1 << 44);
+	h1 = (h1 >> 20) | (h2 << 24);
+
+	put_unaligned_le64(h0, &mac[0]);
+	put_unaligned_le64(h1, &mac[8]);
+}
diff --git a/lib/zinc/poly1305/poly1305.c b/lib/zinc/poly1305/poly1305.c
new file mode 100644
index 000000000000..9a71ac1d4e39
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305.c
@@ -0,0 +1,131 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * Implementation of the Poly1305 message authenticator.
+ *
+ * Information: https://cr.yp.to/mac.html
+ */
+
+#include <zinc/poly1305.h>
+
+#include <asm/unaligned.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+
+#ifndef HAVE_POLY1305_ARCH_IMPLEMENTATION
+static inline bool poly1305_init_arch(void *ctx,
+				      const u8 key[POLY1305_KEY_SIZE],
+				      simd_context_t simd_context)
+{
+	return false;
+}
+static inline bool poly1305_blocks_arch(void *ctx, const u8 *input,
+					const size_t len, const u32 padbit,
+					simd_context_t simd_context)
+{
+	return false;
+}
+static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+				      const u32 nonce[4],
+				      simd_context_t simd_context)
+{
+	return false;
+}
+void __init poly1305_fpu_init(void)
+{
+}
+#endif
+
+#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__)
+#include "poly1305-donna64.h"
+#else
+#include "poly1305-donna32.h"
+#endif
+
+void poly1305_init(struct poly1305_ctx *ctx, const u8 key[POLY1305_KEY_SIZE],
+		   simd_context_t simd_context)
+{
+	ctx->nonce[0] = get_unaligned_le32(&key[16]);
+	ctx->nonce[1] = get_unaligned_le32(&key[20]);
+	ctx->nonce[2] = get_unaligned_le32(&key[24]);
+	ctx->nonce[3] = get_unaligned_le32(&key[28]);
+
+	if (!poly1305_init_arch(ctx->opaque, key, simd_context))
+		poly1305_init_generic(ctx->opaque, key);
+
+	ctx->num = 0;
+}
+EXPORT_SYMBOL(poly1305_init);
+
+static inline void poly1305_blocks(void *ctx, const u8 *input, const size_t len,
+				   const u32 padbit,
+				   simd_context_t simd_context)
+{
+	if (!poly1305_blocks_arch(ctx, input, len, padbit, simd_context))
+		poly1305_blocks_generic(ctx, input, len, padbit);
+}
+
+static inline void poly1305_emit(void *ctx, u8 mac[POLY1305_KEY_SIZE],
+				 const u32 nonce[4],
+				 simd_context_t simd_context)
+{
+	if (!poly1305_emit_arch(ctx, mac, nonce, simd_context))
+		poly1305_emit_generic(ctx, mac, nonce);
+}
+
+void poly1305_update(struct poly1305_ctx *ctx, const u8 *input, size_t len,
+		     simd_context_t simd_context)
+{
+	const size_t num = ctx->num % POLY1305_BLOCK_SIZE;
+	size_t rem;
+
+	if (num) {
+		rem = POLY1305_BLOCK_SIZE - num;
+		if (len < rem) {
+			memcpy(ctx->data + num, input, len);
+			ctx->num = num + len;
+			return;
+		}
+		memcpy(ctx->data + num, input, rem);
+		poly1305_blocks(ctx->opaque, ctx->data, POLY1305_BLOCK_SIZE, 1,
+				simd_context);
+		input += rem;
+		len -= rem;
+	}
+
+	rem = len % POLY1305_BLOCK_SIZE;
+	len -= rem;
+
+	if (len >= POLY1305_BLOCK_SIZE) {
+		poly1305_blocks(ctx->opaque, input, len, 1, simd_context);
+		input += len;
+	}
+
+	if (rem)
+		memcpy(ctx->data, input, rem);
+
+	ctx->num = rem;
+}
+EXPORT_SYMBOL(poly1305_update);
+
+void poly1305_final(struct poly1305_ctx *ctx, u8 mac[POLY1305_MAC_SIZE],
+		    simd_context_t simd_context)
+{
+	size_t num = ctx->num % POLY1305_BLOCK_SIZE;
+
+	if (num) {
+		ctx->data[num++] = 1;
+		while (num < POLY1305_BLOCK_SIZE)
+			ctx->data[num++] = 0;
+		poly1305_blocks(ctx->opaque, ctx->data, POLY1305_BLOCK_SIZE, 0,
+				simd_context);
+	}
+
+	poly1305_emit(ctx->opaque, mac, ctx->nonce, simd_context);
+
+	memzero_explicit(ctx, sizeof(*ctx));
+}
+EXPORT_SYMBOL(poly1305_final);
+
+#include "../selftest/poly1305.h"
diff --git a/lib/zinc/selftest/poly1305.h b/lib/zinc/selftest/poly1305.h
new file mode 100644
index 000000000000..8138a9399aed
--- /dev/null
+++ b/lib/zinc/selftest/poly1305.h
@@ -0,0 +1,876 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+struct poly1305_testvec {
+	u8 input[600];
+	u8 output[POLY1305_MAC_SIZE];
+	u8 key[POLY1305_KEY_SIZE];
+	size_t ilen;
+};
+
+static const struct poly1305_testvec poly1305_testvecs[] __initconst = {
+{ /* RFC7539 */
+	.input	= { 0x43, 0x72, 0x79, 0x70, 0x74, 0x6f, 0x67, 0x72,
+		    0x61, 0x70, 0x68, 0x69, 0x63, 0x20, 0x46, 0x6f,
+		    0x72, 0x75, 0x6d, 0x20, 0x52, 0x65, 0x73, 0x65,
+		    0x61, 0x72, 0x63, 0x68, 0x20, 0x47, 0x72, 0x6f,
+		    0x75, 0x70 },
+	.ilen	= 34,
+	.output	= { 0xa8, 0x06, 0x1d, 0xc1, 0x30, 0x51, 0x36, 0xc6,
+		    0xc2, 0x2b, 0x8b, 0xaf, 0x0c, 0x01, 0x27, 0xa9 },
+	.key	= { 0x85, 0xd6, 0xbe, 0x78, 0x57, 0x55, 0x6d, 0x33,
+		    0x7f, 0x44, 0x52, 0xfe, 0x42, 0xd5, 0x06, 0xa8,
+		    0x01, 0x03, 0x80, 0x8a, 0xfb, 0x0d, 0xb2, 0xfd,
+		    0x4a, 0xbf, 0xf6, 0xaf, 0x41, 0x49, 0xf5, 0x1b },
+}, { /* "The Poly1305-AES message-authentication code" */
+	.input	= { 0xf3, 0xf6 },
+	.ilen	= 2,
+	.output	= { 0xf4, 0xc6, 0x33, 0xc3, 0x04, 0x4f, 0xc1, 0x45,
+		    0xf8, 0x4f, 0x33, 0x5c, 0xb8, 0x19, 0x53, 0xde },
+	.key	= { 0x85, 0x1f, 0xc4, 0x0c, 0x34, 0x67, 0xac, 0x0b,
+		    0xe0, 0x5c, 0xc2, 0x04, 0x04, 0xf3, 0xf7, 0x00,
+		    0x58, 0x0b, 0x3b, 0x0f, 0x94, 0x47, 0xbb, 0x1e,
+		    0x69, 0xd0, 0x95, 0xb5, 0x92, 0x8b, 0x6d, 0xbc },
+}, {
+	.input	= "",
+	.ilen	= 0,
+	.output	= { 0xdd, 0x3f, 0xab, 0x22, 0x51, 0xf1, 0x1a, 0xc7,
+		    0x59, 0xf0, 0x88, 0x71, 0x29, 0xcc, 0x2e, 0xe7 },
+	.key	= { 0xa0, 0xf3, 0x08, 0x00, 0x00, 0xf4, 0x64, 0x00,
+		    0xd0, 0xc7, 0xe9, 0x07, 0x6c, 0x83, 0x44, 0x03,
+		    0xdd, 0x3f, 0xab, 0x22, 0x51, 0xf1, 0x1a, 0xc7,
+		    0x59, 0xf0, 0x88, 0x71, 0x29, 0xcc, 0x2e, 0xe7 },
+}, {
+	.input	= { 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36 },
+	.ilen	= 32,
+	.output	= { 0x0e, 0xe1, 0xc1, 0x6b, 0xb7, 0x3f, 0x0f, 0x4f,
+		    0xd1, 0x98, 0x81, 0x75, 0x3c, 0x01, 0xcd, 0xbe },
+	.key	= { 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9 },
+	.ilen	= 63,
+	.output	= { 0x51, 0x54, 0xad, 0x0d, 0x2c, 0xb2, 0x6e, 0x01,
+		    0x27, 0x4f, 0xc5, 0x11, 0x48, 0x49, 0x1f, 0x1b },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, { /* self-generated vectors exercise "significant" lengths, such that they
+      * are handled by different code paths */
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf },
+	.ilen	= 64,
+	.output	= { 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+		    0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67 },
+	.ilen	= 48,
+	.output	= { 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+		    0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36 },
+	.ilen	= 96,
+	.output	= { 0xbb, 0xb6, 0x13, 0xb2, 0xb6, 0xd7, 0x53, 0xba,
+		    0x07, 0x39, 0x5b, 0x91, 0x6a, 0xae, 0xce, 0x15 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24 },
+	.ilen	= 112,
+	.output	= { 0xc7, 0x94, 0xd7, 0x05, 0x7d, 0x17, 0x78, 0xc4,
+		    0xbb, 0xee, 0x0a, 0x39, 0xb3, 0xd9, 0x73, 0x42 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36 },
+	.ilen	= 128,
+	.output	= { 0xff, 0xbc, 0xb9, 0xb3, 0x71, 0x42, 0x31, 0x52,
+		    0xd7, 0xfc, 0xa5, 0xad, 0x04, 0x2f, 0xba, 0xa9 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+		    0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+		    0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66 },
+	.ilen	= 144,
+	.output	= { 0x06, 0x9e, 0xd6, 0xb8, 0xef, 0x0f, 0x20, 0x7b,
+		    0x3e, 0x24, 0x3b, 0xb1, 0x01, 0x9f, 0xe6, 0x32 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+		    0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+		    0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66,
+		    0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+		    0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61 },
+	.ilen	= 160,
+	.output	= { 0xcc, 0xa3, 0x39, 0xd9, 0xa4, 0x5f, 0xa2, 0x36,
+		    0x8c, 0x2c, 0x68, 0xb3, 0xa4, 0x17, 0x91, 0x33 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+		    0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+		    0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66,
+		    0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+		    0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61,
+		    0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36 },
+	.ilen	= 288,
+	.output	= { 0x53, 0xf6, 0xe8, 0x28, 0xa2, 0xf0, 0xfe, 0x0e,
+		    0xe8, 0x15, 0xbf, 0x0b, 0xd5, 0x84, 0x1a, 0x34 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, {
+	.input	= { 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+		    0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+		    0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66,
+		    0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+		    0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61,
+		    0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34,
+		    0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1,
+		    0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81,
+		    0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0,
+		    0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2,
+		    0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67,
+		    0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61,
+		    0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf,
+		    0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09,
+		    0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08,
+		    0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88,
+		    0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef,
+		    0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8,
+		    0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24,
+		    0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb,
+		    0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36,
+		    0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37,
+		    0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66,
+		    0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2,
+		    0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61 },
+	.ilen	= 320,
+	.output	= { 0xb8, 0x46, 0xd4, 0x4e, 0x9b, 0xbd, 0x53, 0xce,
+		    0xdf, 0xfb, 0xfb, 0xb6, 0xb7, 0xfa, 0x49, 0x33 },
+	.key	= { 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c,
+		    0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07,
+		    0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1,
+		    0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 },
+}, { /* 4th power of the key spills to 131th bit in SIMD key setup */
+	.input	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.ilen	= 256,
+	.output	= { 0x07, 0x14, 0x5a, 0x4c, 0x02, 0xfe, 0x5f, 0xa3,
+		    0x20, 0x36, 0xde, 0x68, 0xfa, 0xbe, 0x90, 0x66 },
+	.key	= { 0xad, 0x62, 0x81, 0x07, 0xe8, 0x35, 0x1d, 0x0f,
+		    0x2c, 0x23, 0x1a, 0x05, 0xdc, 0x4a, 0x41, 0x06,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+}, { /* OpenSSL's poly1305_ieee754.c failed this in final stage */
+	.input	= { 0x84, 0x23, 0x64, 0xe1, 0x56, 0x33, 0x6c, 0x09,
+		    0x98, 0xb9, 0x33, 0xa6, 0x23, 0x77, 0x26, 0x18,
+		    0x0d, 0x9e, 0x3f, 0xdc, 0xbd, 0xe4, 0xcd, 0x5d,
+		    0x17, 0x08, 0x0f, 0xc3, 0xbe, 0xb4, 0x96, 0x14,
+		    0xd7, 0x12, 0x2c, 0x03, 0x74, 0x63, 0xff, 0x10,
+		    0x4d, 0x73, 0xf1, 0x9c, 0x12, 0x70, 0x46, 0x28,
+		    0xd4, 0x17, 0xc4, 0xc5, 0x4a, 0x3f, 0xe3, 0x0d,
+		    0x3c, 0x3d, 0x77, 0x14, 0x38, 0x2d, 0x43, 0xb0,
+		    0x38, 0x2a, 0x50, 0xa5, 0xde, 0xe5, 0x4b, 0xe8,
+		    0x44, 0xb0, 0x76, 0xe8, 0xdf, 0x88, 0x20, 0x1a,
+		    0x1c, 0xd4, 0x3b, 0x90, 0xeb, 0x21, 0x64, 0x3f,
+		    0xa9, 0x6f, 0x39, 0xb5, 0x18, 0xaa, 0x83, 0x40,
+		    0xc9, 0x42, 0xff, 0x3c, 0x31, 0xba, 0xf7, 0xc9,
+		    0xbd, 0xbf, 0x0f, 0x31, 0xae, 0x3f, 0xa0, 0x96,
+		    0xbf, 0x8c, 0x63, 0x03, 0x06, 0x09, 0x82, 0x9f,
+		    0xe7, 0x2e, 0x17, 0x98, 0x24, 0x89, 0x0b, 0xc8,
+		    0xe0, 0x8c, 0x31, 0x5c, 0x1c, 0xce, 0x2a, 0x83,
+		    0x14, 0x4d, 0xbb, 0xff, 0x09, 0xf7, 0x4e, 0x3e,
+		    0xfc, 0x77, 0x0b, 0x54, 0xd0, 0x98, 0x4a, 0x8f,
+		    0x19, 0xb1, 0x47, 0x19, 0xe6, 0x36, 0x35, 0x64,
+		    0x1d, 0x6b, 0x1e, 0xed, 0xf6, 0x3e, 0xfb, 0xf0,
+		    0x80, 0xe1, 0x78, 0x3d, 0x32, 0x44, 0x54, 0x12,
+		    0x11, 0x4c, 0x20, 0xde, 0x0b, 0x83, 0x7a, 0x0d,
+		    0xfa, 0x33, 0xd6, 0xb8, 0x28, 0x25, 0xff, 0xf4,
+		    0x4c, 0x9a, 0x70, 0xea, 0x54, 0xce, 0x47, 0xf0,
+		    0x7d, 0xf6, 0x98, 0xe6, 0xb0, 0x33, 0x23, 0xb5,
+		    0x30, 0x79, 0x36, 0x4a, 0x5f, 0xc3, 0xe9, 0xdd,
+		    0x03, 0x43, 0x92, 0xbd, 0xde, 0x86, 0xdc, 0xcd,
+		    0xda, 0x94, 0x32, 0x1c, 0x5e, 0x44, 0x06, 0x04,
+		    0x89, 0x33, 0x6c, 0xb6, 0x5b, 0xf3, 0x98, 0x9c,
+		    0x36, 0xf7, 0x28, 0x2c, 0x2f, 0x5d, 0x2b, 0x88,
+		    0x2c, 0x17, 0x1e, 0x74 },
+	.ilen	= 252,
+	.output	= { 0xf2, 0x48, 0x31, 0x2e, 0x57, 0x8d, 0x9d, 0x58,
+		    0xf8, 0xb7, 0xbb, 0x4d, 0x19, 0x10, 0x54, 0x31 },
+	.key	= { 0x95, 0xd5, 0xc0, 0x05, 0x50, 0x3e, 0x51, 0x0d,
+		    0x8c, 0xd0, 0xaa, 0x07, 0x2c, 0x4a, 0x4d, 0x06,
+		    0x6e, 0xab, 0xc5, 0x2d, 0x11, 0x65, 0x3d, 0xf4,
+		    0x7f, 0xbf, 0x63, 0xab, 0x19, 0x8b, 0xcc, 0x26 },
+}, { /* AVX2 in OpenSSL's poly1305-x86.pl failed this with 176+32 split */
+	.input	= { 0x24, 0x8a, 0xc3, 0x10, 0x85, 0xb6, 0xc2, 0xad,
+		    0xaa, 0xa3, 0x82, 0x59, 0xa0, 0xd7, 0x19, 0x2c,
+		    0x5c, 0x35, 0xd1, 0xbb, 0x4e, 0xf3, 0x9a, 0xd9,
+		    0x4c, 0x38, 0xd1, 0xc8, 0x24, 0x79, 0xe2, 0xdd,
+		    0x21, 0x59, 0xa0, 0x77, 0x02, 0x4b, 0x05, 0x89,
+		    0xbc, 0x8a, 0x20, 0x10, 0x1b, 0x50, 0x6f, 0x0a,
+		    0x1a, 0xd0, 0xbb, 0xab, 0x76, 0xe8, 0x3a, 0x83,
+		    0xf1, 0xb9, 0x4b, 0xe6, 0xbe, 0xae, 0x74, 0xe8,
+		    0x74, 0xca, 0xb6, 0x92, 0xc5, 0x96, 0x3a, 0x75,
+		    0x43, 0x6b, 0x77, 0x61, 0x21, 0xec, 0x9f, 0x62,
+		    0x39, 0x9a, 0x3e, 0x66, 0xb2, 0xd2, 0x27, 0x07,
+		    0xda, 0xe8, 0x19, 0x33, 0xb6, 0x27, 0x7f, 0x3c,
+		    0x85, 0x16, 0xbc, 0xbe, 0x26, 0xdb, 0xbd, 0x86,
+		    0xf3, 0x73, 0x10, 0x3d, 0x7c, 0xf4, 0xca, 0xd1,
+		    0x88, 0x8c, 0x95, 0x21, 0x18, 0xfb, 0xfb, 0xd0,
+		    0xd7, 0xb4, 0xbe, 0xdc, 0x4a, 0xe4, 0x93, 0x6a,
+		    0xff, 0x91, 0x15, 0x7e, 0x7a, 0xa4, 0x7c, 0x54,
+		    0x44, 0x2e, 0xa7, 0x8d, 0x6a, 0xc2, 0x51, 0xd3,
+		    0x24, 0xa0, 0xfb, 0xe4, 0x9d, 0x89, 0xcc, 0x35,
+		    0x21, 0xb6, 0x6d, 0x16, 0xe9, 0xc6, 0x6a, 0x37,
+		    0x09, 0x89, 0x4e, 0x4e, 0xb0, 0xa4, 0xee, 0xdc,
+		    0x4a, 0xe1, 0x94, 0x68, 0xe6, 0x6b, 0x81, 0xf2,
+		    0x71, 0x35, 0x1b, 0x1d, 0x92, 0x1e, 0xa5, 0x51,
+		    0x04, 0x7a, 0xbc, 0xc6, 0xb8, 0x7a, 0x90, 0x1f,
+		    0xde, 0x7d, 0xb7, 0x9f, 0xa1, 0x81, 0x8c, 0x11,
+		    0x33, 0x6d, 0xbc, 0x07, 0x24, 0x4a, 0x40, 0xeb },
+	.ilen	= 208,
+	.output	= { 0xbc, 0x93, 0x9b, 0xc5, 0x28, 0x14, 0x80, 0xfa,
+		    0x99, 0xc6, 0xd6, 0x8c, 0x25, 0x8e, 0xc4, 0x2f },
+	.key	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+}, { /* test vectors from Google */
+	.input	= "",
+	.ilen	= 0,
+	.output	= { 0x47, 0x10, 0x13, 0x0e, 0x9f, 0x6f, 0xea, 0x8d,
+		    0x72, 0x29, 0x38, 0x50, 0xa6, 0x67, 0xd8, 0x6c },
+	.key	= { 0xc8, 0xaf, 0xaa, 0xc3, 0x31, 0xee, 0x37, 0x2c,
+		    0xd6, 0x08, 0x2d, 0xe1, 0x34, 0x94, 0x3b, 0x17,
+		    0x47, 0x10, 0x13, 0x0e, 0x9f, 0x6f, 0xea, 0x8d,
+		    0x72, 0x29, 0x38, 0x50, 0xa6, 0x67, 0xd8, 0x6c },
+}, {
+	.input	= { 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f,
+		    0x72, 0x6c, 0x64, 0x21 },
+	.ilen	= 12,
+	.output	= { 0xa6, 0xf7, 0x45, 0x00, 0x8f, 0x81, 0xc9, 0x16,
+		    0xa2, 0x0d, 0xcc, 0x74, 0xee, 0xf2, 0xb2, 0xf0 },
+	.key	= { 0x74, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20,
+		    0x33, 0x32, 0x2d, 0x62, 0x79, 0x74, 0x65, 0x20,
+		    0x6b, 0x65, 0x79, 0x20, 0x66, 0x6f, 0x72, 0x20,
+		    0x50, 0x6f, 0x6c, 0x79, 0x31, 0x33, 0x30, 0x35 },
+}, {
+	.input	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.ilen	= 32,
+	.output	= { 0x49, 0xec, 0x78, 0x09, 0x0e, 0x48, 0x1e, 0xc6,
+		    0xc2, 0x6b, 0x33, 0xb9, 0x1c, 0xcc, 0x03, 0x07 },
+	.key	= { 0x74, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20,
+		    0x33, 0x32, 0x2d, 0x62, 0x79, 0x74, 0x65, 0x20,
+		    0x6b, 0x65, 0x79, 0x20, 0x66, 0x6f, 0x72, 0x20,
+		    0x50, 0x6f, 0x6c, 0x79, 0x31, 0x33, 0x30, 0x35 },
+}, {
+	.input	= { 0x89, 0xda, 0xb8, 0x0b, 0x77, 0x17, 0xc1, 0xdb,
+		    0x5d, 0xb4, 0x37, 0x86, 0x0a, 0x3f, 0x70, 0x21,
+		    0x8e, 0x93, 0xe1, 0xb8, 0xf4, 0x61, 0xfb, 0x67,
+		    0x7f, 0x16, 0xf3, 0x5f, 0x6f, 0x87, 0xe2, 0xa9,
+		    0x1c, 0x99, 0xbc, 0x3a, 0x47, 0xac, 0xe4, 0x76,
+		    0x40, 0xcc, 0x95, 0xc3, 0x45, 0xbe, 0x5e, 0xcc,
+		    0xa5, 0xa3, 0x52, 0x3c, 0x35, 0xcc, 0x01, 0x89,
+		    0x3a, 0xf0, 0xb6, 0x4a, 0x62, 0x03, 0x34, 0x27,
+		    0x03, 0x72, 0xec, 0x12, 0x48, 0x2d, 0x1b, 0x1e,
+		    0x36, 0x35, 0x61, 0x69, 0x8a, 0x57, 0x8b, 0x35,
+		    0x98, 0x03, 0x49, 0x5b, 0xb4, 0xe2, 0xef, 0x19,
+		    0x30, 0xb1, 0x7a, 0x51, 0x90, 0xb5, 0x80, 0xf1,
+		    0x41, 0x30, 0x0d, 0xf3, 0x0a, 0xdb, 0xec, 0xa2,
+		    0x8f, 0x64, 0x27, 0xa8, 0xbc, 0x1a, 0x99, 0x9f,
+		    0xd5, 0x1c, 0x55, 0x4a, 0x01, 0x7d, 0x09, 0x5d,
+		    0x8c, 0x3e, 0x31, 0x27, 0xda, 0xf9, 0xf5, 0x95 },
+	.ilen	= 128,
+	.output	= { 0xc8, 0x5d, 0x15, 0xed, 0x44, 0xc3, 0x78, 0xd6,
+		    0xb0, 0x0e, 0x23, 0x06, 0x4c, 0x7b, 0xcd, 0x51 },
+	.key	= { 0x2d, 0x77, 0x3b, 0xe3, 0x7a, 0xdb, 0x1e, 0x4d,
+		    0x68, 0x3b, 0xf0, 0x07, 0x5e, 0x79, 0xc4, 0xee,
+		    0x03, 0x79, 0x18, 0x53, 0x5a, 0x7f, 0x99, 0xcc,
+		    0xb7, 0x04, 0x0f, 0xb5, 0xf5, 0xf4, 0x3a, 0xea },
+}, {
+	.input	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0b,
+		    0x17, 0x03, 0x03, 0x02, 0x00, 0x00, 0x00, 0x00,
+		    0x06, 0xdb, 0x1f, 0x1f, 0x36, 0x8d, 0x69, 0x6a,
+		    0x81, 0x0a, 0x34, 0x9c, 0x0c, 0x71, 0x4c, 0x9a,
+		    0x5e, 0x78, 0x50, 0xc2, 0x40, 0x7d, 0x72, 0x1a,
+		    0xcd, 0xed, 0x95, 0xe0, 0x18, 0xd7, 0xa8, 0x52,
+		    0x66, 0xa6, 0xe1, 0x28, 0x9c, 0xdb, 0x4a, 0xeb,
+		    0x18, 0xda, 0x5a, 0xc8, 0xa2, 0xb0, 0x02, 0x6d,
+		    0x24, 0xa5, 0x9a, 0xd4, 0x85, 0x22, 0x7f, 0x3e,
+		    0xae, 0xdb, 0xb2, 0xe7, 0xe3, 0x5e, 0x1c, 0x66,
+		    0xcd, 0x60, 0xf9, 0xab, 0xf7, 0x16, 0xdc, 0xc9,
+		    0xac, 0x42, 0x68, 0x2d, 0xd7, 0xda, 0xb2, 0x87,
+		    0xa7, 0x02, 0x4c, 0x4e, 0xef, 0xc3, 0x21, 0xcc,
+		    0x05, 0x74, 0xe1, 0x67, 0x93, 0xe3, 0x7c, 0xec,
+		    0x03, 0xc5, 0xbd, 0xa4, 0x2b, 0x54, 0xc1, 0x14,
+		    0xa8, 0x0b, 0x57, 0xaf, 0x26, 0x41, 0x6c, 0x7b,
+		    0xe7, 0x42, 0x00, 0x5e, 0x20, 0x85, 0x5c, 0x73,
+		    0xe2, 0x1d, 0xc8, 0xe2, 0xed, 0xc9, 0xd4, 0x35,
+		    0xcb, 0x6f, 0x60, 0x59, 0x28, 0x00, 0x11, 0xc2,
+		    0x70, 0xb7, 0x15, 0x70, 0x05, 0x1c, 0x1c, 0x9b,
+		    0x30, 0x52, 0x12, 0x66, 0x20, 0xbc, 0x1e, 0x27,
+		    0x30, 0xfa, 0x06, 0x6c, 0x7a, 0x50, 0x9d, 0x53,
+		    0xc6, 0x0e, 0x5a, 0xe1, 0xb4, 0x0a, 0xa6, 0xe3,
+		    0x9e, 0x49, 0x66, 0x92, 0x28, 0xc9, 0x0e, 0xec,
+		    0xb4, 0xa5, 0x0d, 0xb3, 0x2a, 0x50, 0xbc, 0x49,
+		    0xe9, 0x0b, 0x4f, 0x4b, 0x35, 0x9a, 0x1d, 0xfd,
+		    0x11, 0x74, 0x9c, 0xd3, 0x86, 0x7f, 0xcf, 0x2f,
+		    0xb7, 0xbb, 0x6c, 0xd4, 0x73, 0x8f, 0x6a, 0x4a,
+		    0xd6, 0xf7, 0xca, 0x50, 0x58, 0xf7, 0x61, 0x88,
+		    0x45, 0xaf, 0x9f, 0x02, 0x0f, 0x6c, 0x3b, 0x96,
+		    0x7b, 0x8f, 0x4c, 0xd4, 0xa9, 0x1e, 0x28, 0x13,
+		    0xb5, 0x07, 0xae, 0x66, 0xf2, 0xd3, 0x5c, 0x18,
+		    0x28, 0x4f, 0x72, 0x92, 0x18, 0x60, 0x62, 0xe1,
+		    0x0f, 0xd5, 0x51, 0x0d, 0x18, 0x77, 0x53, 0x51,
+		    0xef, 0x33, 0x4e, 0x76, 0x34, 0xab, 0x47, 0x43,
+		    0xf5, 0xb6, 0x8f, 0x49, 0xad, 0xca, 0xb3, 0x84,
+		    0xd3, 0xfd, 0x75, 0xf7, 0x39, 0x0f, 0x40, 0x06,
+		    0xef, 0x2a, 0x29, 0x5c, 0x8c, 0x7a, 0x07, 0x6a,
+		    0xd5, 0x45, 0x46, 0xcd, 0x25, 0xd2, 0x10, 0x7f,
+		    0xbe, 0x14, 0x36, 0xc8, 0x40, 0x92, 0x4a, 0xae,
+		    0xbe, 0x5b, 0x37, 0x08, 0x93, 0xcd, 0x63, 0xd1,
+		    0x32, 0x5b, 0x86, 0x16, 0xfc, 0x48, 0x10, 0x88,
+		    0x6b, 0xc1, 0x52, 0xc5, 0x32, 0x21, 0xb6, 0xdf,
+		    0x37, 0x31, 0x19, 0x39, 0x32, 0x55, 0xee, 0x72,
+		    0xbc, 0xaa, 0x88, 0x01, 0x74, 0xf1, 0x71, 0x7f,
+		    0x91, 0x84, 0xfa, 0x91, 0x64, 0x6f, 0x17, 0xa2,
+		    0x4a, 0xc5, 0x5d, 0x16, 0xbf, 0xdd, 0xca, 0x95,
+		    0x81, 0xa9, 0x2e, 0xda, 0x47, 0x92, 0x01, 0xf0,
+		    0xed, 0xbf, 0x63, 0x36, 0x00, 0xd6, 0x06, 0x6d,
+		    0x1a, 0xb3, 0x6d, 0x5d, 0x24, 0x15, 0xd7, 0x13,
+		    0x51, 0xbb, 0xcd, 0x60, 0x8a, 0x25, 0x10, 0x8d,
+		    0x25, 0x64, 0x19, 0x92, 0xc1, 0xf2, 0x6c, 0x53,
+		    0x1c, 0xf9, 0xf9, 0x02, 0x03, 0xbc, 0x4c, 0xc1,
+		    0x9f, 0x59, 0x27, 0xd8, 0x34, 0xb0, 0xa4, 0x71,
+		    0x16, 0xd3, 0x88, 0x4b, 0xbb, 0x16, 0x4b, 0x8e,
+		    0xc8, 0x83, 0xd1, 0xac, 0x83, 0x2e, 0x56, 0xb3,
+		    0x91, 0x8a, 0x98, 0x60, 0x1a, 0x08, 0xd1, 0x71,
+		    0x88, 0x15, 0x41, 0xd5, 0x94, 0xdb, 0x39, 0x9c,
+		    0x6a, 0xe6, 0x15, 0x12, 0x21, 0x74, 0x5a, 0xec,
+		    0x81, 0x4c, 0x45, 0xb0, 0xb0, 0x5b, 0x56, 0x54,
+		    0x36, 0xfd, 0x6f, 0x13, 0x7a, 0xa1, 0x0a, 0x0c,
+		    0x0b, 0x64, 0x37, 0x61, 0xdb, 0xd6, 0xf9, 0xa9,
+		    0xdc, 0xb9, 0x9b, 0x1a, 0x6e, 0x69, 0x08, 0x54,
+		    0xce, 0x07, 0x69, 0xcd, 0xe3, 0x97, 0x61, 0xd8,
+		    0x2f, 0xcd, 0xec, 0x15, 0xf0, 0xd9, 0x2d, 0x7d,
+		    0x8e, 0x94, 0xad, 0xe8, 0xeb, 0x83, 0xfb, 0xe0 },
+	.ilen	= 528,
+	.output	= { 0x26, 0x37, 0x40, 0x8f, 0xe1, 0x30, 0x86, 0xea,
+		    0x73, 0xf9, 0x71, 0xe3, 0x42, 0x5e, 0x28, 0x20 },
+	.key	= { 0x99, 0xe5, 0x82, 0x2d, 0xd4, 0x17, 0x3c, 0x99,
+		    0x5e, 0x3d, 0xae, 0x0d, 0xde, 0xfb, 0x97, 0x74,
+		    0x3f, 0xde, 0x3b, 0x08, 0x01, 0x34, 0xb3, 0x9f,
+		    0x76, 0xe9, 0xbf, 0x8d, 0x0e, 0x88, 0xd5, 0x46 },
+}, { /* test vectors from Hanno Böck */
+	.input	= { 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0x80, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xce, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xc5,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xe3, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xac, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xe6,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0x00, 0x00, 0x00,
+		    0xaf, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc,
+		    0xcc, 0xcc, 0xff, 0xff, 0xff, 0xf5, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0xff, 0xff, 0xff, 0xe7, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x71, 0x92, 0x05, 0xa8, 0x52, 0x1d,
+		    0xfc },
+	.ilen	= 257,
+	.output	= { 0x85, 0x59, 0xb8, 0x76, 0xec, 0xee, 0xd6, 0x6e,
+		    0xb3, 0x77, 0x98, 0xc0, 0x45, 0x7b, 0xaf, 0xf9 },
+	.key	= { 0x7f, 0x1b, 0x02, 0x64, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc },
+}, {
+	.input	= { 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa,
+		    0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa,
+		    0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa,
+		    0xaa, 0xaa, 0xaa, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x80, 0x02, 0x64 },
+	.ilen	= 39,
+	.output	= { 0x00, 0xbd, 0x12, 0x58, 0x97, 0x8e, 0x20, 0x54,
+		    0x44, 0xc9, 0xaa, 0xaa, 0x82, 0x00, 0x6f, 0xed },
+	.key	= { 0xe0, 0x00, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa,
+		    0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa },
+}, {
+	.input	= { 0x02, 0xfc },
+	.ilen	= 2,
+	.output	= { 0x06, 0x12, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+		    0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c },
+	.key	= { 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+		    0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+		    0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c,
+		    0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c },
+}, {
+	.input	= { 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7a, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x5c, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x6e, 0x7b, 0x00, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7a, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x5c,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b,
+		    0x7b, 0x6e, 0x7b, 0x00, 0x13, 0x00, 0x00, 0x00,
+		    0x00, 0xb3, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0xf2, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x20, 0x00, 0xef, 0xff, 0x00,
+		    0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00,
+		    0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x64, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x13, 0x00, 0x00, 0x00, 0x00,
+		    0xb3, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf2,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x20, 0x00, 0xef, 0xff, 0x00, 0x09,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x7a, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00,
+		    0x00, 0x09, 0x00, 0x00, 0x00, 0x64, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfc },
+	.ilen	= 415,
+	.output	= { 0x33, 0x20, 0x5b, 0xbf, 0x9e, 0x9f, 0x8f, 0x72,
+		    0x12, 0xab, 0x9e, 0x2a, 0xb9, 0xb7, 0xe4, 0xa5 },
+	.key	= { 0x00, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x1e, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7b, 0x7b },
+}, {
+	.input	= { 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0xff, 0xff, 0xff, 0xe9,
+		    0xe9, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac,
+		    0xac, 0xac, 0xac, 0xac, 0x00, 0x00, 0xac, 0xac,
+		    0xec, 0x01, 0x00, 0xac, 0xac, 0xac, 0x2c, 0xac,
+		    0xa2, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac,
+		    0xac, 0xac, 0xac, 0xac, 0x64, 0xf2 },
+	.ilen	= 118,
+	.output	= { 0x02, 0xee, 0x7c, 0x8c, 0x54, 0x6d, 0xde, 0xb1,
+		    0xa4, 0x67, 0xe4, 0xc3, 0x98, 0x11, 0x58, 0xb9 },
+	.key	= { 0x00, 0x00, 0x00, 0x7f, 0x00, 0x00, 0x00, 0x7f,
+		    0x01, 0x00, 0x00, 0x20, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0xcf, 0x77, 0x77, 0x77, 0x77, 0x77,
+		    0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77 },
+}, { /* nacl */
+	.input	= { 0x8e, 0x99, 0x3b, 0x9f, 0x48, 0x68, 0x12, 0x73,
+		    0xc2, 0x96, 0x50, 0xba, 0x32, 0xfc, 0x76, 0xce,
+		    0x48, 0x33, 0x2e, 0xa7, 0x16, 0x4d, 0x96, 0xa4,
+		    0x47, 0x6f, 0xb8, 0xc5, 0x31, 0xa1, 0x18, 0x6a,
+		    0xc0, 0xdf, 0xc1, 0x7c, 0x98, 0xdc, 0xe8, 0x7b,
+		    0x4d, 0xa7, 0xf0, 0x11, 0xec, 0x48, 0xc9, 0x72,
+		    0x71, 0xd2, 0xc2, 0x0f, 0x9b, 0x92, 0x8f, 0xe2,
+		    0x27, 0x0d, 0x6f, 0xb8, 0x63, 0xd5, 0x17, 0x38,
+		    0xb4, 0x8e, 0xee, 0xe3, 0x14, 0xa7, 0xcc, 0x8a,
+		    0xb9, 0x32, 0x16, 0x45, 0x48, 0xe5, 0x26, 0xae,
+		    0x90, 0x22, 0x43, 0x68, 0x51, 0x7a, 0xcf, 0xea,
+		    0xbd, 0x6b, 0xb3, 0x73, 0x2b, 0xc0, 0xe9, 0xda,
+		    0x99, 0x83, 0x2b, 0x61, 0xca, 0x01, 0xb6, 0xde,
+		    0x56, 0x24, 0x4a, 0x9e, 0x88, 0xd5, 0xf9, 0xb3,
+		    0x79, 0x73, 0xf6, 0x22, 0xa4, 0x3d, 0x14, 0xa6,
+		    0x59, 0x9b, 0x1f, 0x65, 0x4c, 0xb4, 0x5a, 0x74,
+		    0xe3, 0x55, 0xa5 },
+	.ilen	= 131,
+	.output	= { 0xf3, 0xff, 0xc7, 0x70, 0x3f, 0x94, 0x00, 0xe5,
+		    0x2a, 0x7d, 0xfb, 0x4b, 0x3d, 0x33, 0x05, 0xd9 },
+	.key	= { 0xee, 0xa6, 0xa7, 0x25, 0x1c, 0x1e, 0x72, 0x91,
+		    0x6d, 0x11, 0xc2, 0xcb, 0x21, 0x4d, 0x3c, 0x25,
+		    0x25, 0x39, 0x12, 0x1d, 0x8e, 0x23, 0x4e, 0x65,
+		    0x2d, 0x65, 0x1f, 0xa4, 0xc8, 0xcf, 0xf8, 0x80 },
+}, { /* wrap 2^130-5 */
+	.input	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.ilen	= 16,
+	.output	= { 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.key	= { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+}, { /* wrap 2^128 */
+	.input	= { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.ilen	= 16,
+	.output	= { 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.key	= { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+}, { /* limb carry */
+	.input	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.ilen	= 48,
+	.output	= { 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.key	= { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+}, { /* 2^130-5 */
+	.input	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xfb, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe,
+		    0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe,
+		    0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
+		    0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01 },
+	.ilen	= 48,
+	.output	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.key	= { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+}, { /* 2^130-6 */
+	.input	= { 0xfd, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.ilen	= 16,
+	.output	= { 0xfa, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.key	= { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+}, { /* 5*H+L reduction intermediate */
+	.input	= { 0xe3, 0x35, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0xb9,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x33, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0x79, 0xcd,
+		    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.ilen	= 64,
+	.output	= { 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.key	= { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+}, { /* 5*H+L reduction final */
+	.input	= { 0xe3, 0x35, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0xb9,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x33, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0x79, 0xcd,
+		    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.ilen	= 48,
+	.output	= { 0x13, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.key	= { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+} };
+
+bool __init poly1305_selftest(void)
+{
+	simd_context_t simd_context = simd_get();
+	bool success = true;
+	size_t i, j;
+
+	for (i = 0; i < ARRAY_SIZE(poly1305_testvecs); ++i) {
+		struct poly1305_ctx poly1305;
+		u8 out[POLY1305_MAC_SIZE];
+
+		memset(out, 0, sizeof(out));
+		memset(&poly1305, 0, sizeof(poly1305));
+		poly1305_init(&poly1305, poly1305_testvecs[i].key,
+			      simd_context);
+		poly1305_update(&poly1305, poly1305_testvecs[i].input,
+				poly1305_testvecs[i].ilen, simd_context);
+		poly1305_final(&poly1305, out, simd_context);
+		if (memcmp(out, poly1305_testvecs[i].output,
+			   POLY1305_MAC_SIZE)) {
+			pr_info("poly1305 self-test %zu: FAIL\n", i + 1);
+			success = false;
+		}
+		simd_context = simd_relax(simd_context);
+
+		if (poly1305_testvecs[i].ilen <= 1)
+			continue;
+
+		for (j = 1; j < poly1305_testvecs[i].ilen - 1; ++j) {
+			memset(out, 0, sizeof(out));
+			memset(&poly1305, 0, sizeof(poly1305));
+			poly1305_init(&poly1305, poly1305_testvecs[i].key,
+				      simd_context);
+			poly1305_update(&poly1305, poly1305_testvecs[i].input,
+					j, simd_context);
+			poly1305_update(&poly1305,
+					poly1305_testvecs[i].input + j,
+					poly1305_testvecs[i].ilen - j,
+					simd_context);
+			poly1305_final(&poly1305, out, simd_context);
+			if (memcmp(out, poly1305_testvecs[i].output,
+				   POLY1305_MAC_SIZE)) {
+				pr_info("poly1305 self-test %zu (split %zu): FAIL\n",
+					i + 1, j);
+				success = false;
+			}
+			simd_context = simd_relax(simd_context);
+		}
+	}
+	simd_put(simd_context);
+
+	if (success)
+		pr_info("poly1305 self-tests: pass\n");
+
+	return success;
+}
+#endif
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (6 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 07/20] zinc: Poly1305 generic C implementations and selftest Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 17:27   ` Ard Biesheuvel
  2018-09-14 16:22 ` [PATCH net-next v4 09/20] zinc: Poly1305 x86_64 implementation Jason A. Donenfeld
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Andy Polyakov, Russell King,
	linux-arm-kernel

These NEON and non-NEON implementations come from Andy Polyakov's
implementation. They are exactly the same as Andy Polyakov's original,
with the following exceptions:

- Entries and exits use the proper kernel convention macro.
- CPU feature checking is done in C by the glue code, so that has been
  removed from the assembly.
- The function names have been renamed to fit kernel conventions.
- Labels have been renamed to fit kernel conventions.
- The neon code can jump to the scalar code when it makes sense to do
  so.

After '/^#/d;/^\..*[^:]$/d', the code has the following diff in actual
instructions from the original.

ARM:

-poly1305_init:
-.Lpoly1305_init:
+ENTRY(poly1305_init_arm)
 	stmdb	sp!,{r4-r11}

 	eor	r3,r3,r3
@@ -18,8 +25,6 @@
 	moveq	r0,#0
 	beq	.Lno_key

-	adr	r11,.Lpoly1305_init
-	ldr	r12,.LOPENSSL_armcap
 	ldrb	r4,[r1,#0]
 	mov	r10,#0x0fffffff
 	ldrb	r5,[r1,#1]
@@ -34,8 +39,6 @@
 	ldrb	r7,[r1,#6]
 	and	r4,r4,r10

-	ldr	r12,[r11,r12]		@ OPENSSL_armcap_P
-	ldr	r12,[r12]
 	ldrb	r8,[r1,#7]
 	orr	r5,r5,r6,lsl#8
 	ldrb	r6,[r1,#8]
@@ -45,22 +48,6 @@
 	ldrb	r8,[r1,#10]
 	and	r5,r5,r3

-	tst	r12,#ARMV7_NEON		@ check for NEON
-	adr	r9,poly1305_blocks_neon
-	adr	r11,poly1305_blocks
-	it	ne
-	movne	r11,r9
-	adr	r12,poly1305_emit
-	adr	r10,poly1305_emit_neon
-	it	ne
-	movne	r12,r10
-	itete	eq
-	addeq	r12,r11,#(poly1305_emit-.Lpoly1305_init)
-	addne	r12,r11,#(poly1305_emit_neon-.Lpoly1305_init)
-	addeq	r11,r11,#(poly1305_blocks-.Lpoly1305_init)
-	addne	r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init)
-	orr	r12,r12,#1	@ thumb-ify address
-	orr	r11,r11,#1
 	ldrb	r9,[r1,#11]
 	orr	r6,r6,r7,lsl#8
 	ldrb	r7,[r1,#12]
@@ -79,17 +66,16 @@
 	str	r6,[r0,#8]
 	and	r7,r7,r3
 	str	r7,[r0,#12]
-	stmia	r2,{r11,r12}		@ fill functions table
-	mov	r0,#1
-	mov	r0,#0
 .Lno_key:
 	ldmia	sp!,{r4-r11}
 	bx	lr				@ bx	lr
 	tst	lr,#1
 	moveq	pc,lr			@ be binary compatible with V4, yet
 	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-poly1305_blocks:
-.Lpoly1305_blocks:
+ENDPROC(poly1305_init_arm)
+
+ENTRY(poly1305_blocks_arm)
+.Lpoly1305_blocks_arm:
 	stmdb	sp!,{r3-r11,lr}

 	ands	r2,r2,#-16
@@ -231,10 +217,11 @@
 	tst	lr,#1
 	moveq	pc,lr			@ be binary compatible with V4, yet
 	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-poly1305_emit:
+ENDPROC(poly1305_blocks_arm)
+
+ENTRY(poly1305_emit_arm)
 	stmdb	sp!,{r4-r11}
 .Lpoly1305_emit_enter:
-
 	ldmia	r0,{r3-r7}
 	adds	r8,r3,#5		@ compare to modulus
 	adcs	r9,r4,#0
@@ -305,8 +292,12 @@
 	tst	lr,#1
 	moveq	pc,lr			@ be binary compatible with V4, yet
 	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
+ENDPROC(poly1305_emit_arm)
+
+

-poly1305_init_neon:
+ENTRY(poly1305_init_neon)
+.Lpoly1305_init_neon:
 	ldr	r4,[r0,#20]		@ load key base 2^32
 	ldr	r5,[r0,#24]
 	ldr	r6,[r0,#28]
@@ -515,8 +506,9 @@
 	vst1.32		{d8[1]},[r7]

 	bx	lr				@ bx	lr
+ENDPROC(poly1305_init_neon)

-poly1305_blocks_neon:
+ENTRY(poly1305_blocks_neon)
 	ldr	ip,[r0,#36]		@ is_base2_26
 	ands	r2,r2,#-16
 	beq	.Lno_data_neon
@@ -524,7 +516,7 @@
 	cmp	r2,#64
 	bhs	.Lenter_neon
 	tst	ip,ip			@ is_base2_26?
-	beq	.Lpoly1305_blocks
+	beq	.Lpoly1305_blocks_arm

 .Lenter_neon:
 	stmdb	sp!,{r4-r7}
@@ -534,7 +526,7 @@
 	bne	.Lbase2_26_neon

 	stmdb	sp!,{r1-r3,lr}
-	bl	poly1305_init_neon
+	bl	.Lpoly1305_init_neon

 	ldr	r4,[r0,#0]		@ load hash value base 2^32
 	ldr	r5,[r0,#4]
@@ -989,8 +981,9 @@
 	ldmia	sp!,{r4-r7}
 .Lno_data_neon:
 	bx	lr					@ bx	lr
+ENDPROC(poly1305_blocks_neon)

-poly1305_emit_neon:
+ENTRY(poly1305_emit_neon)
 	ldr	ip,[r0,#36]		@ is_base2_26

 	stmdb	sp!,{r4-r11}
@@ -1055,6 +1048,6 @@

 	ldmia	sp!,{r4-r11}
 	bx	lr				@ bx	lr
+ENDPROC(poly1305_emit_neon)

ARM64:

-poly1305_init:
+ENTRY(poly1305_init_arm)
 	cmp	x1,xzr
 	stp	xzr,xzr,[x0]		// zero hash value
 	stp	xzr,xzr,[x0,#16]	// [along with is_base2_26]
@@ -11,14 +15,9 @@
 	csel	x0,xzr,x0,eq
 	b.eq	.Lno_key

-	ldrsw	x11,.LOPENSSL_armcap_P
-	ldr	x11,.LOPENSSL_armcap_P
-	adr	x10,.LOPENSSL_armcap_P
-
 	ldp	x7,x8,[x1]		// load key
 	mov	x9,#0xfffffffc0fffffff
 	movk	x9,#0x0fff,lsl#48
-	ldr	w17,[x10,x11]
 	rev	x7,x7			// flip bytes
 	rev	x8,x8
 	and	x7,x7,x9		// &=0ffffffc0fffffff
@@ -26,24 +25,11 @@
 	and	x8,x8,x9		// &=0ffffffc0ffffffc
 	stp	x7,x8,[x0,#32]	// save key value

-	tst	w17,#ARMV7_NEON
-
-	adr	x12,poly1305_blocks
-	adr	x7,poly1305_blocks_neon
-	adr	x13,poly1305_emit
-	adr	x8,poly1305_emit_neon
-
-	csel	x12,x12,x7,eq
-	csel	x13,x13,x8,eq
-
-	stp	w12,w13,[x2]
-	stp	x12,x13,[x2]
-
-	mov	x0,#1
 .Lno_key:
 	ret
+ENDPROC(poly1305_init_arm)

-poly1305_blocks:
+ENTRY(poly1305_blocks_arm)
 	ands	x2,x2,#-16
 	b.eq	.Lno_data

@@ -100,8 +86,9 @@

 .Lno_data:
 	ret
+ENDPROC(poly1305_blocks_arm)

-poly1305_emit:
+ENTRY(poly1305_emit_arm)
 	ldp	x4,x5,[x0]		// load hash base 2^64
 	ldr	x6,[x0,#16]
 	ldp	x10,x11,[x2]	// load nonce
@@ -124,7 +111,9 @@
 	stp	x4,x5,[x1]		// write result

 	ret
-poly1305_mult:
+ENDPROC(poly1305_emit_arm)
+
+__poly1305_mult:
 	mul	x12,x4,x7		// h0*r0
 	umulh	x13,x4,x7

@@ -158,7 +147,7 @@

 	ret

-poly1305_splat:
+__poly1305_splat:
 	and	x12,x4,#0x03ffffff	// base 2^64 -> base 2^26
 	ubfx	x13,x4,#26,#26
 	extr	x14,x5,x4,#52
@@ -182,11 +171,11 @@

 	ret

-poly1305_blocks_neon:
+ENTRY(poly1305_blocks_neon)
 	ldr	x17,[x0,#24]
 	cmp	x2,#128
 	b.hs	.Lblocks_neon
-	cbz	x17,poly1305_blocks
+	cbz	x17,poly1305_blocks_arm

 .Lblocks_neon:
 	stp	x29,x30,[sp,#-80]!
@@ -232,7 +221,7 @@
 	adcs	x5,x5,x13
 	adc	x6,x6,x3

-	bl	poly1305_mult
+	bl	__poly1305_mult
 	ldr	x30,[sp,#8]

 	cbz	x3,.Lstore_base2_64_neon
@@ -274,7 +263,7 @@
 	adcs	x5,x5,x13
 	adc	x6,x6,x3

-	bl	poly1305_mult
+	bl	__poly1305_mult

 .Linit_neon:
 	and	x10,x4,#0x03ffffff	// base 2^64 -> base 2^26
@@ -301,19 +290,19 @@
 	mov	x5,x8
 	mov	x6,xzr
 	add	x0,x0,#48+12
-	bl	poly1305_splat
+	bl	__poly1305_splat

-	bl	poly1305_mult		// r^2
+	bl	__poly1305_mult		// r^2
 	sub	x0,x0,#4
-	bl	poly1305_splat
+	bl	__poly1305_splat

-	bl	poly1305_mult		// r^3
+	bl	__poly1305_mult		// r^3
 	sub	x0,x0,#4
-	bl	poly1305_splat
+	bl	__poly1305_splat

-	bl	poly1305_mult		// r^4
+	bl	__poly1305_mult		// r^4
 	sub	x0,x0,#4
-	bl	poly1305_splat
+	bl	__poly1305_splat
 	ldr	x30,[sp,#8]

 	add	x16,x1,#32
@@ -743,10 +732,11 @@
 .Lno_data_neon:
 	ldr	x29,[sp],#80
 	ret
+ENDPROC(poly1305_blocks_neon)

-poly1305_emit_neon:
+ENTRY(poly1305_emit_neon)
 	ldr	x17,[x0,#24]
-	cbz	x17,poly1305_emit
+	cbz	x17,poly1305_emit_arm

 	ldp	w10,w11,[x0]		// load hash value base 2^26
 	ldp	w12,w13,[x0,#8]
@@ -788,6 +778,6 @@
 	stp	x4,x5,[x1]		// write result

 	ret
+ENDPROC(poly1305_emit_neon)

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
 lib/zinc/Makefile                     |    8 +
 lib/zinc/poly1305/poly1305-arm-glue.h |   69 ++
 lib/zinc/poly1305/poly1305-arm.S      | 1117 +++++++++++++++++++++++++
 lib/zinc/poly1305/poly1305-arm64.S    |  822 ++++++++++++++++++
 4 files changed, 2016 insertions(+)
 create mode 100644 lib/zinc/poly1305/poly1305-arm-glue.h
 create mode 100644 lib/zinc/poly1305/poly1305-arm.S
 create mode 100644 lib/zinc/poly1305/poly1305-arm64.S

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index d1e3892e06d9..f37df89a3f87 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -25,6 +25,14 @@ endif
 
 ifeq ($(CONFIG_ZINC_POLY1305),y)
 zinc-y += poly1305/poly1305.o
+ifeq ($(CONFIG_ZINC_ARCH_ARM),y)
+zinc-y += poly1305/poly1305-arm.o
+CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-arm-glue.h
+endif
+ifeq ($(CONFIG_ZINC_ARCH_ARM64),y)
+zinc-y += poly1305/poly1305-arm64.o
+CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-arm-glue.h
+endif
 endif
 
 zinc-y += main.o
diff --git a/lib/zinc/poly1305/poly1305-arm-glue.h b/lib/zinc/poly1305/poly1305-arm-glue.h
new file mode 100644
index 000000000000..53f8fec7f858
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-arm-glue.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/poly1305.h>
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+
+asmlinkage void poly1305_init_arm(void *ctx, const u8 key[16]);
+asmlinkage void poly1305_blocks_arm(void *ctx, const u8 *inp, const size_t len,
+				    const u32 padbit);
+asmlinkage void poly1305_emit_arm(void *ctx, u8 mac[16], const u32 nonce[4]);
+#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&                                     \
+	(defined(CONFIG_64BIT) || __LINUX_ARM_ARCH__ >= 7)
+#define ARM_USE_NEON
+asmlinkage void poly1305_blocks_neon(void *ctx, const u8 *inp, const size_t len,
+				     const u32 padbit);
+asmlinkage void poly1305_emit_neon(void *ctx, u8 mac[16], const u32 nonce[4]);
+#endif
+
+static bool poly1305_use_neon __ro_after_init;
+
+void __init poly1305_fpu_init(void)
+{
+#if defined(CONFIG_ARM64)
+	poly1305_use_neon = elf_hwcap & HWCAP_ASIMD;
+#elif defined(CONFIG_ARM)
+	poly1305_use_neon = elf_hwcap & HWCAP_NEON;
+#endif
+}
+
+static inline bool poly1305_init_arch(void *ctx,
+				      const u8 key[POLY1305_KEY_SIZE],
+				      simd_context_t simd_context)
+{
+	poly1305_init_arm(ctx, key);
+	return true;
+}
+
+static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
+					const size_t len, const u32 padbit,
+					simd_context_t simd_context)
+{
+#if defined(ARM_USE_NEON)
+	if (simd_context == HAVE_FULL_SIMD && poly1305_use_neon) {
+		poly1305_blocks_neon(ctx, inp, len, padbit);
+		return true;
+	}
+#endif
+	poly1305_blocks_arm(ctx, inp, len, padbit);
+	return true;
+}
+
+static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+				      const u32 nonce[4],
+				      simd_context_t simd_context)
+{
+#if defined(ARM_USE_NEON)
+	if (simd_context == HAVE_FULL_SIMD && poly1305_use_neon) {
+		poly1305_emit_neon(ctx, mac, nonce);
+		return true;
+	}
+#endif
+	poly1305_emit_arm(ctx, mac, nonce);
+	return true;
+}
+
+#define HAVE_POLY1305_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/poly1305/poly1305-arm.S b/lib/zinc/poly1305/poly1305-arm.S
new file mode 100644
index 000000000000..110f4317b5d7
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-arm.S
@@ -0,0 +1,1117 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#include <linux/linkage.h>
+
+.text
+#if defined(__thumb2__)
+.syntax	unified
+.thumb
+#else
+.code	32
+#endif
+
+.align	5
+ENTRY(poly1305_init_arm)
+	stmdb	sp!,{r4-r11}
+
+	eor	r3,r3,r3
+	cmp	r1,#0
+	str	r3,[r0,#0]		@ zero hash value
+	str	r3,[r0,#4]
+	str	r3,[r0,#8]
+	str	r3,[r0,#12]
+	str	r3,[r0,#16]
+	str	r3,[r0,#36]		@ is_base2_26
+	add	r0,r0,#20
+
+#ifdef	__thumb2__
+	it	eq
+#endif
+	moveq	r0,#0
+	beq	.Lno_key
+
+	ldrb	r4,[r1,#0]
+	mov	r10,#0x0fffffff
+	ldrb	r5,[r1,#1]
+	and	r3,r10,#-4		@ 0x0ffffffc
+	ldrb	r6,[r1,#2]
+	ldrb	r7,[r1,#3]
+	orr	r4,r4,r5,lsl#8
+	ldrb	r5,[r1,#4]
+	orr	r4,r4,r6,lsl#16
+	ldrb	r6,[r1,#5]
+	orr	r4,r4,r7,lsl#24
+	ldrb	r7,[r1,#6]
+	and	r4,r4,r10
+
+	ldrb	r8,[r1,#7]
+	orr	r5,r5,r6,lsl#8
+	ldrb	r6,[r1,#8]
+	orr	r5,r5,r7,lsl#16
+	ldrb	r7,[r1,#9]
+	orr	r5,r5,r8,lsl#24
+	ldrb	r8,[r1,#10]
+	and	r5,r5,r3
+
+	ldrb	r9,[r1,#11]
+	orr	r6,r6,r7,lsl#8
+	ldrb	r7,[r1,#12]
+	orr	r6,r6,r8,lsl#16
+	ldrb	r8,[r1,#13]
+	orr	r6,r6,r9,lsl#24
+	ldrb	r9,[r1,#14]
+	and	r6,r6,r3
+
+	ldrb	r10,[r1,#15]
+	orr	r7,r7,r8,lsl#8
+	str	r4,[r0,#0]
+	orr	r7,r7,r9,lsl#16
+	str	r5,[r0,#4]
+	orr	r7,r7,r10,lsl#24
+	str	r6,[r0,#8]
+	and	r7,r7,r3
+	str	r7,[r0,#12]
+.Lno_key:
+	ldmia	sp!,{r4-r11}
+#if __LINUX_ARM_ARCH__ >= 5
+	bx	lr				@ bx	lr
+#else
+	tst	lr,#1
+	moveq	pc,lr			@ be binary compatible with V4, yet
+	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
+#endif
+ENDPROC(poly1305_init_arm)
+
+.align	5
+ENTRY(poly1305_blocks_arm)
+.Lpoly1305_blocks_arm:
+	stmdb	sp!,{r3-r11,lr}
+
+	ands	r2,r2,#-16
+	beq	.Lno_data
+
+	cmp	r3,#0
+	add	r2,r2,r1		@ end pointer
+	sub	sp,sp,#32
+
+	ldmia	r0,{r4-r12}		@ load context
+
+	str	r0,[sp,#12]		@ offload stuff
+	mov	lr,r1
+	str	r2,[sp,#16]
+	str	r10,[sp,#20]
+	str	r11,[sp,#24]
+	str	r12,[sp,#28]
+	b	.Loop
+
+.Loop:
+#if __LINUX_ARM_ARCH__ < 7
+	ldrb	r0,[lr],#16		@ load input
+#ifdef	__thumb2__
+	it	hi
+#endif
+	addhi	r8,r8,#1		@ 1<<128
+	ldrb	r1,[lr,#-15]
+	ldrb	r2,[lr,#-14]
+	ldrb	r3,[lr,#-13]
+	orr	r1,r0,r1,lsl#8
+	ldrb	r0,[lr,#-12]
+	orr	r2,r1,r2,lsl#16
+	ldrb	r1,[lr,#-11]
+	orr	r3,r2,r3,lsl#24
+	ldrb	r2,[lr,#-10]
+	adds	r4,r4,r3		@ accumulate input
+
+	ldrb	r3,[lr,#-9]
+	orr	r1,r0,r1,lsl#8
+	ldrb	r0,[lr,#-8]
+	orr	r2,r1,r2,lsl#16
+	ldrb	r1,[lr,#-7]
+	orr	r3,r2,r3,lsl#24
+	ldrb	r2,[lr,#-6]
+	adcs	r5,r5,r3
+
+	ldrb	r3,[lr,#-5]
+	orr	r1,r0,r1,lsl#8
+	ldrb	r0,[lr,#-4]
+	orr	r2,r1,r2,lsl#16
+	ldrb	r1,[lr,#-3]
+	orr	r3,r2,r3,lsl#24
+	ldrb	r2,[lr,#-2]
+	adcs	r6,r6,r3
+
+	ldrb	r3,[lr,#-1]
+	orr	r1,r0,r1,lsl#8
+	str	lr,[sp,#8]		@ offload input pointer
+	orr	r2,r1,r2,lsl#16
+	add	r10,r10,r10,lsr#2
+	orr	r3,r2,r3,lsl#24
+#else
+	ldr	r0,[lr],#16		@ load input
+#ifdef	__thumb2__
+	it	hi
+#endif
+	addhi	r8,r8,#1		@ padbit
+	ldr	r1,[lr,#-12]
+	ldr	r2,[lr,#-8]
+	ldr	r3,[lr,#-4]
+#ifdef	__ARMEB__
+	rev	r0,r0
+	rev	r1,r1
+	rev	r2,r2
+	rev	r3,r3
+#endif
+	adds	r4,r4,r0		@ accumulate input
+	str	lr,[sp,#8]		@ offload input pointer
+	adcs	r5,r5,r1
+	add	r10,r10,r10,lsr#2
+	adcs	r6,r6,r2
+#endif
+	add	r11,r11,r11,lsr#2
+	adcs	r7,r7,r3
+	add	r12,r12,r12,lsr#2
+
+	umull	r2,r3,r5,r9
+	 adc	r8,r8,#0
+	umull	r0,r1,r4,r9
+	umlal	r2,r3,r8,r10
+	umlal	r0,r1,r7,r10
+	ldr	r10,[sp,#20]		@ reload r10
+	umlal	r2,r3,r6,r12
+	umlal	r0,r1,r5,r12
+	umlal	r2,r3,r7,r11
+	umlal	r0,r1,r6,r11
+	umlal	r2,r3,r4,r10
+	str	r0,[sp,#0]		@ future r4
+	 mul	r0,r11,r8
+	ldr	r11,[sp,#24]		@ reload r11
+	adds	r2,r2,r1		@ d1+=d0>>32
+	 eor	r1,r1,r1
+	adc	lr,r3,#0		@ future r6
+	str	r2,[sp,#4]		@ future r5
+
+	mul	r2,r12,r8
+	eor	r3,r3,r3
+	umlal	r0,r1,r7,r12
+	ldr	r12,[sp,#28]		@ reload r12
+	umlal	r2,r3,r7,r9
+	umlal	r0,r1,r6,r9
+	umlal	r2,r3,r6,r10
+	umlal	r0,r1,r5,r10
+	umlal	r2,r3,r5,r11
+	umlal	r0,r1,r4,r11
+	umlal	r2,r3,r4,r12
+	ldr	r4,[sp,#0]
+	mul	r8,r9,r8
+	ldr	r5,[sp,#4]
+
+	adds	r6,lr,r0		@ d2+=d1>>32
+	ldr	lr,[sp,#8]		@ reload input pointer
+	adc	r1,r1,#0
+	adds	r7,r2,r1		@ d3+=d2>>32
+	ldr	r0,[sp,#16]		@ reload end pointer
+	adc	r3,r3,#0
+	add	r8,r8,r3		@ h4+=d3>>32
+
+	and	r1,r8,#-4
+	and	r8,r8,#3
+	add	r1,r1,r1,lsr#2		@ *=5
+	adds	r4,r4,r1
+	adcs	r5,r5,#0
+	adcs	r6,r6,#0
+	adcs	r7,r7,#0
+	adc	r8,r8,#0
+
+	cmp	r0,lr			@ done yet?
+	bhi	.Loop
+
+	ldr	r0,[sp,#12]
+	add	sp,sp,#32
+	stmia	r0,{r4-r8}		@ store the result
+
+.Lno_data:
+#if __LINUX_ARM_ARCH__ >= 5
+	ldmia	sp!,{r3-r11,pc}
+#else
+	ldmia	sp!,{r3-r11,lr}
+	tst	lr,#1
+	moveq	pc,lr			@ be binary compatible with V4, yet
+	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
+#endif
+ENDPROC(poly1305_blocks_arm)
+
+.align	5
+ENTRY(poly1305_emit_arm)
+	stmdb	sp!,{r4-r11}
+.Lpoly1305_emit_enter:
+	ldmia	r0,{r3-r7}
+	adds	r8,r3,#5		@ compare to modulus
+	adcs	r9,r4,#0
+	adcs	r10,r5,#0
+	adcs	r11,r6,#0
+	adc	r7,r7,#0
+	tst	r7,#4			@ did it carry/borrow?
+
+#ifdef	__thumb2__
+	it	ne
+#endif
+	movne	r3,r8
+	ldr	r8,[r2,#0]
+#ifdef	__thumb2__
+	it	ne
+#endif
+	movne	r4,r9
+	ldr	r9,[r2,#4]
+#ifdef	__thumb2__
+	it	ne
+#endif
+	movne	r5,r10
+	ldr	r10,[r2,#8]
+#ifdef	__thumb2__
+	it	ne
+#endif
+	movne	r6,r11
+	ldr	r11,[r2,#12]
+
+	adds	r3,r3,r8
+	adcs	r4,r4,r9
+	adcs	r5,r5,r10
+	adc	r6,r6,r11
+
+#if __LINUX_ARM_ARCH__ >= 7
+#ifdef __ARMEB__
+	rev	r3,r3
+	rev	r4,r4
+	rev	r5,r5
+	rev	r6,r6
+#endif
+	str	r3,[r1,#0]
+	str	r4,[r1,#4]
+	str	r5,[r1,#8]
+	str	r6,[r1,#12]
+#else
+	strb	r3,[r1,#0]
+	mov	r3,r3,lsr#8
+	strb	r4,[r1,#4]
+	mov	r4,r4,lsr#8
+	strb	r5,[r1,#8]
+	mov	r5,r5,lsr#8
+	strb	r6,[r1,#12]
+	mov	r6,r6,lsr#8
+
+	strb	r3,[r1,#1]
+	mov	r3,r3,lsr#8
+	strb	r4,[r1,#5]
+	mov	r4,r4,lsr#8
+	strb	r5,[r1,#9]
+	mov	r5,r5,lsr#8
+	strb	r6,[r1,#13]
+	mov	r6,r6,lsr#8
+
+	strb	r3,[r1,#2]
+	mov	r3,r3,lsr#8
+	strb	r4,[r1,#6]
+	mov	r4,r4,lsr#8
+	strb	r5,[r1,#10]
+	mov	r5,r5,lsr#8
+	strb	r6,[r1,#14]
+	mov	r6,r6,lsr#8
+
+	strb	r3,[r1,#3]
+	strb	r4,[r1,#7]
+	strb	r5,[r1,#11]
+	strb	r6,[r1,#15]
+#endif
+	ldmia	sp!,{r4-r11}
+#if __LINUX_ARM_ARCH__ >= 5
+	bx	lr				@ bx	lr
+#else
+	tst	lr,#1
+	moveq	pc,lr			@ be binary compatible with V4, yet
+	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
+#endif
+ENDPROC(poly1305_emit_arm)
+
+
+#if __LINUX_ARM_ARCH__ >= 7
+.fpu	neon
+
+.align	5
+ENTRY(poly1305_init_neon)
+.Lpoly1305_init_neon:
+	ldr	r4,[r0,#20]		@ load key base 2^32
+	ldr	r5,[r0,#24]
+	ldr	r6,[r0,#28]
+	ldr	r7,[r0,#32]
+
+	and	r2,r4,#0x03ffffff	@ base 2^32 -> base 2^26
+	mov	r3,r4,lsr#26
+	mov	r4,r5,lsr#20
+	orr	r3,r3,r5,lsl#6
+	mov	r5,r6,lsr#14
+	orr	r4,r4,r6,lsl#12
+	mov	r6,r7,lsr#8
+	orr	r5,r5,r7,lsl#18
+	and	r3,r3,#0x03ffffff
+	and	r4,r4,#0x03ffffff
+	and	r5,r5,#0x03ffffff
+
+	vdup.32	d0,r2			@ r^1 in both lanes
+	add	r2,r3,r3,lsl#2		@ *5
+	vdup.32	d1,r3
+	add	r3,r4,r4,lsl#2
+	vdup.32	d2,r2
+	vdup.32	d3,r4
+	add	r4,r5,r5,lsl#2
+	vdup.32	d4,r3
+	vdup.32	d5,r5
+	add	r5,r6,r6,lsl#2
+	vdup.32	d6,r4
+	vdup.32	d7,r6
+	vdup.32	d8,r5
+
+	mov	r5,#2		@ counter
+
+.Lsquare_neon:
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
+	@ d1 = h1*r0 + h0*r1   + h4*5*r2 + h3*5*r3 + h2*5*r4
+	@ d2 = h2*r0 + h1*r1   + h0*r2   + h4*5*r3 + h3*5*r4
+	@ d3 = h3*r0 + h2*r1   + h1*r2   + h0*r3   + h4*5*r4
+	@ d4 = h4*r0 + h3*r1   + h2*r2   + h1*r3   + h0*r4
+
+	vmull.u32	q5,d0,d0[1]
+	vmull.u32	q6,d1,d0[1]
+	vmull.u32	q7,d3,d0[1]
+	vmull.u32	q8,d5,d0[1]
+	vmull.u32	q9,d7,d0[1]
+
+	vmlal.u32	q5,d7,d2[1]
+	vmlal.u32	q6,d0,d1[1]
+	vmlal.u32	q7,d1,d1[1]
+	vmlal.u32	q8,d3,d1[1]
+	vmlal.u32	q9,d5,d1[1]
+
+	vmlal.u32	q5,d5,d4[1]
+	vmlal.u32	q6,d7,d4[1]
+	vmlal.u32	q8,d1,d3[1]
+	vmlal.u32	q7,d0,d3[1]
+	vmlal.u32	q9,d3,d3[1]
+
+	vmlal.u32	q5,d3,d6[1]
+	vmlal.u32	q8,d0,d5[1]
+	vmlal.u32	q6,d5,d6[1]
+	vmlal.u32	q7,d7,d6[1]
+	vmlal.u32	q9,d1,d5[1]
+
+	vmlal.u32	q8,d7,d8[1]
+	vmlal.u32	q5,d1,d8[1]
+	vmlal.u32	q6,d3,d8[1]
+	vmlal.u32	q7,d5,d8[1]
+	vmlal.u32	q9,d0,d7[1]
+
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
+	@ and P. Schwabe
+	@
+	@ H0>>+H1>>+H2>>+H3>>+H4
+	@ H3>>+H4>>*5+H0>>+H1
+	@
+	@ Trivia.
+	@
+	@ Result of multiplication of n-bit number by m-bit number is
+	@ n+m bits wide. However! Even though 2^n is a n+1-bit number,
+	@ m-bit number multiplied by 2^n is still n+m bits wide.
+	@
+	@ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2,
+	@ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit
+	@ one is n+1 bits wide.
+	@
+	@ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that
+	@ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4
+	@ can be 27. However! In cases when their width exceeds 26 bits
+	@ they are limited by 2^26+2^6. This in turn means that *sum*
+	@ of the products with these values can still be viewed as sum
+	@ of 52-bit numbers as long as the amount of addends is not a
+	@ power of 2. For example,
+	@
+	@ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4,
+	@
+	@ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or
+	@ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than
+	@ 8 * (2^52) or 2^55. However, the value is then multiplied by
+	@ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12),
+	@ which is less than 32 * (2^52) or 2^57. And when processing
+	@ data we are looking at triple as many addends...
+	@
+	@ In key setup procedure pre-reduced H0 is limited by 5*4+1 and
+	@ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the
+	@ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while
+	@ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32
+	@ instruction accepts 2x32-bit input and writes 2x64-bit result.
+	@ This means that result of reduction have to be compressed upon
+	@ loop wrap-around. This can be done in the process of reduction
+	@ to minimize amount of instructions [as well as amount of
+	@ 128-bit instructions, which benefits low-end processors], but
+	@ one has to watch for H2 (which is narrower than H0) and 5*H4
+	@ not being wider than 58 bits, so that result of right shift
+	@ by 26 bits fits in 32 bits. This is also useful on x86,
+	@ because it allows to use paddd in place for paddq, which
+	@ benefits Atom, where paddq is ridiculously slow.
+
+	vshr.u64	q15,q8,#26
+	vmovn.i64	d16,q8
+	 vshr.u64	q4,q5,#26
+	 vmovn.i64	d10,q5
+	vadd.i64	q9,q9,q15		@ h3 -> h4
+	vbic.i32	d16,#0xfc000000	@ &=0x03ffffff
+	 vadd.i64	q6,q6,q4		@ h0 -> h1
+	 vbic.i32	d10,#0xfc000000
+
+	vshrn.u64	d30,q9,#26
+	vmovn.i64	d18,q9
+	 vshr.u64	q4,q6,#26
+	 vmovn.i64	d12,q6
+	 vadd.i64	q7,q7,q4		@ h1 -> h2
+	vbic.i32	d18,#0xfc000000
+	 vbic.i32	d12,#0xfc000000
+
+	vadd.i32	d10,d10,d30
+	vshl.u32	d30,d30,#2
+	 vshrn.u64	d8,q7,#26
+	 vmovn.i64	d14,q7
+	vadd.i32	d10,d10,d30	@ h4 -> h0
+	 vadd.i32	d16,d16,d8	@ h2 -> h3
+	 vbic.i32	d14,#0xfc000000
+
+	vshr.u32	d30,d10,#26
+	vbic.i32	d10,#0xfc000000
+	 vshr.u32	d8,d16,#26
+	 vbic.i32	d16,#0xfc000000
+	vadd.i32	d12,d12,d30	@ h0 -> h1
+	 vadd.i32	d18,d18,d8	@ h3 -> h4
+
+	subs		r5,r5,#1
+	beq		.Lsquare_break_neon
+
+	add		r6,r0,#(48+0*9*4)
+	add		r7,r0,#(48+1*9*4)
+
+	vtrn.32		d0,d10		@ r^2:r^1
+	vtrn.32		d3,d14
+	vtrn.32		d5,d16
+	vtrn.32		d1,d12
+	vtrn.32		d7,d18
+
+	vshl.u32	d4,d3,#2		@ *5
+	vshl.u32	d6,d5,#2
+	vshl.u32	d2,d1,#2
+	vshl.u32	d8,d7,#2
+	vadd.i32	d4,d4,d3
+	vadd.i32	d2,d2,d1
+	vadd.i32	d6,d6,d5
+	vadd.i32	d8,d8,d7
+
+	vst4.32		{d0[0],d1[0],d2[0],d3[0]},[r6]!
+	vst4.32		{d0[1],d1[1],d2[1],d3[1]},[r7]!
+	vst4.32		{d4[0],d5[0],d6[0],d7[0]},[r6]!
+	vst4.32		{d4[1],d5[1],d6[1],d7[1]},[r7]!
+	vst1.32		{d8[0]},[r6,:32]
+	vst1.32		{d8[1]},[r7,:32]
+
+	b		.Lsquare_neon
+
+.align	4
+.Lsquare_break_neon:
+	add		r6,r0,#(48+2*4*9)
+	add		r7,r0,#(48+3*4*9)
+
+	vmov		d0,d10		@ r^4:r^3
+	vshl.u32	d2,d12,#2		@ *5
+	vmov		d1,d12
+	vshl.u32	d4,d14,#2
+	vmov		d3,d14
+	vshl.u32	d6,d16,#2
+	vmov		d5,d16
+	vshl.u32	d8,d18,#2
+	vmov		d7,d18
+	vadd.i32	d2,d2,d12
+	vadd.i32	d4,d4,d14
+	vadd.i32	d6,d6,d16
+	vadd.i32	d8,d8,d18
+
+	vst4.32		{d0[0],d1[0],d2[0],d3[0]},[r6]!
+	vst4.32		{d0[1],d1[1],d2[1],d3[1]},[r7]!
+	vst4.32		{d4[0],d5[0],d6[0],d7[0]},[r6]!
+	vst4.32		{d4[1],d5[1],d6[1],d7[1]},[r7]!
+	vst1.32		{d8[0]},[r6]
+	vst1.32		{d8[1]},[r7]
+
+	bx	lr				@ bx	lr
+ENDPROC(poly1305_init_neon)
+
+.align	5
+ENTRY(poly1305_blocks_neon)
+	ldr	ip,[r0,#36]		@ is_base2_26
+	ands	r2,r2,#-16
+	beq	.Lno_data_neon
+
+	cmp	r2,#64
+	bhs	.Lenter_neon
+	tst	ip,ip			@ is_base2_26?
+	beq	.Lpoly1305_blocks_arm
+
+.Lenter_neon:
+	stmdb	sp!,{r4-r7}
+	vstmdb	sp!,{d8-d15}		@ ABI specification says so
+
+	tst	ip,ip			@ is_base2_26?
+	bne	.Lbase2_26_neon
+
+	stmdb	sp!,{r1-r3,lr}
+	bl	.Lpoly1305_init_neon
+
+	ldr	r4,[r0,#0]		@ load hash value base 2^32
+	ldr	r5,[r0,#4]
+	ldr	r6,[r0,#8]
+	ldr	r7,[r0,#12]
+	ldr	ip,[r0,#16]
+
+	and	r2,r4,#0x03ffffff	@ base 2^32 -> base 2^26
+	mov	r3,r4,lsr#26
+	 veor	d10,d10,d10
+	mov	r4,r5,lsr#20
+	orr	r3,r3,r5,lsl#6
+	 veor	d12,d12,d12
+	mov	r5,r6,lsr#14
+	orr	r4,r4,r6,lsl#12
+	 veor	d14,d14,d14
+	mov	r6,r7,lsr#8
+	orr	r5,r5,r7,lsl#18
+	 veor	d16,d16,d16
+	and	r3,r3,#0x03ffffff
+	orr	r6,r6,ip,lsl#24
+	 veor	d18,d18,d18
+	and	r4,r4,#0x03ffffff
+	mov	r1,#1
+	and	r5,r5,#0x03ffffff
+	str	r1,[r0,#36]		@ is_base2_26
+
+	vmov.32	d10[0],r2
+	vmov.32	d12[0],r3
+	vmov.32	d14[0],r4
+	vmov.32	d16[0],r5
+	vmov.32	d18[0],r6
+	adr	r5,.Lzeros
+
+	ldmia	sp!,{r1-r3,lr}
+	b	.Lbase2_32_neon
+
+.align	4
+.Lbase2_26_neon:
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ load hash value
+
+	veor		d10,d10,d10
+	veor		d12,d12,d12
+	veor		d14,d14,d14
+	veor		d16,d16,d16
+	veor		d18,d18,d18
+	vld4.32		{d10[0],d12[0],d14[0],d16[0]},[r0]!
+	adr		r5,.Lzeros
+	vld1.32		{d18[0]},[r0]
+	sub		r0,r0,#16		@ rewind
+
+.Lbase2_32_neon:
+	add		r4,r1,#32
+	mov		r3,r3,lsl#24
+	tst		r2,#31
+	beq		.Leven
+
+	vld4.32		{d20[0],d22[0],d24[0],d26[0]},[r1]!
+	vmov.32		d28[0],r3
+	sub		r2,r2,#16
+	add		r4,r1,#32
+
+#ifdef	__ARMEB__
+	vrev32.8	q10,q10
+	vrev32.8	q13,q13
+	vrev32.8	q11,q11
+	vrev32.8	q12,q12
+#endif
+	vsri.u32	d28,d26,#8	@ base 2^32 -> base 2^26
+	vshl.u32	d26,d26,#18
+
+	vsri.u32	d26,d24,#14
+	vshl.u32	d24,d24,#12
+	vadd.i32	d29,d28,d18	@ add hash value and move to #hi
+
+	vbic.i32	d26,#0xfc000000
+	vsri.u32	d24,d22,#20
+	vshl.u32	d22,d22,#6
+
+	vbic.i32	d24,#0xfc000000
+	vsri.u32	d22,d20,#26
+	vadd.i32	d27,d26,d16
+
+	vbic.i32	d20,#0xfc000000
+	vbic.i32	d22,#0xfc000000
+	vadd.i32	d25,d24,d14
+
+	vadd.i32	d21,d20,d10
+	vadd.i32	d23,d22,d12
+
+	mov		r7,r5
+	add		r6,r0,#48
+
+	cmp		r2,r2
+	b		.Long_tail
+
+.align	4
+.Leven:
+	subs		r2,r2,#64
+	it		lo
+	movlo		r4,r5
+
+	vmov.i32	q14,#1<<24		@ padbit, yes, always
+	vld4.32		{d20,d22,d24,d26},[r1]	@ inp[0:1]
+	add		r1,r1,#64
+	vld4.32		{d21,d23,d25,d27},[r4]	@ inp[2:3] (or 0)
+	add		r4,r4,#64
+	itt		hi
+	addhi		r7,r0,#(48+1*9*4)
+	addhi		r6,r0,#(48+3*9*4)
+
+#ifdef	__ARMEB__
+	vrev32.8	q10,q10
+	vrev32.8	q13,q13
+	vrev32.8	q11,q11
+	vrev32.8	q12,q12
+#endif
+	vsri.u32	q14,q13,#8		@ base 2^32 -> base 2^26
+	vshl.u32	q13,q13,#18
+
+	vsri.u32	q13,q12,#14
+	vshl.u32	q12,q12,#12
+
+	vbic.i32	q13,#0xfc000000
+	vsri.u32	q12,q11,#20
+	vshl.u32	q11,q11,#6
+
+	vbic.i32	q12,#0xfc000000
+	vsri.u32	q11,q10,#26
+
+	vbic.i32	q10,#0xfc000000
+	vbic.i32	q11,#0xfc000000
+
+	bls		.Lskip_loop
+
+	vld4.32		{d0[1],d1[1],d2[1],d3[1]},[r7]!	@ load r^2
+	vld4.32		{d0[0],d1[0],d2[0],d3[0]},[r6]!	@ load r^4
+	vld4.32		{d4[1],d5[1],d6[1],d7[1]},[r7]!
+	vld4.32		{d4[0],d5[0],d6[0],d7[0]},[r6]!
+	b		.Loop_neon
+
+.align	5
+.Loop_neon:
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
+	@ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
+	@   ___________________/
+	@ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
+	@ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
+	@   ___________________/ ____________________/
+	@
+	@ Note that we start with inp[2:3]*r^2. This is because it
+	@ doesn't depend on reduction in previous iteration.
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ d4 = h4*r0 + h3*r1   + h2*r2   + h1*r3   + h0*r4
+	@ d3 = h3*r0 + h2*r1   + h1*r2   + h0*r3   + h4*5*r4
+	@ d2 = h2*r0 + h1*r1   + h0*r2   + h4*5*r3 + h3*5*r4
+	@ d1 = h1*r0 + h0*r1   + h4*5*r2 + h3*5*r3 + h2*5*r4
+	@ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
+
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ inp[2:3]*r^2
+
+	vadd.i32	d24,d24,d14	@ accumulate inp[0:1]
+	vmull.u32	q7,d25,d0[1]
+	vadd.i32	d20,d20,d10
+	vmull.u32	q5,d21,d0[1]
+	vadd.i32	d26,d26,d16
+	vmull.u32	q8,d27,d0[1]
+	vmlal.u32	q7,d23,d1[1]
+	vadd.i32	d22,d22,d12
+	vmull.u32	q6,d23,d0[1]
+
+	vadd.i32	d28,d28,d18
+	vmull.u32	q9,d29,d0[1]
+	subs		r2,r2,#64
+	vmlal.u32	q5,d29,d2[1]
+	it		lo
+	movlo		r4,r5
+	vmlal.u32	q8,d25,d1[1]
+	vld1.32		d8[1],[r7,:32]
+	vmlal.u32	q6,d21,d1[1]
+	vmlal.u32	q9,d27,d1[1]
+
+	vmlal.u32	q5,d27,d4[1]
+	vmlal.u32	q8,d23,d3[1]
+	vmlal.u32	q9,d25,d3[1]
+	vmlal.u32	q6,d29,d4[1]
+	vmlal.u32	q7,d21,d3[1]
+
+	vmlal.u32	q8,d21,d5[1]
+	vmlal.u32	q5,d25,d6[1]
+	vmlal.u32	q9,d23,d5[1]
+	vmlal.u32	q6,d27,d6[1]
+	vmlal.u32	q7,d29,d6[1]
+
+	vmlal.u32	q8,d29,d8[1]
+	vmlal.u32	q5,d23,d8[1]
+	vmlal.u32	q9,d21,d7[1]
+	vmlal.u32	q6,d25,d8[1]
+	vmlal.u32	q7,d27,d8[1]
+
+	vld4.32		{d21,d23,d25,d27},[r4]	@ inp[2:3] (or 0)
+	add		r4,r4,#64
+
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ (hash+inp[0:1])*r^4 and accumulate
+
+	vmlal.u32	q8,d26,d0[0]
+	vmlal.u32	q5,d20,d0[0]
+	vmlal.u32	q9,d28,d0[0]
+	vmlal.u32	q6,d22,d0[0]
+	vmlal.u32	q7,d24,d0[0]
+	vld1.32		d8[0],[r6,:32]
+
+	vmlal.u32	q8,d24,d1[0]
+	vmlal.u32	q5,d28,d2[0]
+	vmlal.u32	q9,d26,d1[0]
+	vmlal.u32	q6,d20,d1[0]
+	vmlal.u32	q7,d22,d1[0]
+
+	vmlal.u32	q8,d22,d3[0]
+	vmlal.u32	q5,d26,d4[0]
+	vmlal.u32	q9,d24,d3[0]
+	vmlal.u32	q6,d28,d4[0]
+	vmlal.u32	q7,d20,d3[0]
+
+	vmlal.u32	q8,d20,d5[0]
+	vmlal.u32	q5,d24,d6[0]
+	vmlal.u32	q9,d22,d5[0]
+	vmlal.u32	q6,d26,d6[0]
+	vmlal.u32	q8,d28,d8[0]
+
+	vmlal.u32	q7,d28,d6[0]
+	vmlal.u32	q5,d22,d8[0]
+	vmlal.u32	q9,d20,d7[0]
+	vmov.i32	q14,#1<<24		@ padbit, yes, always
+	vmlal.u32	q6,d24,d8[0]
+	vmlal.u32	q7,d26,d8[0]
+
+	vld4.32		{d20,d22,d24,d26},[r1]	@ inp[0:1]
+	add		r1,r1,#64
+#ifdef	__ARMEB__
+	vrev32.8	q10,q10
+	vrev32.8	q11,q11
+	vrev32.8	q12,q12
+	vrev32.8	q13,q13
+#endif
+
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ lazy reduction interleaved with base 2^32 -> base 2^26 of
+	@ inp[0:3] previously loaded to q10-q13 and smashed to q10-q14.
+
+	vshr.u64	q15,q8,#26
+	vmovn.i64	d16,q8
+	 vshr.u64	q4,q5,#26
+	 vmovn.i64	d10,q5
+	vadd.i64	q9,q9,q15		@ h3 -> h4
+	vbic.i32	d16,#0xfc000000
+	  vsri.u32	q14,q13,#8		@ base 2^32 -> base 2^26
+	 vadd.i64	q6,q6,q4		@ h0 -> h1
+	  vshl.u32	q13,q13,#18
+	 vbic.i32	d10,#0xfc000000
+
+	vshrn.u64	d30,q9,#26
+	vmovn.i64	d18,q9
+	 vshr.u64	q4,q6,#26
+	 vmovn.i64	d12,q6
+	 vadd.i64	q7,q7,q4		@ h1 -> h2
+	  vsri.u32	q13,q12,#14
+	vbic.i32	d18,#0xfc000000
+	  vshl.u32	q12,q12,#12
+	 vbic.i32	d12,#0xfc000000
+
+	vadd.i32	d10,d10,d30
+	vshl.u32	d30,d30,#2
+	  vbic.i32	q13,#0xfc000000
+	 vshrn.u64	d8,q7,#26
+	 vmovn.i64	d14,q7
+	vaddl.u32	q5,d10,d30	@ h4 -> h0 [widen for a sec]
+	  vsri.u32	q12,q11,#20
+	 vadd.i32	d16,d16,d8	@ h2 -> h3
+	  vshl.u32	q11,q11,#6
+	 vbic.i32	d14,#0xfc000000
+	  vbic.i32	q12,#0xfc000000
+
+	vshrn.u64	d30,q5,#26		@ re-narrow
+	vmovn.i64	d10,q5
+	  vsri.u32	q11,q10,#26
+	  vbic.i32	q10,#0xfc000000
+	 vshr.u32	d8,d16,#26
+	 vbic.i32	d16,#0xfc000000
+	vbic.i32	d10,#0xfc000000
+	vadd.i32	d12,d12,d30	@ h0 -> h1
+	 vadd.i32	d18,d18,d8	@ h3 -> h4
+	  vbic.i32	q11,#0xfc000000
+
+	bhi		.Loop_neon
+
+.Lskip_loop:
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
+
+	add		r7,r0,#(48+0*9*4)
+	add		r6,r0,#(48+1*9*4)
+	adds		r2,r2,#32
+	it		ne
+	movne		r2,#0
+	bne		.Long_tail
+
+	vadd.i32	d25,d24,d14	@ add hash value and move to #hi
+	vadd.i32	d21,d20,d10
+	vadd.i32	d27,d26,d16
+	vadd.i32	d23,d22,d12
+	vadd.i32	d29,d28,d18
+
+.Long_tail:
+	vld4.32		{d0[1],d1[1],d2[1],d3[1]},[r7]!	@ load r^1
+	vld4.32		{d0[0],d1[0],d2[0],d3[0]},[r6]!	@ load r^2
+
+	vadd.i32	d24,d24,d14	@ can be redundant
+	vmull.u32	q7,d25,d0
+	vadd.i32	d20,d20,d10
+	vmull.u32	q5,d21,d0
+	vadd.i32	d26,d26,d16
+	vmull.u32	q8,d27,d0
+	vadd.i32	d22,d22,d12
+	vmull.u32	q6,d23,d0
+	vadd.i32	d28,d28,d18
+	vmull.u32	q9,d29,d0
+
+	vmlal.u32	q5,d29,d2
+	vld4.32		{d4[1],d5[1],d6[1],d7[1]},[r7]!
+	vmlal.u32	q8,d25,d1
+	vld4.32		{d4[0],d5[0],d6[0],d7[0]},[r6]!
+	vmlal.u32	q6,d21,d1
+	vmlal.u32	q9,d27,d1
+	vmlal.u32	q7,d23,d1
+
+	vmlal.u32	q8,d23,d3
+	vld1.32		d8[1],[r7,:32]
+	vmlal.u32	q5,d27,d4
+	vld1.32		d8[0],[r6,:32]
+	vmlal.u32	q9,d25,d3
+	vmlal.u32	q6,d29,d4
+	vmlal.u32	q7,d21,d3
+
+	vmlal.u32	q8,d21,d5
+	 it		ne
+	 addne		r7,r0,#(48+2*9*4)
+	vmlal.u32	q5,d25,d6
+	 it		ne
+	 addne		r6,r0,#(48+3*9*4)
+	vmlal.u32	q9,d23,d5
+	vmlal.u32	q6,d27,d6
+	vmlal.u32	q7,d29,d6
+
+	vmlal.u32	q8,d29,d8
+	 vorn		q0,q0,q0	@ all-ones, can be redundant
+	vmlal.u32	q5,d23,d8
+	 vshr.u64	q0,q0,#38
+	vmlal.u32	q9,d21,d7
+	vmlal.u32	q6,d25,d8
+	vmlal.u32	q7,d27,d8
+
+	beq		.Lshort_tail
+
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ (hash+inp[0:1])*r^4:r^3 and accumulate
+
+	vld4.32		{d0[1],d1[1],d2[1],d3[1]},[r7]!	@ load r^3
+	vld4.32		{d0[0],d1[0],d2[0],d3[0]},[r6]!	@ load r^4
+
+	vmlal.u32	q7,d24,d0
+	vmlal.u32	q5,d20,d0
+	vmlal.u32	q8,d26,d0
+	vmlal.u32	q6,d22,d0
+	vmlal.u32	q9,d28,d0
+
+	vmlal.u32	q5,d28,d2
+	vld4.32		{d4[1],d5[1],d6[1],d7[1]},[r7]!
+	vmlal.u32	q8,d24,d1
+	vld4.32		{d4[0],d5[0],d6[0],d7[0]},[r6]!
+	vmlal.u32	q6,d20,d1
+	vmlal.u32	q9,d26,d1
+	vmlal.u32	q7,d22,d1
+
+	vmlal.u32	q8,d22,d3
+	vld1.32		d8[1],[r7,:32]
+	vmlal.u32	q5,d26,d4
+	vld1.32		d8[0],[r6,:32]
+	vmlal.u32	q9,d24,d3
+	vmlal.u32	q6,d28,d4
+	vmlal.u32	q7,d20,d3
+
+	vmlal.u32	q8,d20,d5
+	vmlal.u32	q5,d24,d6
+	vmlal.u32	q9,d22,d5
+	vmlal.u32	q6,d26,d6
+	vmlal.u32	q7,d28,d6
+
+	vmlal.u32	q8,d28,d8
+	 vorn		q0,q0,q0	@ all-ones
+	vmlal.u32	q5,d22,d8
+	 vshr.u64	q0,q0,#38
+	vmlal.u32	q9,d20,d7
+	vmlal.u32	q6,d24,d8
+	vmlal.u32	q7,d26,d8
+
+.Lshort_tail:
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ horizontal addition
+
+	vadd.i64	d16,d16,d17
+	vadd.i64	d10,d10,d11
+	vadd.i64	d18,d18,d19
+	vadd.i64	d12,d12,d13
+	vadd.i64	d14,d14,d15
+
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ lazy reduction, but without narrowing
+
+	vshr.u64	q15,q8,#26
+	vand.i64	q8,q8,q0
+	 vshr.u64	q4,q5,#26
+	 vand.i64	q5,q5,q0
+	vadd.i64	q9,q9,q15		@ h3 -> h4
+	 vadd.i64	q6,q6,q4		@ h0 -> h1
+
+	vshr.u64	q15,q9,#26
+	vand.i64	q9,q9,q0
+	 vshr.u64	q4,q6,#26
+	 vand.i64	q6,q6,q0
+	 vadd.i64	q7,q7,q4		@ h1 -> h2
+
+	vadd.i64	q5,q5,q15
+	vshl.u64	q15,q15,#2
+	 vshr.u64	q4,q7,#26
+	 vand.i64	q7,q7,q0
+	vadd.i64	q5,q5,q15		@ h4 -> h0
+	 vadd.i64	q8,q8,q4		@ h2 -> h3
+
+	vshr.u64	q15,q5,#26
+	vand.i64	q5,q5,q0
+	 vshr.u64	q4,q8,#26
+	 vand.i64	q8,q8,q0
+	vadd.i64	q6,q6,q15		@ h0 -> h1
+	 vadd.i64	q9,q9,q4		@ h3 -> h4
+
+	cmp		r2,#0
+	bne		.Leven
+
+	@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+	@ store hash value
+
+	vst4.32		{d10[0],d12[0],d14[0],d16[0]},[r0]!
+	vst1.32		{d18[0]},[r0]
+
+	vldmia	sp!,{d8-d15}			@ epilogue
+	ldmia	sp!,{r4-r7}
+.Lno_data_neon:
+	bx	lr					@ bx	lr
+ENDPROC(poly1305_blocks_neon)
+
+.align	5
+ENTRY(poly1305_emit_neon)
+	ldr	ip,[r0,#36]		@ is_base2_26
+
+	stmdb	sp!,{r4-r11}
+
+	tst	ip,ip
+	beq	.Lpoly1305_emit_enter
+
+	ldmia	r0,{r3-r7}
+	eor	r8,r8,r8
+
+	adds	r3,r3,r4,lsl#26	@ base 2^26 -> base 2^32
+	mov	r4,r4,lsr#6
+	adcs	r4,r4,r5,lsl#20
+	mov	r5,r5,lsr#12
+	adcs	r5,r5,r6,lsl#14
+	mov	r6,r6,lsr#18
+	adcs	r6,r6,r7,lsl#8
+	adc	r7,r8,r7,lsr#24	@ can be partially reduced ...
+
+	and	r8,r7,#-4		@ ... so reduce
+	and	r7,r6,#3
+	add	r8,r8,r8,lsr#2	@ *= 5
+	adds	r3,r3,r8
+	adcs	r4,r4,#0
+	adcs	r5,r5,#0
+	adcs	r6,r6,#0
+	adc	r7,r7,#0
+
+	adds	r8,r3,#5		@ compare to modulus
+	adcs	r9,r4,#0
+	adcs	r10,r5,#0
+	adcs	r11,r6,#0
+	adc	r7,r7,#0
+	tst	r7,#4			@ did it carry/borrow?
+
+	it	ne
+	movne	r3,r8
+	ldr	r8,[r2,#0]
+	it	ne
+	movne	r4,r9
+	ldr	r9,[r2,#4]
+	it	ne
+	movne	r5,r10
+	ldr	r10,[r2,#8]
+	it	ne
+	movne	r6,r11
+	ldr	r11,[r2,#12]
+
+	adds	r3,r3,r8		@ accumulate nonce
+	adcs	r4,r4,r9
+	adcs	r5,r5,r10
+	adc	r6,r6,r11
+
+#ifdef __ARMEB__
+	rev	r3,r3
+	rev	r4,r4
+	rev	r5,r5
+	rev	r6,r6
+#endif
+	str	r3,[r1,#0]		@ store the result
+	str	r4,[r1,#4]
+	str	r5,[r1,#8]
+	str	r6,[r1,#12]
+
+	ldmia	sp!,{r4-r11}
+	bx	lr				@ bx	lr
+ENDPROC(poly1305_emit_neon)
+
+.align	5
+.Lzeros:
+.long	0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+#endif
diff --git a/lib/zinc/poly1305/poly1305-arm64.S b/lib/zinc/poly1305/poly1305-arm64.S
new file mode 100644
index 000000000000..c20023544183
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-arm64.S
@@ -0,0 +1,822 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#include <linux/linkage.h>
+.text
+
+.align	5
+ENTRY(poly1305_init_arm)
+	cmp	x1,xzr
+	stp	xzr,xzr,[x0]		// zero hash value
+	stp	xzr,xzr,[x0,#16]	// [along with is_base2_26]
+
+	csel	x0,xzr,x0,eq
+	b.eq	.Lno_key
+
+	ldp	x7,x8,[x1]		// load key
+	mov	x9,#0xfffffffc0fffffff
+	movk	x9,#0x0fff,lsl#48
+#ifdef	__ARMEB__
+	rev	x7,x7			// flip bytes
+	rev	x8,x8
+#endif
+	and	x7,x7,x9		// &=0ffffffc0fffffff
+	and	x9,x9,#-4
+	and	x8,x8,x9		// &=0ffffffc0ffffffc
+	stp	x7,x8,[x0,#32]	// save key value
+
+.Lno_key:
+	ret
+ENDPROC(poly1305_init_arm)
+
+.align	5
+ENTRY(poly1305_blocks_arm)
+	ands	x2,x2,#-16
+	b.eq	.Lno_data
+
+	ldp	x4,x5,[x0]		// load hash value
+	ldp	x7,x8,[x0,#32]	// load key value
+	ldr	x6,[x0,#16]
+	add	x9,x8,x8,lsr#2	// s1 = r1 + (r1 >> 2)
+	b	.Loop
+
+.align	5
+.Loop:
+	ldp	x10,x11,[x1],#16	// load input
+	sub	x2,x2,#16
+#ifdef	__ARMEB__
+	rev	x10,x10
+	rev	x11,x11
+#endif
+	adds	x4,x4,x10		// accumulate input
+	adcs	x5,x5,x11
+
+	mul	x12,x4,x7		// h0*r0
+	adc	x6,x6,x3
+	umulh	x13,x4,x7
+
+	mul	x10,x5,x9		// h1*5*r1
+	umulh	x11,x5,x9
+
+	adds	x12,x12,x10
+	mul	x10,x4,x8		// h0*r1
+	adc	x13,x13,x11
+	umulh	x14,x4,x8
+
+	adds	x13,x13,x10
+	mul	x10,x5,x7		// h1*r0
+	adc	x14,x14,xzr
+	umulh	x11,x5,x7
+
+	adds	x13,x13,x10
+	mul	x10,x6,x9		// h2*5*r1
+	adc	x14,x14,x11
+	mul	x11,x6,x7		// h2*r0
+
+	adds	x13,x13,x10
+	adc	x14,x14,x11
+
+	and	x10,x14,#-4		// final reduction
+	and	x6,x14,#3
+	add	x10,x10,x14,lsr#2
+	adds	x4,x12,x10
+	adcs	x5,x13,xzr
+	adc	x6,x6,xzr
+
+	cbnz	x2,.Loop
+
+	stp	x4,x5,[x0]		// store hash value
+	str	x6,[x0,#16]
+
+.Lno_data:
+	ret
+ENDPROC(poly1305_blocks_arm)
+
+.align	5
+ENTRY(poly1305_emit_arm)
+	ldp	x4,x5,[x0]		// load hash base 2^64
+	ldr	x6,[x0,#16]
+	ldp	x10,x11,[x2]	// load nonce
+
+	adds	x12,x4,#5		// compare to modulus
+	adcs	x13,x5,xzr
+	adc	x14,x6,xzr
+
+	tst	x14,#-4			// see if it's carried/borrowed
+
+	csel	x4,x4,x12,eq
+	csel	x5,x5,x13,eq
+
+#ifdef	__ARMEB__
+	ror	x10,x10,#32		// flip nonce words
+	ror	x11,x11,#32
+#endif
+	adds	x4,x4,x10		// accumulate nonce
+	adc	x5,x5,x11
+#ifdef	__ARMEB__
+	rev	x4,x4			// flip output bytes
+	rev	x5,x5
+#endif
+	stp	x4,x5,[x1]		// write result
+
+	ret
+ENDPROC(poly1305_emit_arm)
+
+.align	5
+__poly1305_mult:
+	mul	x12,x4,x7		// h0*r0
+	umulh	x13,x4,x7
+
+	mul	x10,x5,x9		// h1*5*r1
+	umulh	x11,x5,x9
+
+	adds	x12,x12,x10
+	mul	x10,x4,x8		// h0*r1
+	adc	x13,x13,x11
+	umulh	x14,x4,x8
+
+	adds	x13,x13,x10
+	mul	x10,x5,x7		// h1*r0
+	adc	x14,x14,xzr
+	umulh	x11,x5,x7
+
+	adds	x13,x13,x10
+	mul	x10,x6,x9		// h2*5*r1
+	adc	x14,x14,x11
+	mul	x11,x6,x7		// h2*r0
+
+	adds	x13,x13,x10
+	adc	x14,x14,x11
+
+	and	x10,x14,#-4		// final reduction
+	and	x6,x14,#3
+	add	x10,x10,x14,lsr#2
+	adds	x4,x12,x10
+	adcs	x5,x13,xzr
+	adc	x6,x6,xzr
+
+	ret
+
+__poly1305_splat:
+	and	x12,x4,#0x03ffffff	// base 2^64 -> base 2^26
+	ubfx	x13,x4,#26,#26
+	extr	x14,x5,x4,#52
+	and	x14,x14,#0x03ffffff
+	ubfx	x15,x5,#14,#26
+	extr	x16,x6,x5,#40
+
+	str	w12,[x0,#16*0]	// r0
+	add	w12,w13,w13,lsl#2	// r1*5
+	str	w13,[x0,#16*1]	// r1
+	add	w13,w14,w14,lsl#2	// r2*5
+	str	w12,[x0,#16*2]	// s1
+	str	w14,[x0,#16*3]	// r2
+	add	w14,w15,w15,lsl#2	// r3*5
+	str	w13,[x0,#16*4]	// s2
+	str	w15,[x0,#16*5]	// r3
+	add	w15,w16,w16,lsl#2	// r4*5
+	str	w14,[x0,#16*6]	// s3
+	str	w16,[x0,#16*7]	// r4
+	str	w15,[x0,#16*8]	// s4
+
+	ret
+
+.align	5
+ENTRY(poly1305_blocks_neon)
+	ldr	x17,[x0,#24]
+	cmp	x2,#128
+	b.hs	.Lblocks_neon
+	cbz	x17,poly1305_blocks_arm
+
+.Lblocks_neon:
+	stp	x29,x30,[sp,#-80]!
+	add	x29,sp,#0
+
+	ands	x2,x2,#-16
+	b.eq	.Lno_data_neon
+
+	cbz	x17,.Lbase2_64_neon
+
+	ldp	w10,w11,[x0]		// load hash value base 2^26
+	ldp	w12,w13,[x0,#8]
+	ldr	w14,[x0,#16]
+
+	tst	x2,#31
+	b.eq	.Leven_neon
+
+	ldp	x7,x8,[x0,#32]	// load key value
+
+	add	x4,x10,x11,lsl#26	// base 2^26 -> base 2^64
+	lsr	x5,x12,#12
+	adds	x4,x4,x12,lsl#52
+	add	x5,x5,x13,lsl#14
+	adc	x5,x5,xzr
+	lsr	x6,x14,#24
+	adds	x5,x5,x14,lsl#40
+	adc	x14,x6,xzr		// can be partially reduced...
+
+	ldp	x12,x13,[x1],#16	// load input
+	sub	x2,x2,#16
+	add	x9,x8,x8,lsr#2	// s1 = r1 + (r1 >> 2)
+
+	and	x10,x14,#-4		// ... so reduce
+	and	x6,x14,#3
+	add	x10,x10,x14,lsr#2
+	adds	x4,x4,x10
+	adcs	x5,x5,xzr
+	adc	x6,x6,xzr
+
+#ifdef	__ARMEB__
+	rev	x12,x12
+	rev	x13,x13
+#endif
+	adds	x4,x4,x12		// accumulate input
+	adcs	x5,x5,x13
+	adc	x6,x6,x3
+
+	bl	__poly1305_mult
+	ldr	x30,[sp,#8]
+
+	cbz	x3,.Lstore_base2_64_neon
+
+	and	x10,x4,#0x03ffffff	// base 2^64 -> base 2^26
+	ubfx	x11,x4,#26,#26
+	extr	x12,x5,x4,#52
+	and	x12,x12,#0x03ffffff
+	ubfx	x13,x5,#14,#26
+	extr	x14,x6,x5,#40
+
+	cbnz	x2,.Leven_neon
+
+	stp	w10,w11,[x0]		// store hash value base 2^26
+	stp	w12,w13,[x0,#8]
+	str	w14,[x0,#16]
+	b	.Lno_data_neon
+
+.align	4
+.Lstore_base2_64_neon:
+	stp	x4,x5,[x0]		// store hash value base 2^64
+	stp	x6,xzr,[x0,#16]	// note that is_base2_26 is zeroed
+	b	.Lno_data_neon
+
+.align	4
+.Lbase2_64_neon:
+	ldp	x7,x8,[x0,#32]	// load key value
+
+	ldp	x4,x5,[x0]		// load hash value base 2^64
+	ldr	x6,[x0,#16]
+
+	tst	x2,#31
+	b.eq	.Linit_neon
+
+	ldp	x12,x13,[x1],#16	// load input
+	sub	x2,x2,#16
+	add	x9,x8,x8,lsr#2	// s1 = r1 + (r1 >> 2)
+#ifdef	__ARMEB__
+	rev	x12,x12
+	rev	x13,x13
+#endif
+	adds	x4,x4,x12		// accumulate input
+	adcs	x5,x5,x13
+	adc	x6,x6,x3
+
+	bl	__poly1305_mult
+
+.Linit_neon:
+	and	x10,x4,#0x03ffffff	// base 2^64 -> base 2^26
+	ubfx	x11,x4,#26,#26
+	extr	x12,x5,x4,#52
+	and	x12,x12,#0x03ffffff
+	ubfx	x13,x5,#14,#26
+	extr	x14,x6,x5,#40
+
+	stp	d8,d9,[sp,#16]		// meet ABI requirements
+	stp	d10,d11,[sp,#32]
+	stp	d12,d13,[sp,#48]
+	stp	d14,d15,[sp,#64]
+
+	fmov	d24,x10
+	fmov	d25,x11
+	fmov	d26,x12
+	fmov	d27,x13
+	fmov	d28,x14
+
+	////////////////////////////////// initialize r^n table
+	mov	x4,x7			// r^1
+	add	x9,x8,x8,lsr#2	// s1 = r1 + (r1 >> 2)
+	mov	x5,x8
+	mov	x6,xzr
+	add	x0,x0,#48+12
+	bl	__poly1305_splat
+
+	bl	__poly1305_mult		// r^2
+	sub	x0,x0,#4
+	bl	__poly1305_splat
+
+	bl	__poly1305_mult		// r^3
+	sub	x0,x0,#4
+	bl	__poly1305_splat
+
+	bl	__poly1305_mult		// r^4
+	sub	x0,x0,#4
+	bl	__poly1305_splat
+	ldr	x30,[sp,#8]
+
+	add	x16,x1,#32
+	adr	x17,.Lzeros
+	subs	x2,x2,#64
+	csel	x16,x17,x16,lo
+
+	mov	x4,#1
+	str	x4,[x0,#-24]		// set is_base2_26
+	sub	x0,x0,#48		// restore original x0
+	b	.Ldo_neon
+
+.align	4
+.Leven_neon:
+	add	x16,x1,#32
+	adr	x17,.Lzeros
+	subs	x2,x2,#64
+	csel	x16,x17,x16,lo
+
+	stp	d8,d9,[sp,#16]		// meet ABI requirements
+	stp	d10,d11,[sp,#32]
+	stp	d12,d13,[sp,#48]
+	stp	d14,d15,[sp,#64]
+
+	fmov	d24,x10
+	fmov	d25,x11
+	fmov	d26,x12
+	fmov	d27,x13
+	fmov	d28,x14
+
+.Ldo_neon:
+	ldp	x8,x12,[x16],#16	// inp[2:3] (or zero)
+	ldp	x9,x13,[x16],#48
+
+	lsl	x3,x3,#24
+	add	x15,x0,#48
+
+#ifdef	__ARMEB__
+	rev	x8,x8
+	rev	x12,x12
+	rev	x9,x9
+	rev	x13,x13
+#endif
+	and	x4,x8,#0x03ffffff	// base 2^64 -> base 2^26
+	and	x5,x9,#0x03ffffff
+	ubfx	x6,x8,#26,#26
+	ubfx	x7,x9,#26,#26
+	add	x4,x4,x5,lsl#32		// bfi	x4,x5,#32,#32
+	extr	x8,x12,x8,#52
+	extr	x9,x13,x9,#52
+	add	x6,x6,x7,lsl#32		// bfi	x6,x7,#32,#32
+	fmov	d14,x4
+	and	x8,x8,#0x03ffffff
+	and	x9,x9,#0x03ffffff
+	ubfx	x10,x12,#14,#26
+	ubfx	x11,x13,#14,#26
+	add	x12,x3,x12,lsr#40
+	add	x13,x3,x13,lsr#40
+	add	x8,x8,x9,lsl#32		// bfi	x8,x9,#32,#32
+	fmov	d15,x6
+	add	x10,x10,x11,lsl#32	// bfi	x10,x11,#32,#32
+	add	x12,x12,x13,lsl#32	// bfi	x12,x13,#32,#32
+	fmov	d16,x8
+	fmov	d17,x10
+	fmov	d18,x12
+
+	ldp	x8,x12,[x1],#16	// inp[0:1]
+	ldp	x9,x13,[x1],#48
+
+	ld1	{v0.4s,v1.4s,v2.4s,v3.4s},[x15],#64
+	ld1	{v4.4s,v5.4s,v6.4s,v7.4s},[x15],#64
+	ld1	{v8.4s},[x15]
+
+#ifdef	__ARMEB__
+	rev	x8,x8
+	rev	x12,x12
+	rev	x9,x9
+	rev	x13,x13
+#endif
+	and	x4,x8,#0x03ffffff	// base 2^64 -> base 2^26
+	and	x5,x9,#0x03ffffff
+	ubfx	x6,x8,#26,#26
+	ubfx	x7,x9,#26,#26
+	add	x4,x4,x5,lsl#32		// bfi	x4,x5,#32,#32
+	extr	x8,x12,x8,#52
+	extr	x9,x13,x9,#52
+	add	x6,x6,x7,lsl#32		// bfi	x6,x7,#32,#32
+	fmov	d9,x4
+	and	x8,x8,#0x03ffffff
+	and	x9,x9,#0x03ffffff
+	ubfx	x10,x12,#14,#26
+	ubfx	x11,x13,#14,#26
+	add	x12,x3,x12,lsr#40
+	add	x13,x3,x13,lsr#40
+	add	x8,x8,x9,lsl#32		// bfi	x8,x9,#32,#32
+	fmov	d10,x6
+	add	x10,x10,x11,lsl#32	// bfi	x10,x11,#32,#32
+	add	x12,x12,x13,lsl#32	// bfi	x12,x13,#32,#32
+	movi	v31.2d,#-1
+	fmov	d11,x8
+	fmov	d12,x10
+	fmov	d13,x12
+	ushr	v31.2d,v31.2d,#38
+
+	b.ls	.Lskip_loop
+
+.align	4
+.Loop_neon:
+	////////////////////////////////////////////////////////////////
+	// ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
+	// ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
+	//   ___________________/
+	// ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
+	// ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
+	//   ___________________/ ____________________/
+	//
+	// Note that we start with inp[2:3]*r^2. This is because it
+	// doesn't depend on reduction in previous iteration.
+	////////////////////////////////////////////////////////////////
+	// d4 = h0*r4 + h1*r3   + h2*r2   + h3*r1   + h4*r0
+	// d3 = h0*r3 + h1*r2   + h2*r1   + h3*r0   + h4*5*r4
+	// d2 = h0*r2 + h1*r1   + h2*r0   + h3*5*r4 + h4*5*r3
+	// d1 = h0*r1 + h1*r0   + h2*5*r4 + h3*5*r3 + h4*5*r2
+	// d0 = h0*r0 + h1*5*r4 + h2*5*r3 + h3*5*r2 + h4*5*r1
+
+	subs	x2,x2,#64
+	umull	v23.2d,v14.2s,v7.s[2]
+	csel	x16,x17,x16,lo
+	umull	v22.2d,v14.2s,v5.s[2]
+	umull	v21.2d,v14.2s,v3.s[2]
+	ldp	x8,x12,[x16],#16	// inp[2:3] (or zero)
+	umull	v20.2d,v14.2s,v1.s[2]
+	ldp	x9,x13,[x16],#48
+	umull	v19.2d,v14.2s,v0.s[2]
+#ifdef	__ARMEB__
+	rev	x8,x8
+	rev	x12,x12
+	rev	x9,x9
+	rev	x13,x13
+#endif
+
+	umlal	v23.2d,v15.2s,v5.s[2]
+	and	x4,x8,#0x03ffffff	// base 2^64 -> base 2^26
+	umlal	v22.2d,v15.2s,v3.s[2]
+	and	x5,x9,#0x03ffffff
+	umlal	v21.2d,v15.2s,v1.s[2]
+	ubfx	x6,x8,#26,#26
+	umlal	v20.2d,v15.2s,v0.s[2]
+	ubfx	x7,x9,#26,#26
+	umlal	v19.2d,v15.2s,v8.s[2]
+	add	x4,x4,x5,lsl#32		// bfi	x4,x5,#32,#32
+
+	umlal	v23.2d,v16.2s,v3.s[2]
+	extr	x8,x12,x8,#52
+	umlal	v22.2d,v16.2s,v1.s[2]
+	extr	x9,x13,x9,#52
+	umlal	v21.2d,v16.2s,v0.s[2]
+	add	x6,x6,x7,lsl#32		// bfi	x6,x7,#32,#32
+	umlal	v20.2d,v16.2s,v8.s[2]
+	fmov	d14,x4
+	umlal	v19.2d,v16.2s,v6.s[2]
+	and	x8,x8,#0x03ffffff
+
+	umlal	v23.2d,v17.2s,v1.s[2]
+	and	x9,x9,#0x03ffffff
+	umlal	v22.2d,v17.2s,v0.s[2]
+	ubfx	x10,x12,#14,#26
+	umlal	v21.2d,v17.2s,v8.s[2]
+	ubfx	x11,x13,#14,#26
+	umlal	v20.2d,v17.2s,v6.s[2]
+	add	x8,x8,x9,lsl#32		// bfi	x8,x9,#32,#32
+	umlal	v19.2d,v17.2s,v4.s[2]
+	fmov	d15,x6
+
+	add	v11.2s,v11.2s,v26.2s
+	add	x12,x3,x12,lsr#40
+	umlal	v23.2d,v18.2s,v0.s[2]
+	add	x13,x3,x13,lsr#40
+	umlal	v22.2d,v18.2s,v8.s[2]
+	add	x10,x10,x11,lsl#32	// bfi	x10,x11,#32,#32
+	umlal	v21.2d,v18.2s,v6.s[2]
+	add	x12,x12,x13,lsl#32	// bfi	x12,x13,#32,#32
+	umlal	v20.2d,v18.2s,v4.s[2]
+	fmov	d16,x8
+	umlal	v19.2d,v18.2s,v2.s[2]
+	fmov	d17,x10
+
+	////////////////////////////////////////////////////////////////
+	// (hash+inp[0:1])*r^4 and accumulate
+
+	add	v9.2s,v9.2s,v24.2s
+	fmov	d18,x12
+	umlal	v22.2d,v11.2s,v1.s[0]
+	ldp	x8,x12,[x1],#16	// inp[0:1]
+	umlal	v19.2d,v11.2s,v6.s[0]
+	ldp	x9,x13,[x1],#48
+	umlal	v23.2d,v11.2s,v3.s[0]
+	umlal	v20.2d,v11.2s,v8.s[0]
+	umlal	v21.2d,v11.2s,v0.s[0]
+#ifdef	__ARMEB__
+	rev	x8,x8
+	rev	x12,x12
+	rev	x9,x9
+	rev	x13,x13
+#endif
+
+	add	v10.2s,v10.2s,v25.2s
+	umlal	v22.2d,v9.2s,v5.s[0]
+	umlal	v23.2d,v9.2s,v7.s[0]
+	and	x4,x8,#0x03ffffff	// base 2^64 -> base 2^26
+	umlal	v21.2d,v9.2s,v3.s[0]
+	and	x5,x9,#0x03ffffff
+	umlal	v19.2d,v9.2s,v0.s[0]
+	ubfx	x6,x8,#26,#26
+	umlal	v20.2d,v9.2s,v1.s[0]
+	ubfx	x7,x9,#26,#26
+
+	add	v12.2s,v12.2s,v27.2s
+	add	x4,x4,x5,lsl#32		// bfi	x4,x5,#32,#32
+	umlal	v22.2d,v10.2s,v3.s[0]
+	extr	x8,x12,x8,#52
+	umlal	v23.2d,v10.2s,v5.s[0]
+	extr	x9,x13,x9,#52
+	umlal	v19.2d,v10.2s,v8.s[0]
+	add	x6,x6,x7,lsl#32		// bfi	x6,x7,#32,#32
+	umlal	v21.2d,v10.2s,v1.s[0]
+	fmov	d9,x4
+	umlal	v20.2d,v10.2s,v0.s[0]
+	and	x8,x8,#0x03ffffff
+
+	add	v13.2s,v13.2s,v28.2s
+	and	x9,x9,#0x03ffffff
+	umlal	v22.2d,v12.2s,v0.s[0]
+	ubfx	x10,x12,#14,#26
+	umlal	v19.2d,v12.2s,v4.s[0]
+	ubfx	x11,x13,#14,#26
+	umlal	v23.2d,v12.2s,v1.s[0]
+	add	x8,x8,x9,lsl#32		// bfi	x8,x9,#32,#32
+	umlal	v20.2d,v12.2s,v6.s[0]
+	fmov	d10,x6
+	umlal	v21.2d,v12.2s,v8.s[0]
+	add	x12,x3,x12,lsr#40
+
+	umlal	v22.2d,v13.2s,v8.s[0]
+	add	x13,x3,x13,lsr#40
+	umlal	v19.2d,v13.2s,v2.s[0]
+	add	x10,x10,x11,lsl#32	// bfi	x10,x11,#32,#32
+	umlal	v23.2d,v13.2s,v0.s[0]
+	add	x12,x12,x13,lsl#32	// bfi	x12,x13,#32,#32
+	umlal	v20.2d,v13.2s,v4.s[0]
+	fmov	d11,x8
+	umlal	v21.2d,v13.2s,v6.s[0]
+	fmov	d12,x10
+	fmov	d13,x12
+
+	/////////////////////////////////////////////////////////////////
+	// lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
+	// and P. Schwabe
+	//
+	// [see discussion in poly1305-armv4 module]
+
+	ushr	v29.2d,v22.2d,#26
+	xtn	v27.2s,v22.2d
+	ushr	v30.2d,v19.2d,#26
+	and	v19.16b,v19.16b,v31.16b
+	add	v23.2d,v23.2d,v29.2d	// h3 -> h4
+	bic	v27.2s,#0xfc,lsl#24	// &=0x03ffffff
+	add	v20.2d,v20.2d,v30.2d	// h0 -> h1
+
+	ushr	v29.2d,v23.2d,#26
+	xtn	v28.2s,v23.2d
+	ushr	v30.2d,v20.2d,#26
+	xtn	v25.2s,v20.2d
+	bic	v28.2s,#0xfc,lsl#24
+	add	v21.2d,v21.2d,v30.2d	// h1 -> h2
+
+	add	v19.2d,v19.2d,v29.2d
+	shl	v29.2d,v29.2d,#2
+	shrn	v30.2s,v21.2d,#26
+	xtn	v26.2s,v21.2d
+	add	v19.2d,v19.2d,v29.2d	// h4 -> h0
+	bic	v25.2s,#0xfc,lsl#24
+	add	v27.2s,v27.2s,v30.2s		// h2 -> h3
+	bic	v26.2s,#0xfc,lsl#24
+
+	shrn	v29.2s,v19.2d,#26
+	xtn	v24.2s,v19.2d
+	ushr	v30.2s,v27.2s,#26
+	bic	v27.2s,#0xfc,lsl#24
+	bic	v24.2s,#0xfc,lsl#24
+	add	v25.2s,v25.2s,v29.2s		// h0 -> h1
+	add	v28.2s,v28.2s,v30.2s		// h3 -> h4
+
+	b.hi	.Loop_neon
+
+.Lskip_loop:
+	dup	v16.2d,v16.d[0]
+	add	v11.2s,v11.2s,v26.2s
+
+	////////////////////////////////////////////////////////////////
+	// multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
+
+	adds	x2,x2,#32
+	b.ne	.Long_tail
+
+	dup	v16.2d,v11.d[0]
+	add	v14.2s,v9.2s,v24.2s
+	add	v17.2s,v12.2s,v27.2s
+	add	v15.2s,v10.2s,v25.2s
+	add	v18.2s,v13.2s,v28.2s
+
+.Long_tail:
+	dup	v14.2d,v14.d[0]
+	umull2	v19.2d,v16.4s,v6.4s
+	umull2	v22.2d,v16.4s,v1.4s
+	umull2	v23.2d,v16.4s,v3.4s
+	umull2	v21.2d,v16.4s,v0.4s
+	umull2	v20.2d,v16.4s,v8.4s
+
+	dup	v15.2d,v15.d[0]
+	umlal2	v19.2d,v14.4s,v0.4s
+	umlal2	v21.2d,v14.4s,v3.4s
+	umlal2	v22.2d,v14.4s,v5.4s
+	umlal2	v23.2d,v14.4s,v7.4s
+	umlal2	v20.2d,v14.4s,v1.4s
+
+	dup	v17.2d,v17.d[0]
+	umlal2	v19.2d,v15.4s,v8.4s
+	umlal2	v22.2d,v15.4s,v3.4s
+	umlal2	v21.2d,v15.4s,v1.4s
+	umlal2	v23.2d,v15.4s,v5.4s
+	umlal2	v20.2d,v15.4s,v0.4s
+
+	dup	v18.2d,v18.d[0]
+	umlal2	v22.2d,v17.4s,v0.4s
+	umlal2	v23.2d,v17.4s,v1.4s
+	umlal2	v19.2d,v17.4s,v4.4s
+	umlal2	v20.2d,v17.4s,v6.4s
+	umlal2	v21.2d,v17.4s,v8.4s
+
+	umlal2	v22.2d,v18.4s,v8.4s
+	umlal2	v19.2d,v18.4s,v2.4s
+	umlal2	v23.2d,v18.4s,v0.4s
+	umlal2	v20.2d,v18.4s,v4.4s
+	umlal2	v21.2d,v18.4s,v6.4s
+
+	b.eq	.Lshort_tail
+
+	////////////////////////////////////////////////////////////////
+	// (hash+inp[0:1])*r^4:r^3 and accumulate
+
+	add	v9.2s,v9.2s,v24.2s
+	umlal	v22.2d,v11.2s,v1.2s
+	umlal	v19.2d,v11.2s,v6.2s
+	umlal	v23.2d,v11.2s,v3.2s
+	umlal	v20.2d,v11.2s,v8.2s
+	umlal	v21.2d,v11.2s,v0.2s
+
+	add	v10.2s,v10.2s,v25.2s
+	umlal	v22.2d,v9.2s,v5.2s
+	umlal	v19.2d,v9.2s,v0.2s
+	umlal	v23.2d,v9.2s,v7.2s
+	umlal	v20.2d,v9.2s,v1.2s
+	umlal	v21.2d,v9.2s,v3.2s
+
+	add	v12.2s,v12.2s,v27.2s
+	umlal	v22.2d,v10.2s,v3.2s
+	umlal	v19.2d,v10.2s,v8.2s
+	umlal	v23.2d,v10.2s,v5.2s
+	umlal	v20.2d,v10.2s,v0.2s
+	umlal	v21.2d,v10.2s,v1.2s
+
+	add	v13.2s,v13.2s,v28.2s
+	umlal	v22.2d,v12.2s,v0.2s
+	umlal	v19.2d,v12.2s,v4.2s
+	umlal	v23.2d,v12.2s,v1.2s
+	umlal	v20.2d,v12.2s,v6.2s
+	umlal	v21.2d,v12.2s,v8.2s
+
+	umlal	v22.2d,v13.2s,v8.2s
+	umlal	v19.2d,v13.2s,v2.2s
+	umlal	v23.2d,v13.2s,v0.2s
+	umlal	v20.2d,v13.2s,v4.2s
+	umlal	v21.2d,v13.2s,v6.2s
+
+.Lshort_tail:
+	////////////////////////////////////////////////////////////////
+	// horizontal add
+
+	addp	v22.2d,v22.2d,v22.2d
+	ldp	d8,d9,[sp,#16]		// meet ABI requirements
+	addp	v19.2d,v19.2d,v19.2d
+	ldp	d10,d11,[sp,#32]
+	addp	v23.2d,v23.2d,v23.2d
+	ldp	d12,d13,[sp,#48]
+	addp	v20.2d,v20.2d,v20.2d
+	ldp	d14,d15,[sp,#64]
+	addp	v21.2d,v21.2d,v21.2d
+
+	////////////////////////////////////////////////////////////////
+	// lazy reduction, but without narrowing
+
+	ushr	v29.2d,v22.2d,#26
+	and	v22.16b,v22.16b,v31.16b
+	ushr	v30.2d,v19.2d,#26
+	and	v19.16b,v19.16b,v31.16b
+
+	add	v23.2d,v23.2d,v29.2d	// h3 -> h4
+	add	v20.2d,v20.2d,v30.2d	// h0 -> h1
+
+	ushr	v29.2d,v23.2d,#26
+	and	v23.16b,v23.16b,v31.16b
+	ushr	v30.2d,v20.2d,#26
+	and	v20.16b,v20.16b,v31.16b
+	add	v21.2d,v21.2d,v30.2d	// h1 -> h2
+
+	add	v19.2d,v19.2d,v29.2d
+	shl	v29.2d,v29.2d,#2
+	ushr	v30.2d,v21.2d,#26
+	and	v21.16b,v21.16b,v31.16b
+	add	v19.2d,v19.2d,v29.2d	// h4 -> h0
+	add	v22.2d,v22.2d,v30.2d	// h2 -> h3
+
+	ushr	v29.2d,v19.2d,#26
+	and	v19.16b,v19.16b,v31.16b
+	ushr	v30.2d,v22.2d,#26
+	and	v22.16b,v22.16b,v31.16b
+	add	v20.2d,v20.2d,v29.2d	// h0 -> h1
+	add	v23.2d,v23.2d,v30.2d	// h3 -> h4
+
+	////////////////////////////////////////////////////////////////
+	// write the result, can be partially reduced
+
+	st4	{v19.s,v20.s,v21.s,v22.s}[0],[x0],#16
+	st1	{v23.s}[0],[x0]
+
+.Lno_data_neon:
+	ldr	x29,[sp],#80
+	ret
+ENDPROC(poly1305_blocks_neon)
+
+.align	5
+ENTRY(poly1305_emit_neon)
+	ldr	x17,[x0,#24]
+	cbz	x17,poly1305_emit_arm
+
+	ldp	w10,w11,[x0]		// load hash value base 2^26
+	ldp	w12,w13,[x0,#8]
+	ldr	w14,[x0,#16]
+
+	add	x4,x10,x11,lsl#26	// base 2^26 -> base 2^64
+	lsr	x5,x12,#12
+	adds	x4,x4,x12,lsl#52
+	add	x5,x5,x13,lsl#14
+	adc	x5,x5,xzr
+	lsr	x6,x14,#24
+	adds	x5,x5,x14,lsl#40
+	adc	x6,x6,xzr		// can be partially reduced...
+
+	ldp	x10,x11,[x2]	// load nonce
+
+	and	x12,x6,#-4		// ... so reduce
+	add	x12,x12,x6,lsr#2
+	and	x6,x6,#3
+	adds	x4,x4,x12
+	adcs	x5,x5,xzr
+	adc	x6,x6,xzr
+
+	adds	x12,x4,#5		// compare to modulus
+	adcs	x13,x5,xzr
+	adc	x14,x6,xzr
+
+	tst	x14,#-4			// see if it's carried/borrowed
+
+	csel	x4,x4,x12,eq
+	csel	x5,x5,x13,eq
+
+#ifdef	__ARMEB__
+	ror	x10,x10,#32		// flip nonce words
+	ror	x11,x11,#32
+#endif
+	adds	x4,x4,x10		// accumulate nonce
+	adc	x5,x5,x11
+#ifdef	__ARMEB__
+	rev	x4,x4			// flip output bytes
+	rev	x5,x5
+#endif
+	stp	x4,x5,[x1]		// write result
+
+	ret
+ENDPROC(poly1305_emit_neon)
+
+.align	5
+.Lzeros:
+.long	0,0,0,0,0,0,0,0
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 09/20] zinc: Poly1305 x86_64 implementation
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (7 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 10/20] zinc: Poly1305 MIPS32r2 and MIPS64 implementations Jason A. Donenfeld
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Andy Polyakov, Thomas Gleixner,
	Ingo Molnar, x86

This provides AVX, AVX-2, and AVX-512F implementations for Poly1305.
The AVX-512F implementation is disabled on Skylake, due to throttling.
These come from Andy Polyakov's implementation, with the following
modifications from Samuel Neves:

  - Some cosmetic changes, like renaming labels to .Lname, constants,
    and other Linux conventions.

  - CPU feature checking is done in C by the glue code, so that has been
    removed from the assembly.

  - poly1305_blocks_avx512 jumped to the middle of the poly1305_blocks_avx2
    for the final blocks. To appease objtool, the relevant tail avx2 code
    was duplicated for the avx512 function.

  - The original uses %rbp as a scratch register. However, the kernel
    expects %rbp to be a valid frame pointer at any given time in order
    to do proper unwinding. Thus we need to alter the code in order to
    preserve it. The most straightforward manner in which this was
    accomplished was by replacing $d3, formerly %r10, by %rdi, and
    replacing %rbp by %r10. Because %rdi, a pointer to the context
    structure, does not change and is not used by poly1305_iteration,
    it is safe to use it here, and the overhead of saving and restoring
    it should be minimal.

  - The original hardcodes returns as .byte 0xf3,0xc3, aka "rep ret".
    We replace this by "ret". "rep ret" was meant to help with AMD K8
    chips, cf. http://repzret.org/p/repzret. It makes no sense to
    continue to use this kludge for code that won't even run on ancient
    AMD chips.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
---
 lib/zinc/Makefile                        |    4 +
 lib/zinc/poly1305/poly1305-x86_64-glue.h |  109 +
 lib/zinc/poly1305/poly1305-x86_64.S      | 2792 ++++++++++++++++++++++
 3 files changed, 2905 insertions(+)
 create mode 100644 lib/zinc/poly1305/poly1305-x86_64-glue.h
 create mode 100644 lib/zinc/poly1305/poly1305-x86_64.S

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index f37df89a3f87..72112f8ffba1 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -25,6 +25,10 @@ endif
 
 ifeq ($(CONFIG_ZINC_POLY1305),y)
 zinc-y += poly1305/poly1305.o
+ifeq ($(CONFIG_ZINC_ARCH_X86_64),y)
+zinc-y += poly1305/poly1305-x86_64.o
+CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-x86_64-glue.h
+endif
 ifeq ($(CONFIG_ZINC_ARCH_ARM),y)
 zinc-y += poly1305/poly1305-arm.o
 CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-arm-glue.h
diff --git a/lib/zinc/poly1305/poly1305-x86_64-glue.h b/lib/zinc/poly1305/poly1305-x86_64-glue.h
new file mode 100644
index 000000000000..4ae028101e7c
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-x86_64-glue.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/poly1305.h>
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+#include <asm/intel-family.h>
+
+asmlinkage void poly1305_init_x86_64(void *ctx,
+				     const u8 key[POLY1305_KEY_SIZE]);
+asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp,
+				       const size_t len, const u32 padbit);
+asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+				     const u32 nonce[4]);
+#ifdef CONFIG_AS_AVX
+asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+				  const u32 nonce[4]);
+asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, const size_t len,
+				    const u32 padbit);
+#endif
+#ifdef CONFIG_AS_AVX2
+asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, const size_t len,
+				     const u32 padbit);
+#endif
+#ifdef CONFIG_AS_AVX512
+asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp,
+				       const size_t len, const u32 padbit);
+#endif
+
+static bool poly1305_use_avx __ro_after_init;
+static bool poly1305_use_avx2 __ro_after_init;
+static bool poly1305_use_avx512 __ro_after_init;
+
+void __init poly1305_fpu_init(void)
+{
+	poly1305_use_avx =
+		boot_cpu_has(X86_FEATURE_AVX) &&
+		cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
+	poly1305_use_avx2 =
+		boot_cpu_has(X86_FEATURE_AVX) &&
+		boot_cpu_has(X86_FEATURE_AVX2) &&
+		cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
+	poly1305_use_avx512 =
+		boot_cpu_has(X86_FEATURE_AVX) &&
+		boot_cpu_has(X86_FEATURE_AVX2) &&
+		boot_cpu_has(X86_FEATURE_AVX512F) &&
+		cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
+				  XFEATURE_MASK_AVX512, NULL) &&
+		/* Skylake downclocks unacceptably much when using zmm. */
+		boot_cpu_data.x86_model != INTEL_FAM6_SKYLAKE_X;
+}
+
+static inline bool poly1305_init_arch(void *ctx,
+				      const u8 key[POLY1305_KEY_SIZE],
+				      simd_context_t simd_context)
+{
+	poly1305_init_x86_64(ctx, key);
+	return true;
+}
+
+static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
+					const size_t len, const u32 padbit,
+					simd_context_t simd_context)
+{
+#ifdef CONFIG_AS_AVX512
+	if (poly1305_use_avx512 && simd_context == HAVE_FULL_SIMD)
+		poly1305_blocks_avx512(ctx, inp, len, padbit);
+	else
+#endif
+#ifdef CONFIG_AS_AVX2
+	if (poly1305_use_avx2 && simd_context == HAVE_FULL_SIMD)
+		poly1305_blocks_avx2(ctx, inp, len, padbit);
+	else
+#endif
+#ifdef CONFIG_AS_AVX
+	if (poly1305_use_avx && simd_context == HAVE_FULL_SIMD)
+		poly1305_blocks_avx(ctx, inp, len, padbit);
+	else
+#endif
+		poly1305_blocks_x86_64(ctx, inp, len, padbit);
+	return true;
+}
+
+static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+				      const u32 nonce[4],
+				      simd_context_t simd_context)
+{
+#ifdef CONFIG_AS_AVX512
+	if (poly1305_use_avx512 && simd_context == HAVE_FULL_SIMD)
+		poly1305_emit_avx(ctx, mac, nonce);
+	else
+#endif
+#ifdef CONFIG_AS_AVX2
+	if (poly1305_use_avx2 && simd_context == HAVE_FULL_SIMD)
+		poly1305_emit_avx(ctx, mac, nonce);
+	else
+#endif
+#ifdef CONFIG_AS_AVX
+	if (poly1305_use_avx && simd_context == HAVE_FULL_SIMD)
+		poly1305_emit_avx(ctx, mac, nonce);
+	else
+#endif
+		poly1305_emit_x86_64(ctx, mac, nonce);
+	return true;
+}
+
+#define HAVE_POLY1305_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/poly1305/poly1305-x86_64.S b/lib/zinc/poly1305/poly1305-x86_64.S
new file mode 100644
index 000000000000..26c852e3c769
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-x86_64.S
@@ -0,0 +1,2792 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ *
+ * Copyright (C) 2017 Samuel Neves <sneves@dei.uc.pt>. All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#include <linux/linkage.h>
+
+.section .rodata.cst192.Lconst, "aM", @progbits, 192
+.align	64
+.Lconst:
+.long	0x0ffffff,0,0x0ffffff,0,0x0ffffff,0,0x0ffffff,0
+.long	16777216,0,16777216,0,16777216,0,16777216,0
+.long	0x3ffffff,0,0x3ffffff,0,0x3ffffff,0,0x3ffffff,0
+.long	2,2,2,3,2,0,2,1
+.long	0,0,0,1, 0,2,0,3, 0,4,0,5, 0,6,0,7
+
+.text
+
+.align	32
+ENTRY(poly1305_init_x86_64)
+	xorq	%rax,%rax
+	movq	%rax,0(%rdi)
+	movq	%rax,8(%rdi)
+	movq	%rax,16(%rdi)
+
+	cmpq	$0,%rsi
+	je	.Lno_key
+
+	movq	$0x0ffffffc0fffffff,%rax
+	movq	$0x0ffffffc0ffffffc,%rcx
+	andq	0(%rsi),%rax
+	andq	8(%rsi),%rcx
+	movq	%rax,24(%rdi)
+	movq	%rcx,32(%rdi)
+	movl	$1,%eax
+.Lno_key:
+	ret
+ENDPROC(poly1305_init_x86_64)
+
+.align	32
+ENTRY(poly1305_blocks_x86_64)
+.Lblocks:
+	shrq	$4,%rdx
+	jz	.Lno_data
+
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+	pushq	%rdi
+
+.Lblocks_body:
+
+	movq	%rdx,%r15
+
+	movq	24(%rdi),%r11
+	movq	32(%rdi),%r13
+
+	movq	0(%rdi),%r14
+	movq	8(%rdi),%rbx
+	movq	16(%rdi),%r10
+
+	movq	%r13,%r12
+	shrq	$2,%r13
+	movq	%r12,%rax
+	addq	%r12,%r13
+	jmp	.Loop
+
+.align	32
+.Loop:
+
+	addq	0(%rsi),%r14
+	adcq	8(%rsi),%rbx
+	leaq	16(%rsi),%rsi
+	adcq	%rcx,%r10
+	mulq	%r14
+	movq	%rax,%r9
+	movq	%r11,%rax
+	movq	%rdx,%rdi
+
+	mulq	%r14
+	movq	%rax,%r14
+	movq	%r11,%rax
+	movq	%rdx,%r8
+
+	mulq	%rbx
+	addq	%rax,%r9
+	movq	%r13,%rax
+	adcq	%rdx,%rdi
+
+	mulq	%rbx
+	movq	%r10,%rbx
+	addq	%rax,%r14
+	adcq	%rdx,%r8
+
+	imulq	%r13,%rbx
+	addq	%rbx,%r9
+	movq	%r8,%rbx
+	adcq	$0,%rdi
+
+	imulq	%r11,%r10
+	addq	%r9,%rbx
+	movq	$-4,%rax
+	adcq	%r10,%rdi
+
+	andq	%rdi,%rax
+	movq	%rdi,%r10
+	shrq	$2,%rdi
+	andq	$3,%r10
+	addq	%rdi,%rax
+	addq	%rax,%r14
+	adcq	$0,%rbx
+	adcq	$0,%r10
+
+	movq	%r12,%rax
+	decq	%r15
+	jnz	.Loop
+
+	movq	0(%rsp),%rdi
+
+	movq	%r14,0(%rdi)
+	movq	%rbx,8(%rdi)
+	movq	%r10,16(%rdi)
+
+	movq	8(%rsp),%r15
+	movq	16(%rsp),%r14
+	movq	24(%rsp),%r13
+	movq	32(%rsp),%r12
+	movq	40(%rsp),%rbx
+	leaq	48(%rsp),%rsp
+.Lno_data:
+.Lblocks_epilogue:
+	ret
+ENDPROC(poly1305_blocks_x86_64)
+
+.align	32
+ENTRY(poly1305_emit_x86_64)
+.Lemit:
+	movq	0(%rdi),%r8
+	movq	8(%rdi),%r9
+	movq	16(%rdi),%r10
+
+	movq	%r8,%rax
+	addq	$5,%r8
+	movq	%r9,%rcx
+	adcq	$0,%r9
+	adcq	$0,%r10
+	shrq	$2,%r10
+	cmovnzq	%r8,%rax
+	cmovnzq	%r9,%rcx
+
+	addq	0(%rdx),%rax
+	adcq	8(%rdx),%rcx
+	movq	%rax,0(%rsi)
+	movq	%rcx,8(%rsi)
+
+	ret
+ENDPROC(poly1305_emit_x86_64)
+
+.macro __poly1305_block
+	mulq	%r14
+	movq	%rax,%r9
+	movq	%r11,%rax
+	movq	%rdx,%rdi
+
+	mulq	%r14
+	movq	%rax,%r14
+	movq	%r11,%rax
+	movq	%rdx,%r8
+
+	mulq	%rbx
+	addq	%rax,%r9
+	movq	%r13,%rax
+	adcq	%rdx,%rdi
+
+	mulq	%rbx
+	movq	%r10,%rbx
+	addq	%rax,%r14
+	adcq	%rdx,%r8
+
+	imulq	%r13,%rbx
+	addq	%rbx,%r9
+	movq	%r8,%rbx
+	adcq	$0,%rdi
+
+	imulq	%r11,%r10
+	addq	%r9,%rbx
+	movq	$-4,%rax
+	adcq	%r10,%rdi
+
+	andq	%rdi,%rax
+	movq	%rdi,%r10
+	shrq	$2,%rdi
+	andq	$3,%r10
+	addq	%rdi,%rax
+	addq	%rax,%r14
+	adcq	$0,%rbx
+	adcq	$0,%r10
+.endm
+
+.macro __poly1305_init_avx
+	movq	%r11,%r14
+	movq	%r12,%rbx
+	xorq	%r10,%r10
+
+	leaq	48+64(%rdi),%rdi
+
+	movq	%r12,%rax
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+
+	movl	$0x3ffffff,%eax
+	movl	$0x3ffffff,%edx
+	movq	%r14,%r8
+	andl	%r14d,%eax
+	movq	%r11,%r9
+	andl	%r11d,%edx
+	movl	%eax,-64(%rdi)
+	shrq	$26,%r8
+	movl	%edx,-60(%rdi)
+	shrq	$26,%r9
+
+	movl	$0x3ffffff,%eax
+	movl	$0x3ffffff,%edx
+	andl	%r8d,%eax
+	andl	%r9d,%edx
+	movl	%eax,-48(%rdi)
+	leal	(%rax,%rax,4),%eax
+	movl	%edx,-44(%rdi)
+	leal	(%rdx,%rdx,4),%edx
+	movl	%eax,-32(%rdi)
+	shrq	$26,%r8
+	movl	%edx,-28(%rdi)
+	shrq	$26,%r9
+
+	movq	%rbx,%rax
+	movq	%r12,%rdx
+	shlq	$12,%rax
+	shlq	$12,%rdx
+	orq	%r8,%rax
+	orq	%r9,%rdx
+	andl	$0x3ffffff,%eax
+	andl	$0x3ffffff,%edx
+	movl	%eax,-16(%rdi)
+	leal	(%rax,%rax,4),%eax
+	movl	%edx,-12(%rdi)
+	leal	(%rdx,%rdx,4),%edx
+	movl	%eax,0(%rdi)
+	movq	%rbx,%r8
+	movl	%edx,4(%rdi)
+	movq	%r12,%r9
+
+	movl	$0x3ffffff,%eax
+	movl	$0x3ffffff,%edx
+	shrq	$14,%r8
+	shrq	$14,%r9
+	andl	%r8d,%eax
+	andl	%r9d,%edx
+	movl	%eax,16(%rdi)
+	leal	(%rax,%rax,4),%eax
+	movl	%edx,20(%rdi)
+	leal	(%rdx,%rdx,4),%edx
+	movl	%eax,32(%rdi)
+	shrq	$26,%r8
+	movl	%edx,36(%rdi)
+	shrq	$26,%r9
+
+	movq	%r10,%rax
+	shlq	$24,%rax
+	orq	%rax,%r8
+	movl	%r8d,48(%rdi)
+	leaq	(%r8,%r8,4),%r8
+	movl	%r9d,52(%rdi)
+	leaq	(%r9,%r9,4),%r9
+	movl	%r8d,64(%rdi)
+	movl	%r9d,68(%rdi)
+
+	movq	%r12,%rax
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+
+	movl	$0x3ffffff,%eax
+	movq	%r14,%r8
+	andl	%r14d,%eax
+	shrq	$26,%r8
+	movl	%eax,-52(%rdi)
+
+	movl	$0x3ffffff,%edx
+	andl	%r8d,%edx
+	movl	%edx,-36(%rdi)
+	leal	(%rdx,%rdx,4),%edx
+	shrq	$26,%r8
+	movl	%edx,-20(%rdi)
+
+	movq	%rbx,%rax
+	shlq	$12,%rax
+	orq	%r8,%rax
+	andl	$0x3ffffff,%eax
+	movl	%eax,-4(%rdi)
+	leal	(%rax,%rax,4),%eax
+	movq	%rbx,%r8
+	movl	%eax,12(%rdi)
+
+	movl	$0x3ffffff,%edx
+	shrq	$14,%r8
+	andl	%r8d,%edx
+	movl	%edx,28(%rdi)
+	leal	(%rdx,%rdx,4),%edx
+	shrq	$26,%r8
+	movl	%edx,44(%rdi)
+
+	movq	%r10,%rax
+	shlq	$24,%rax
+	orq	%rax,%r8
+	movl	%r8d,60(%rdi)
+	leaq	(%r8,%r8,4),%r8
+	movl	%r8d,76(%rdi)
+
+	movq	%r12,%rax
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+
+	movl	$0x3ffffff,%eax
+	movq	%r14,%r8
+	andl	%r14d,%eax
+	shrq	$26,%r8
+	movl	%eax,-56(%rdi)
+
+	movl	$0x3ffffff,%edx
+	andl	%r8d,%edx
+	movl	%edx,-40(%rdi)
+	leal	(%rdx,%rdx,4),%edx
+	shrq	$26,%r8
+	movl	%edx,-24(%rdi)
+
+	movq	%rbx,%rax
+	shlq	$12,%rax
+	orq	%r8,%rax
+	andl	$0x3ffffff,%eax
+	movl	%eax,-8(%rdi)
+	leal	(%rax,%rax,4),%eax
+	movq	%rbx,%r8
+	movl	%eax,8(%rdi)
+
+	movl	$0x3ffffff,%edx
+	shrq	$14,%r8
+	andl	%r8d,%edx
+	movl	%edx,24(%rdi)
+	leal	(%rdx,%rdx,4),%edx
+	shrq	$26,%r8
+	movl	%edx,40(%rdi)
+
+	movq	%r10,%rax
+	shlq	$24,%rax
+	orq	%rax,%r8
+	movl	%r8d,56(%rdi)
+	leaq	(%r8,%r8,4),%r8
+	movl	%r8d,72(%rdi)
+
+	leaq	-48-64(%rdi),%rdi
+.endm
+
+#ifdef CONFIG_AS_AVX
+.align	32
+ENTRY(poly1305_blocks_avx)
+
+	movl	20(%rdi),%r8d
+	cmpq	$128,%rdx
+	jae	.Lblocks_avx
+	testl	%r8d,%r8d
+	jz	.Lblocks
+
+.Lblocks_avx:
+	andq	$-16,%rdx
+	jz	.Lno_data_avx
+
+	vzeroupper
+
+	testl	%r8d,%r8d
+	jz	.Lbase2_64_avx
+
+	testq	$31,%rdx
+	jz	.Leven_avx
+
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+	pushq	%rdi
+
+.Lblocks_avx_body:
+
+	movq	%rdx,%r15
+
+	movq	0(%rdi),%r8
+	movq	8(%rdi),%r9
+	movl	16(%rdi),%r10d
+
+	movq	24(%rdi),%r11
+	movq	32(%rdi),%r13
+
+
+	movl	%r8d,%r14d
+	andq	$-2147483648,%r8
+	movq	%r9,%r12
+	movl	%r9d,%ebx
+	andq	$-2147483648,%r9
+
+	shrq	$6,%r8
+	shlq	$52,%r12
+	addq	%r8,%r14
+	shrq	$12,%rbx
+	shrq	$18,%r9
+	addq	%r12,%r14
+	adcq	%r9,%rbx
+
+	movq	%r10,%r8
+	shlq	$40,%r8
+	shrq	$24,%r10
+	addq	%r8,%rbx
+	adcq	$0,%r10
+
+	movq	$-4,%r9
+	movq	%r10,%r8
+	andq	%r10,%r9
+	shrq	$2,%r8
+	andq	$3,%r10
+	addq	%r9,%r8
+	addq	%r8,%r14
+	adcq	$0,%rbx
+	adcq	$0,%r10
+
+	movq	%r13,%r12
+	movq	%r13,%rax
+	shrq	$2,%r13
+	addq	%r12,%r13
+
+	addq	0(%rsi),%r14
+	adcq	8(%rsi),%rbx
+	leaq	16(%rsi),%rsi
+	adcq	%rcx,%r10
+
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+
+	testq	%rcx,%rcx
+	jz	.Lstore_base2_64_avx
+
+
+	movq	%r14,%rax
+	movq	%r14,%rdx
+	shrq	$52,%r14
+	movq	%rbx,%r11
+	movq	%rbx,%r12
+	shrq	$26,%rdx
+	andq	$0x3ffffff,%rax
+	shlq	$12,%r11
+	andq	$0x3ffffff,%rdx
+	shrq	$14,%rbx
+	orq	%r11,%r14
+	shlq	$24,%r10
+	andq	$0x3ffffff,%r14
+	shrq	$40,%r12
+	andq	$0x3ffffff,%rbx
+	orq	%r12,%r10
+
+	subq	$16,%r15
+	jz	.Lstore_base2_26_avx
+
+	vmovd	%eax,%xmm0
+	vmovd	%edx,%xmm1
+	vmovd	%r14d,%xmm2
+	vmovd	%ebx,%xmm3
+	vmovd	%r10d,%xmm4
+	jmp	.Lproceed_avx
+
+.align	32
+.Lstore_base2_64_avx:
+	movq	%r14,0(%rdi)
+	movq	%rbx,8(%rdi)
+	movq	%r10,16(%rdi)
+	jmp	.Ldone_avx
+
+.align	16
+.Lstore_base2_26_avx:
+	movl	%eax,0(%rdi)
+	movl	%edx,4(%rdi)
+	movl	%r14d,8(%rdi)
+	movl	%ebx,12(%rdi)
+	movl	%r10d,16(%rdi)
+.align	16
+.Ldone_avx:
+	movq	8(%rsp),%r15
+	movq	16(%rsp),%r14
+	movq	24(%rsp),%r13
+	movq	32(%rsp),%r12
+	movq	40(%rsp),%rbx
+	leaq	48(%rsp),%rsp
+
+.Lno_data_avx:
+.Lblocks_avx_epilogue:
+	ret
+
+.align	32
+.Lbase2_64_avx:
+
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+	pushq	%rdi
+
+.Lbase2_64_avx_body:
+
+	movq	%rdx,%r15
+
+	movq	24(%rdi),%r11
+	movq	32(%rdi),%r13
+
+	movq	0(%rdi),%r14
+	movq	8(%rdi),%rbx
+	movl	16(%rdi),%r10d
+
+	movq	%r13,%r12
+	movq	%r13,%rax
+	shrq	$2,%r13
+	addq	%r12,%r13
+
+	testq	$31,%rdx
+	jz	.Linit_avx
+
+	addq	0(%rsi),%r14
+	adcq	8(%rsi),%rbx
+	leaq	16(%rsi),%rsi
+	adcq	%rcx,%r10
+	subq	$16,%r15
+
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+
+.Linit_avx:
+
+	movq	%r14,%rax
+	movq	%r14,%rdx
+	shrq	$52,%r14
+	movq	%rbx,%r8
+	movq	%rbx,%r9
+	shrq	$26,%rdx
+	andq	$0x3ffffff,%rax
+	shlq	$12,%r8
+	andq	$0x3ffffff,%rdx
+	shrq	$14,%rbx
+	orq	%r8,%r14
+	shlq	$24,%r10
+	andq	$0x3ffffff,%r14
+	shrq	$40,%r9
+	andq	$0x3ffffff,%rbx
+	orq	%r9,%r10
+
+	vmovd	%eax,%xmm0
+	vmovd	%edx,%xmm1
+	vmovd	%r14d,%xmm2
+	vmovd	%ebx,%xmm3
+	vmovd	%r10d,%xmm4
+	movl	$1,20(%rdi)
+
+	__poly1305_init_avx
+
+.Lproceed_avx:
+	movq	%r15,%rdx
+
+	movq	8(%rsp),%r15
+	movq	16(%rsp),%r14
+	movq	24(%rsp),%r13
+	movq	32(%rsp),%r12
+	movq	40(%rsp),%rbx
+	leaq	48(%rsp),%rax
+	leaq	48(%rsp),%rsp
+
+.Lbase2_64_avx_epilogue:
+	jmp	.Ldo_avx
+
+
+.align	32
+.Leven_avx:
+	vmovd	0(%rdi),%xmm0
+	vmovd	4(%rdi),%xmm1
+	vmovd	8(%rdi),%xmm2
+	vmovd	12(%rdi),%xmm3
+	vmovd	16(%rdi),%xmm4
+
+.Ldo_avx:
+	leaq	8(%rsp),%r10
+	andq	$-32,%rsp
+	subq	$8,%rsp
+	leaq	-88(%rsp),%r11
+	subq	$0x178,%rsp
+	subq	$64,%rdx
+	leaq	-32(%rsi),%rax
+	cmovcq	%rax,%rsi
+
+	vmovdqu	48(%rdi),%xmm14
+	leaq	112(%rdi),%rdi
+	leaq	.Lconst(%rip),%rcx
+
+	vmovdqu	32(%rsi),%xmm5
+	vmovdqu	48(%rsi),%xmm6
+	vmovdqa	64(%rcx),%xmm15
+
+	vpsrldq	$6,%xmm5,%xmm7
+	vpsrldq	$6,%xmm6,%xmm8
+	vpunpckhqdq	%xmm6,%xmm5,%xmm9
+	vpunpcklqdq	%xmm6,%xmm5,%xmm5
+	vpunpcklqdq	%xmm8,%xmm7,%xmm8
+
+	vpsrlq	$40,%xmm9,%xmm9
+	vpsrlq	$26,%xmm5,%xmm6
+	vpand	%xmm15,%xmm5,%xmm5
+	vpsrlq	$4,%xmm8,%xmm7
+	vpand	%xmm15,%xmm6,%xmm6
+	vpsrlq	$30,%xmm8,%xmm8
+	vpand	%xmm15,%xmm7,%xmm7
+	vpand	%xmm15,%xmm8,%xmm8
+	vpor	32(%rcx),%xmm9,%xmm9
+
+	jbe	.Lskip_loop_avx
+
+
+	vmovdqu	-48(%rdi),%xmm11
+	vmovdqu	-32(%rdi),%xmm12
+	vpshufd	$0xEE,%xmm14,%xmm13
+	vpshufd	$0x44,%xmm14,%xmm10
+	vmovdqa	%xmm13,-144(%r11)
+	vmovdqa	%xmm10,0(%rsp)
+	vpshufd	$0xEE,%xmm11,%xmm14
+	vmovdqu	-16(%rdi),%xmm10
+	vpshufd	$0x44,%xmm11,%xmm11
+	vmovdqa	%xmm14,-128(%r11)
+	vmovdqa	%xmm11,16(%rsp)
+	vpshufd	$0xEE,%xmm12,%xmm13
+	vmovdqu	0(%rdi),%xmm11
+	vpshufd	$0x44,%xmm12,%xmm12
+	vmovdqa	%xmm13,-112(%r11)
+	vmovdqa	%xmm12,32(%rsp)
+	vpshufd	$0xEE,%xmm10,%xmm14
+	vmovdqu	16(%rdi),%xmm12
+	vpshufd	$0x44,%xmm10,%xmm10
+	vmovdqa	%xmm14,-96(%r11)
+	vmovdqa	%xmm10,48(%rsp)
+	vpshufd	$0xEE,%xmm11,%xmm13
+	vmovdqu	32(%rdi),%xmm10
+	vpshufd	$0x44,%xmm11,%xmm11
+	vmovdqa	%xmm13,-80(%r11)
+	vmovdqa	%xmm11,64(%rsp)
+	vpshufd	$0xEE,%xmm12,%xmm14
+	vmovdqu	48(%rdi),%xmm11
+	vpshufd	$0x44,%xmm12,%xmm12
+	vmovdqa	%xmm14,-64(%r11)
+	vmovdqa	%xmm12,80(%rsp)
+	vpshufd	$0xEE,%xmm10,%xmm13
+	vmovdqu	64(%rdi),%xmm12
+	vpshufd	$0x44,%xmm10,%xmm10
+	vmovdqa	%xmm13,-48(%r11)
+	vmovdqa	%xmm10,96(%rsp)
+	vpshufd	$0xEE,%xmm11,%xmm14
+	vpshufd	$0x44,%xmm11,%xmm11
+	vmovdqa	%xmm14,-32(%r11)
+	vmovdqa	%xmm11,112(%rsp)
+	vpshufd	$0xEE,%xmm12,%xmm13
+	vmovdqa	0(%rsp),%xmm14
+	vpshufd	$0x44,%xmm12,%xmm12
+	vmovdqa	%xmm13,-16(%r11)
+	vmovdqa	%xmm12,128(%rsp)
+
+	jmp	.Loop_avx
+
+.align	32
+.Loop_avx:
+
+	vpmuludq	%xmm5,%xmm14,%xmm10
+	vpmuludq	%xmm6,%xmm14,%xmm11
+	vmovdqa	%xmm2,32(%r11)
+	vpmuludq	%xmm7,%xmm14,%xmm12
+	vmovdqa	16(%rsp),%xmm2
+	vpmuludq	%xmm8,%xmm14,%xmm13
+	vpmuludq	%xmm9,%xmm14,%xmm14
+
+	vmovdqa	%xmm0,0(%r11)
+	vpmuludq	32(%rsp),%xmm9,%xmm0
+	vmovdqa	%xmm1,16(%r11)
+	vpmuludq	%xmm8,%xmm2,%xmm1
+	vpaddq	%xmm0,%xmm10,%xmm10
+	vpaddq	%xmm1,%xmm14,%xmm14
+	vmovdqa	%xmm3,48(%r11)
+	vpmuludq	%xmm7,%xmm2,%xmm0
+	vpmuludq	%xmm6,%xmm2,%xmm1
+	vpaddq	%xmm0,%xmm13,%xmm13
+	vmovdqa	48(%rsp),%xmm3
+	vpaddq	%xmm1,%xmm12,%xmm12
+	vmovdqa	%xmm4,64(%r11)
+	vpmuludq	%xmm5,%xmm2,%xmm2
+	vpmuludq	%xmm7,%xmm3,%xmm0
+	vpaddq	%xmm2,%xmm11,%xmm11
+
+	vmovdqa	64(%rsp),%xmm4
+	vpaddq	%xmm0,%xmm14,%xmm14
+	vpmuludq	%xmm6,%xmm3,%xmm1
+	vpmuludq	%xmm5,%xmm3,%xmm3
+	vpaddq	%xmm1,%xmm13,%xmm13
+	vmovdqa	80(%rsp),%xmm2
+	vpaddq	%xmm3,%xmm12,%xmm12
+	vpmuludq	%xmm9,%xmm4,%xmm0
+	vpmuludq	%xmm8,%xmm4,%xmm4
+	vpaddq	%xmm0,%xmm11,%xmm11
+	vmovdqa	96(%rsp),%xmm3
+	vpaddq	%xmm4,%xmm10,%xmm10
+
+	vmovdqa	128(%rsp),%xmm4
+	vpmuludq	%xmm6,%xmm2,%xmm1
+	vpmuludq	%xmm5,%xmm2,%xmm2
+	vpaddq	%xmm1,%xmm14,%xmm14
+	vpaddq	%xmm2,%xmm13,%xmm13
+	vpmuludq	%xmm9,%xmm3,%xmm0
+	vpmuludq	%xmm8,%xmm3,%xmm1
+	vpaddq	%xmm0,%xmm12,%xmm12
+	vmovdqu	0(%rsi),%xmm0
+	vpaddq	%xmm1,%xmm11,%xmm11
+	vpmuludq	%xmm7,%xmm3,%xmm3
+	vpmuludq	%xmm7,%xmm4,%xmm7
+	vpaddq	%xmm3,%xmm10,%xmm10
+
+	vmovdqu	16(%rsi),%xmm1
+	vpaddq	%xmm7,%xmm11,%xmm11
+	vpmuludq	%xmm8,%xmm4,%xmm8
+	vpmuludq	%xmm9,%xmm4,%xmm9
+	vpsrldq	$6,%xmm0,%xmm2
+	vpaddq	%xmm8,%xmm12,%xmm12
+	vpaddq	%xmm9,%xmm13,%xmm13
+	vpsrldq	$6,%xmm1,%xmm3
+	vpmuludq	112(%rsp),%xmm5,%xmm9
+	vpmuludq	%xmm6,%xmm4,%xmm5
+	vpunpckhqdq	%xmm1,%xmm0,%xmm4
+	vpaddq	%xmm9,%xmm14,%xmm14
+	vmovdqa	-144(%r11),%xmm9
+	vpaddq	%xmm5,%xmm10,%xmm10
+
+	vpunpcklqdq	%xmm1,%xmm0,%xmm0
+	vpunpcklqdq	%xmm3,%xmm2,%xmm3
+
+
+	vpsrldq	$5,%xmm4,%xmm4
+	vpsrlq	$26,%xmm0,%xmm1
+	vpand	%xmm15,%xmm0,%xmm0
+	vpsrlq	$4,%xmm3,%xmm2
+	vpand	%xmm15,%xmm1,%xmm1
+	vpand	0(%rcx),%xmm4,%xmm4
+	vpsrlq	$30,%xmm3,%xmm3
+	vpand	%xmm15,%xmm2,%xmm2
+	vpand	%xmm15,%xmm3,%xmm3
+	vpor	32(%rcx),%xmm4,%xmm4
+
+	vpaddq	0(%r11),%xmm0,%xmm0
+	vpaddq	16(%r11),%xmm1,%xmm1
+	vpaddq	32(%r11),%xmm2,%xmm2
+	vpaddq	48(%r11),%xmm3,%xmm3
+	vpaddq	64(%r11),%xmm4,%xmm4
+
+	leaq	32(%rsi),%rax
+	leaq	64(%rsi),%rsi
+	subq	$64,%rdx
+	cmovcq	%rax,%rsi
+
+	vpmuludq	%xmm0,%xmm9,%xmm5
+	vpmuludq	%xmm1,%xmm9,%xmm6
+	vpaddq	%xmm5,%xmm10,%xmm10
+	vpaddq	%xmm6,%xmm11,%xmm11
+	vmovdqa	-128(%r11),%xmm7
+	vpmuludq	%xmm2,%xmm9,%xmm5
+	vpmuludq	%xmm3,%xmm9,%xmm6
+	vpaddq	%xmm5,%xmm12,%xmm12
+	vpaddq	%xmm6,%xmm13,%xmm13
+	vpmuludq	%xmm4,%xmm9,%xmm9
+	vpmuludq	-112(%r11),%xmm4,%xmm5
+	vpaddq	%xmm9,%xmm14,%xmm14
+
+	vpaddq	%xmm5,%xmm10,%xmm10
+	vpmuludq	%xmm2,%xmm7,%xmm6
+	vpmuludq	%xmm3,%xmm7,%xmm5
+	vpaddq	%xmm6,%xmm13,%xmm13
+	vmovdqa	-96(%r11),%xmm8
+	vpaddq	%xmm5,%xmm14,%xmm14
+	vpmuludq	%xmm1,%xmm7,%xmm6
+	vpmuludq	%xmm0,%xmm7,%xmm7
+	vpaddq	%xmm6,%xmm12,%xmm12
+	vpaddq	%xmm7,%xmm11,%xmm11
+
+	vmovdqa	-80(%r11),%xmm9
+	vpmuludq	%xmm2,%xmm8,%xmm5
+	vpmuludq	%xmm1,%xmm8,%xmm6
+	vpaddq	%xmm5,%xmm14,%xmm14
+	vpaddq	%xmm6,%xmm13,%xmm13
+	vmovdqa	-64(%r11),%xmm7
+	vpmuludq	%xmm0,%xmm8,%xmm8
+	vpmuludq	%xmm4,%xmm9,%xmm5
+	vpaddq	%xmm8,%xmm12,%xmm12
+	vpaddq	%xmm5,%xmm11,%xmm11
+	vmovdqa	-48(%r11),%xmm8
+	vpmuludq	%xmm3,%xmm9,%xmm9
+	vpmuludq	%xmm1,%xmm7,%xmm6
+	vpaddq	%xmm9,%xmm10,%xmm10
+
+	vmovdqa	-16(%r11),%xmm9
+	vpaddq	%xmm6,%xmm14,%xmm14
+	vpmuludq	%xmm0,%xmm7,%xmm7
+	vpmuludq	%xmm4,%xmm8,%xmm5
+	vpaddq	%xmm7,%xmm13,%xmm13
+	vpaddq	%xmm5,%xmm12,%xmm12
+	vmovdqu	32(%rsi),%xmm5
+	vpmuludq	%xmm3,%xmm8,%xmm7
+	vpmuludq	%xmm2,%xmm8,%xmm8
+	vpaddq	%xmm7,%xmm11,%xmm11
+	vmovdqu	48(%rsi),%xmm6
+	vpaddq	%xmm8,%xmm10,%xmm10
+
+	vpmuludq	%xmm2,%xmm9,%xmm2
+	vpmuludq	%xmm3,%xmm9,%xmm3
+	vpsrldq	$6,%xmm5,%xmm7
+	vpaddq	%xmm2,%xmm11,%xmm11
+	vpmuludq	%xmm4,%xmm9,%xmm4
+	vpsrldq	$6,%xmm6,%xmm8
+	vpaddq	%xmm3,%xmm12,%xmm2
+	vpaddq	%xmm4,%xmm13,%xmm3
+	vpmuludq	-32(%r11),%xmm0,%xmm4
+	vpmuludq	%xmm1,%xmm9,%xmm0
+	vpunpckhqdq	%xmm6,%xmm5,%xmm9
+	vpaddq	%xmm4,%xmm14,%xmm4
+	vpaddq	%xmm0,%xmm10,%xmm0
+
+	vpunpcklqdq	%xmm6,%xmm5,%xmm5
+	vpunpcklqdq	%xmm8,%xmm7,%xmm8
+
+
+	vpsrldq	$5,%xmm9,%xmm9
+	vpsrlq	$26,%xmm5,%xmm6
+	vmovdqa	0(%rsp),%xmm14
+	vpand	%xmm15,%xmm5,%xmm5
+	vpsrlq	$4,%xmm8,%xmm7
+	vpand	%xmm15,%xmm6,%xmm6
+	vpand	0(%rcx),%xmm9,%xmm9
+	vpsrlq	$30,%xmm8,%xmm8
+	vpand	%xmm15,%xmm7,%xmm7
+	vpand	%xmm15,%xmm8,%xmm8
+	vpor	32(%rcx),%xmm9,%xmm9
+
+	vpsrlq	$26,%xmm3,%xmm13
+	vpand	%xmm15,%xmm3,%xmm3
+	vpaddq	%xmm13,%xmm4,%xmm4
+
+	vpsrlq	$26,%xmm0,%xmm10
+	vpand	%xmm15,%xmm0,%xmm0
+	vpaddq	%xmm10,%xmm11,%xmm1
+
+	vpsrlq	$26,%xmm4,%xmm10
+	vpand	%xmm15,%xmm4,%xmm4
+
+	vpsrlq	$26,%xmm1,%xmm11
+	vpand	%xmm15,%xmm1,%xmm1
+	vpaddq	%xmm11,%xmm2,%xmm2
+
+	vpaddq	%xmm10,%xmm0,%xmm0
+	vpsllq	$2,%xmm10,%xmm10
+	vpaddq	%xmm10,%xmm0,%xmm0
+
+	vpsrlq	$26,%xmm2,%xmm12
+	vpand	%xmm15,%xmm2,%xmm2
+	vpaddq	%xmm12,%xmm3,%xmm3
+
+	vpsrlq	$26,%xmm0,%xmm10
+	vpand	%xmm15,%xmm0,%xmm0
+	vpaddq	%xmm10,%xmm1,%xmm1
+
+	vpsrlq	$26,%xmm3,%xmm13
+	vpand	%xmm15,%xmm3,%xmm3
+	vpaddq	%xmm13,%xmm4,%xmm4
+
+	ja	.Loop_avx
+
+.Lskip_loop_avx:
+	vpshufd	$0x10,%xmm14,%xmm14
+	addq	$32,%rdx
+	jnz	.Long_tail_avx
+
+	vpaddq	%xmm2,%xmm7,%xmm7
+	vpaddq	%xmm0,%xmm5,%xmm5
+	vpaddq	%xmm1,%xmm6,%xmm6
+	vpaddq	%xmm3,%xmm8,%xmm8
+	vpaddq	%xmm4,%xmm9,%xmm9
+
+.Long_tail_avx:
+	vmovdqa	%xmm2,32(%r11)
+	vmovdqa	%xmm0,0(%r11)
+	vmovdqa	%xmm1,16(%r11)
+	vmovdqa	%xmm3,48(%r11)
+	vmovdqa	%xmm4,64(%r11)
+
+	vpmuludq	%xmm7,%xmm14,%xmm12
+	vpmuludq	%xmm5,%xmm14,%xmm10
+	vpshufd	$0x10,-48(%rdi),%xmm2
+	vpmuludq	%xmm6,%xmm14,%xmm11
+	vpmuludq	%xmm8,%xmm14,%xmm13
+	vpmuludq	%xmm9,%xmm14,%xmm14
+
+	vpmuludq	%xmm8,%xmm2,%xmm0
+	vpaddq	%xmm0,%xmm14,%xmm14
+	vpshufd	$0x10,-32(%rdi),%xmm3
+	vpmuludq	%xmm7,%xmm2,%xmm1
+	vpaddq	%xmm1,%xmm13,%xmm13
+	vpshufd	$0x10,-16(%rdi),%xmm4
+	vpmuludq	%xmm6,%xmm2,%xmm0
+	vpaddq	%xmm0,%xmm12,%xmm12
+	vpmuludq	%xmm5,%xmm2,%xmm2
+	vpaddq	%xmm2,%xmm11,%xmm11
+	vpmuludq	%xmm9,%xmm3,%xmm3
+	vpaddq	%xmm3,%xmm10,%xmm10
+
+	vpshufd	$0x10,0(%rdi),%xmm2
+	vpmuludq	%xmm7,%xmm4,%xmm1
+	vpaddq	%xmm1,%xmm14,%xmm14
+	vpmuludq	%xmm6,%xmm4,%xmm0
+	vpaddq	%xmm0,%xmm13,%xmm13
+	vpshufd	$0x10,16(%rdi),%xmm3
+	vpmuludq	%xmm5,%xmm4,%xmm4
+	vpaddq	%xmm4,%xmm12,%xmm12
+	vpmuludq	%xmm9,%xmm2,%xmm1
+	vpaddq	%xmm1,%xmm11,%xmm11
+	vpshufd	$0x10,32(%rdi),%xmm4
+	vpmuludq	%xmm8,%xmm2,%xmm2
+	vpaddq	%xmm2,%xmm10,%xmm10
+
+	vpmuludq	%xmm6,%xmm3,%xmm0
+	vpaddq	%xmm0,%xmm14,%xmm14
+	vpmuludq	%xmm5,%xmm3,%xmm3
+	vpaddq	%xmm3,%xmm13,%xmm13
+	vpshufd	$0x10,48(%rdi),%xmm2
+	vpmuludq	%xmm9,%xmm4,%xmm1
+	vpaddq	%xmm1,%xmm12,%xmm12
+	vpshufd	$0x10,64(%rdi),%xmm3
+	vpmuludq	%xmm8,%xmm4,%xmm0
+	vpaddq	%xmm0,%xmm11,%xmm11
+	vpmuludq	%xmm7,%xmm4,%xmm4
+	vpaddq	%xmm4,%xmm10,%xmm10
+
+	vpmuludq	%xmm5,%xmm2,%xmm2
+	vpaddq	%xmm2,%xmm14,%xmm14
+	vpmuludq	%xmm9,%xmm3,%xmm1
+	vpaddq	%xmm1,%xmm13,%xmm13
+	vpmuludq	%xmm8,%xmm3,%xmm0
+	vpaddq	%xmm0,%xmm12,%xmm12
+	vpmuludq	%xmm7,%xmm3,%xmm1
+	vpaddq	%xmm1,%xmm11,%xmm11
+	vpmuludq	%xmm6,%xmm3,%xmm3
+	vpaddq	%xmm3,%xmm10,%xmm10
+
+	jz	.Lshort_tail_avx
+
+	vmovdqu	0(%rsi),%xmm0
+	vmovdqu	16(%rsi),%xmm1
+
+	vpsrldq	$6,%xmm0,%xmm2
+	vpsrldq	$6,%xmm1,%xmm3
+	vpunpckhqdq	%xmm1,%xmm0,%xmm4
+	vpunpcklqdq	%xmm1,%xmm0,%xmm0
+	vpunpcklqdq	%xmm3,%xmm2,%xmm3
+
+	vpsrlq	$40,%xmm4,%xmm4
+	vpsrlq	$26,%xmm0,%xmm1
+	vpand	%xmm15,%xmm0,%xmm0
+	vpsrlq	$4,%xmm3,%xmm2
+	vpand	%xmm15,%xmm1,%xmm1
+	vpsrlq	$30,%xmm3,%xmm3
+	vpand	%xmm15,%xmm2,%xmm2
+	vpand	%xmm15,%xmm3,%xmm3
+	vpor	32(%rcx),%xmm4,%xmm4
+
+	vpshufd	$0x32,-64(%rdi),%xmm9
+	vpaddq	0(%r11),%xmm0,%xmm0
+	vpaddq	16(%r11),%xmm1,%xmm1
+	vpaddq	32(%r11),%xmm2,%xmm2
+	vpaddq	48(%r11),%xmm3,%xmm3
+	vpaddq	64(%r11),%xmm4,%xmm4
+
+	vpmuludq	%xmm0,%xmm9,%xmm5
+	vpaddq	%xmm5,%xmm10,%xmm10
+	vpmuludq	%xmm1,%xmm9,%xmm6
+	vpaddq	%xmm6,%xmm11,%xmm11
+	vpmuludq	%xmm2,%xmm9,%xmm5
+	vpaddq	%xmm5,%xmm12,%xmm12
+	vpshufd	$0x32,-48(%rdi),%xmm7
+	vpmuludq	%xmm3,%xmm9,%xmm6
+	vpaddq	%xmm6,%xmm13,%xmm13
+	vpmuludq	%xmm4,%xmm9,%xmm9
+	vpaddq	%xmm9,%xmm14,%xmm14
+
+	vpmuludq	%xmm3,%xmm7,%xmm5
+	vpaddq	%xmm5,%xmm14,%xmm14
+	vpshufd	$0x32,-32(%rdi),%xmm8
+	vpmuludq	%xmm2,%xmm7,%xmm6
+	vpaddq	%xmm6,%xmm13,%xmm13
+	vpshufd	$0x32,-16(%rdi),%xmm9
+	vpmuludq	%xmm1,%xmm7,%xmm5
+	vpaddq	%xmm5,%xmm12,%xmm12
+	vpmuludq	%xmm0,%xmm7,%xmm7
+	vpaddq	%xmm7,%xmm11,%xmm11
+	vpmuludq	%xmm4,%xmm8,%xmm8
+	vpaddq	%xmm8,%xmm10,%xmm10
+
+	vpshufd	$0x32,0(%rdi),%xmm7
+	vpmuludq	%xmm2,%xmm9,%xmm6
+	vpaddq	%xmm6,%xmm14,%xmm14
+	vpmuludq	%xmm1,%xmm9,%xmm5
+	vpaddq	%xmm5,%xmm13,%xmm13
+	vpshufd	$0x32,16(%rdi),%xmm8
+	vpmuludq	%xmm0,%xmm9,%xmm9
+	vpaddq	%xmm9,%xmm12,%xmm12
+	vpmuludq	%xmm4,%xmm7,%xmm6
+	vpaddq	%xmm6,%xmm11,%xmm11
+	vpshufd	$0x32,32(%rdi),%xmm9
+	vpmuludq	%xmm3,%xmm7,%xmm7
+	vpaddq	%xmm7,%xmm10,%xmm10
+
+	vpmuludq	%xmm1,%xmm8,%xmm5
+	vpaddq	%xmm5,%xmm14,%xmm14
+	vpmuludq	%xmm0,%xmm8,%xmm8
+	vpaddq	%xmm8,%xmm13,%xmm13
+	vpshufd	$0x32,48(%rdi),%xmm7
+	vpmuludq	%xmm4,%xmm9,%xmm6
+	vpaddq	%xmm6,%xmm12,%xmm12
+	vpshufd	$0x32,64(%rdi),%xmm8
+	vpmuludq	%xmm3,%xmm9,%xmm5
+	vpaddq	%xmm5,%xmm11,%xmm11
+	vpmuludq	%xmm2,%xmm9,%xmm9
+	vpaddq	%xmm9,%xmm10,%xmm10
+
+	vpmuludq	%xmm0,%xmm7,%xmm7
+	vpaddq	%xmm7,%xmm14,%xmm14
+	vpmuludq	%xmm4,%xmm8,%xmm6
+	vpaddq	%xmm6,%xmm13,%xmm13
+	vpmuludq	%xmm3,%xmm8,%xmm5
+	vpaddq	%xmm5,%xmm12,%xmm12
+	vpmuludq	%xmm2,%xmm8,%xmm6
+	vpaddq	%xmm6,%xmm11,%xmm11
+	vpmuludq	%xmm1,%xmm8,%xmm8
+	vpaddq	%xmm8,%xmm10,%xmm10
+
+.Lshort_tail_avx:
+
+	vpsrldq	$8,%xmm14,%xmm9
+	vpsrldq	$8,%xmm13,%xmm8
+	vpsrldq	$8,%xmm11,%xmm6
+	vpsrldq	$8,%xmm10,%xmm5
+	vpsrldq	$8,%xmm12,%xmm7
+	vpaddq	%xmm8,%xmm13,%xmm13
+	vpaddq	%xmm9,%xmm14,%xmm14
+	vpaddq	%xmm5,%xmm10,%xmm10
+	vpaddq	%xmm6,%xmm11,%xmm11
+	vpaddq	%xmm7,%xmm12,%xmm12
+
+	vpsrlq	$26,%xmm13,%xmm3
+	vpand	%xmm15,%xmm13,%xmm13
+	vpaddq	%xmm3,%xmm14,%xmm14
+
+	vpsrlq	$26,%xmm10,%xmm0
+	vpand	%xmm15,%xmm10,%xmm10
+	vpaddq	%xmm0,%xmm11,%xmm11
+
+	vpsrlq	$26,%xmm14,%xmm4
+	vpand	%xmm15,%xmm14,%xmm14
+
+	vpsrlq	$26,%xmm11,%xmm1
+	vpand	%xmm15,%xmm11,%xmm11
+	vpaddq	%xmm1,%xmm12,%xmm12
+
+	vpaddq	%xmm4,%xmm10,%xmm10
+	vpsllq	$2,%xmm4,%xmm4
+	vpaddq	%xmm4,%xmm10,%xmm10
+
+	vpsrlq	$26,%xmm12,%xmm2
+	vpand	%xmm15,%xmm12,%xmm12
+	vpaddq	%xmm2,%xmm13,%xmm13
+
+	vpsrlq	$26,%xmm10,%xmm0
+	vpand	%xmm15,%xmm10,%xmm10
+	vpaddq	%xmm0,%xmm11,%xmm11
+
+	vpsrlq	$26,%xmm13,%xmm3
+	vpand	%xmm15,%xmm13,%xmm13
+	vpaddq	%xmm3,%xmm14,%xmm14
+
+	vmovd	%xmm10,-112(%rdi)
+	vmovd	%xmm11,-108(%rdi)
+	vmovd	%xmm12,-104(%rdi)
+	vmovd	%xmm13,-100(%rdi)
+	vmovd	%xmm14,-96(%rdi)
+	leaq	-8(%r10),%rsp
+
+	vzeroupper
+	ret
+ENDPROC(poly1305_blocks_avx)
+
+.align	32
+ENTRY(poly1305_emit_avx)
+	cmpl	$0,20(%rdi)
+	je	.Lemit
+
+	movl	0(%rdi),%eax
+	movl	4(%rdi),%ecx
+	movl	8(%rdi),%r8d
+	movl	12(%rdi),%r11d
+	movl	16(%rdi),%r10d
+
+	shlq	$26,%rcx
+	movq	%r8,%r9
+	shlq	$52,%r8
+	addq	%rcx,%rax
+	shrq	$12,%r9
+	addq	%rax,%r8
+	adcq	$0,%r9
+
+	shlq	$14,%r11
+	movq	%r10,%rax
+	shrq	$24,%r10
+	addq	%r11,%r9
+	shlq	$40,%rax
+	addq	%rax,%r9
+	adcq	$0,%r10
+
+	movq	%r10,%rax
+	movq	%r10,%rcx
+	andq	$3,%r10
+	shrq	$2,%rax
+	andq	$-4,%rcx
+	addq	%rcx,%rax
+	addq	%rax,%r8
+	adcq	$0,%r9
+	adcq	$0,%r10
+
+	movq	%r8,%rax
+	addq	$5,%r8
+	movq	%r9,%rcx
+	adcq	$0,%r9
+	adcq	$0,%r10
+	shrq	$2,%r10
+	cmovnzq	%r8,%rax
+	cmovnzq	%r9,%rcx
+
+	addq	0(%rdx),%rax
+	adcq	8(%rdx),%rcx
+	movq	%rax,0(%rsi)
+	movq	%rcx,8(%rsi)
+
+	ret
+ENDPROC(poly1305_emit_avx)
+#endif /* CONFIG_AS_AVX */
+
+#ifdef CONFIG_AS_AVX2
+.align	32
+ENTRY(poly1305_blocks_avx2)
+
+	movl	20(%rdi),%r8d
+	cmpq	$128,%rdx
+	jae	.Lblocks_avx2
+	testl	%r8d,%r8d
+	jz	.Lblocks
+
+.Lblocks_avx2:
+	andq	$-16,%rdx
+	jz	.Lno_data_avx2
+
+	vzeroupper
+
+	testl	%r8d,%r8d
+	jz	.Lbase2_64_avx2
+
+	testq	$63,%rdx
+	jz	.Leven_avx2
+
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+	pushq	%rdi
+
+.Lblocks_avx2_body:
+
+	movq	%rdx,%r15
+
+	movq	0(%rdi),%r8
+	movq	8(%rdi),%r9
+	movl	16(%rdi),%r10d
+
+	movq	24(%rdi),%r11
+	movq	32(%rdi),%r13
+
+
+	movl	%r8d,%r14d
+	andq	$-2147483648,%r8
+	movq	%r9,%r12
+	movl	%r9d,%ebx
+	andq	$-2147483648,%r9
+
+	shrq	$6,%r8
+	shlq	$52,%r12
+	addq	%r8,%r14
+	shrq	$12,%rbx
+	shrq	$18,%r9
+	addq	%r12,%r14
+	adcq	%r9,%rbx
+
+	movq	%r10,%r8
+	shlq	$40,%r8
+	shrq	$24,%r10
+	addq	%r8,%rbx
+	adcq	$0,%r10
+
+	movq	$-4,%r9
+	movq	%r10,%r8
+	andq	%r10,%r9
+	shrq	$2,%r8
+	andq	$3,%r10
+	addq	%r9,%r8
+	addq	%r8,%r14
+	adcq	$0,%rbx
+	adcq	$0,%r10
+
+	movq	%r13,%r12
+	movq	%r13,%rax
+	shrq	$2,%r13
+	addq	%r12,%r13
+
+.Lbase2_26_pre_avx2:
+	addq	0(%rsi),%r14
+	adcq	8(%rsi),%rbx
+	leaq	16(%rsi),%rsi
+	adcq	%rcx,%r10
+	subq	$16,%r15
+
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+	movq	%r12,%rax
+
+	testq	$63,%r15
+	jnz	.Lbase2_26_pre_avx2
+
+	testq	%rcx,%rcx
+	jz	.Lstore_base2_64_avx2
+
+
+	movq	%r14,%rax
+	movq	%r14,%rdx
+	shrq	$52,%r14
+	movq	%rbx,%r11
+	movq	%rbx,%r12
+	shrq	$26,%rdx
+	andq	$0x3ffffff,%rax
+	shlq	$12,%r11
+	andq	$0x3ffffff,%rdx
+	shrq	$14,%rbx
+	orq	%r11,%r14
+	shlq	$24,%r10
+	andq	$0x3ffffff,%r14
+	shrq	$40,%r12
+	andq	$0x3ffffff,%rbx
+	orq	%r12,%r10
+
+	testq	%r15,%r15
+	jz	.Lstore_base2_26_avx2
+
+	vmovd	%eax,%xmm0
+	vmovd	%edx,%xmm1
+	vmovd	%r14d,%xmm2
+	vmovd	%ebx,%xmm3
+	vmovd	%r10d,%xmm4
+	jmp	.Lproceed_avx2
+
+.align	32
+.Lstore_base2_64_avx2:
+	movq	%r14,0(%rdi)
+	movq	%rbx,8(%rdi)
+	movq	%r10,16(%rdi)
+	jmp	.Ldone_avx2
+
+.align	16
+.Lstore_base2_26_avx2:
+	movl	%eax,0(%rdi)
+	movl	%edx,4(%rdi)
+	movl	%r14d,8(%rdi)
+	movl	%ebx,12(%rdi)
+	movl	%r10d,16(%rdi)
+.align	16
+.Ldone_avx2:
+	movq	8(%rsp),%r15
+	movq	16(%rsp),%r14
+	movq	24(%rsp),%r13
+	movq	32(%rsp),%r12
+	movq	40(%rsp),%rbx
+	leaq	48(%rsp),%rsp
+
+.Lno_data_avx2:
+.Lblocks_avx2_epilogue:
+	ret
+
+
+.align	32
+.Lbase2_64_avx2:
+
+
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+	pushq	%rdi
+
+.Lbase2_64_avx2_body:
+
+	movq	%rdx,%r15
+
+	movq	24(%rdi),%r11
+	movq	32(%rdi),%r13
+
+	movq	0(%rdi),%r14
+	movq	8(%rdi),%rbx
+	movl	16(%rdi),%r10d
+
+	movq	%r13,%r12
+	movq	%r13,%rax
+	shrq	$2,%r13
+	addq	%r12,%r13
+
+	testq	$63,%rdx
+	jz	.Linit_avx2
+
+.Lbase2_64_pre_avx2:
+	addq	0(%rsi),%r14
+	adcq	8(%rsi),%rbx
+	leaq	16(%rsi),%rsi
+	adcq	%rcx,%r10
+	subq	$16,%r15
+
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+	movq	%r12,%rax
+
+	testq	$63,%r15
+	jnz	.Lbase2_64_pre_avx2
+
+.Linit_avx2:
+
+	movq	%r14,%rax
+	movq	%r14,%rdx
+	shrq	$52,%r14
+	movq	%rbx,%r8
+	movq	%rbx,%r9
+	shrq	$26,%rdx
+	andq	$0x3ffffff,%rax
+	shlq	$12,%r8
+	andq	$0x3ffffff,%rdx
+	shrq	$14,%rbx
+	orq	%r8,%r14
+	shlq	$24,%r10
+	andq	$0x3ffffff,%r14
+	shrq	$40,%r9
+	andq	$0x3ffffff,%rbx
+	orq	%r9,%r10
+
+	vmovd	%eax,%xmm0
+	vmovd	%edx,%xmm1
+	vmovd	%r14d,%xmm2
+	vmovd	%ebx,%xmm3
+	vmovd	%r10d,%xmm4
+	movl	$1,20(%rdi)
+
+	__poly1305_init_avx
+
+.Lproceed_avx2:
+	movq	%r15,%rdx
+
+	movq	8(%rsp),%r15
+	movq	16(%rsp),%r14
+	movq	24(%rsp),%r13
+	movq	32(%rsp),%r12
+	movq	40(%rsp),%rbx
+	leaq	48(%rsp),%rax
+	leaq	48(%rsp),%rsp
+
+.Lbase2_64_avx2_epilogue:
+	jmp	.Ldo_avx2
+
+
+.align	32
+.Leven_avx2:
+
+	vmovd	0(%rdi),%xmm0
+	vmovd	4(%rdi),%xmm1
+	vmovd	8(%rdi),%xmm2
+	vmovd	12(%rdi),%xmm3
+	vmovd	16(%rdi),%xmm4
+
+.Ldo_avx2:
+	leaq	8(%rsp),%r10
+	subq	$0x128,%rsp
+	leaq	.Lconst(%rip),%rcx
+	leaq	48+64(%rdi),%rdi
+	vmovdqa	96(%rcx),%ymm7
+
+
+	vmovdqu	-64(%rdi),%xmm9
+	andq	$-512,%rsp
+	vmovdqu	-48(%rdi),%xmm10
+	vmovdqu	-32(%rdi),%xmm6
+	vmovdqu	-16(%rdi),%xmm11
+	vmovdqu	0(%rdi),%xmm12
+	vmovdqu	16(%rdi),%xmm13
+	leaq	144(%rsp),%rax
+	vmovdqu	32(%rdi),%xmm14
+	vpermd	%ymm9,%ymm7,%ymm9
+	vmovdqu	48(%rdi),%xmm15
+	vpermd	%ymm10,%ymm7,%ymm10
+	vmovdqu	64(%rdi),%xmm5
+	vpermd	%ymm6,%ymm7,%ymm6
+	vmovdqa	%ymm9,0(%rsp)
+	vpermd	%ymm11,%ymm7,%ymm11
+	vmovdqa	%ymm10,32-144(%rax)
+	vpermd	%ymm12,%ymm7,%ymm12
+	vmovdqa	%ymm6,64-144(%rax)
+	vpermd	%ymm13,%ymm7,%ymm13
+	vmovdqa	%ymm11,96-144(%rax)
+	vpermd	%ymm14,%ymm7,%ymm14
+	vmovdqa	%ymm12,128-144(%rax)
+	vpermd	%ymm15,%ymm7,%ymm15
+	vmovdqa	%ymm13,160-144(%rax)
+	vpermd	%ymm5,%ymm7,%ymm5
+	vmovdqa	%ymm14,192-144(%rax)
+	vmovdqa	%ymm15,224-144(%rax)
+	vmovdqa	%ymm5,256-144(%rax)
+	vmovdqa	64(%rcx),%ymm5
+
+
+
+	vmovdqu	0(%rsi),%xmm7
+	vmovdqu	16(%rsi),%xmm8
+	vinserti128	$1,32(%rsi),%ymm7,%ymm7
+	vinserti128	$1,48(%rsi),%ymm8,%ymm8
+	leaq	64(%rsi),%rsi
+
+	vpsrldq	$6,%ymm7,%ymm9
+	vpsrldq	$6,%ymm8,%ymm10
+	vpunpckhqdq	%ymm8,%ymm7,%ymm6
+	vpunpcklqdq	%ymm10,%ymm9,%ymm9
+	vpunpcklqdq	%ymm8,%ymm7,%ymm7
+
+	vpsrlq	$30,%ymm9,%ymm10
+	vpsrlq	$4,%ymm9,%ymm9
+	vpsrlq	$26,%ymm7,%ymm8
+	vpsrlq	$40,%ymm6,%ymm6
+	vpand	%ymm5,%ymm9,%ymm9
+	vpand	%ymm5,%ymm7,%ymm7
+	vpand	%ymm5,%ymm8,%ymm8
+	vpand	%ymm5,%ymm10,%ymm10
+	vpor	32(%rcx),%ymm6,%ymm6
+
+	vpaddq	%ymm2,%ymm9,%ymm2
+	subq	$64,%rdx
+	jz	.Ltail_avx2
+	jmp	.Loop_avx2
+
+.align	32
+.Loop_avx2:
+
+	vpaddq	%ymm0,%ymm7,%ymm0
+	vmovdqa	0(%rsp),%ymm7
+	vpaddq	%ymm1,%ymm8,%ymm1
+	vmovdqa	32(%rsp),%ymm8
+	vpaddq	%ymm3,%ymm10,%ymm3
+	vmovdqa	96(%rsp),%ymm9
+	vpaddq	%ymm4,%ymm6,%ymm4
+	vmovdqa	48(%rax),%ymm10
+	vmovdqa	112(%rax),%ymm5
+
+	vpmuludq	%ymm2,%ymm7,%ymm13
+	vpmuludq	%ymm2,%ymm8,%ymm14
+	vpmuludq	%ymm2,%ymm9,%ymm15
+	vpmuludq	%ymm2,%ymm10,%ymm11
+	vpmuludq	%ymm2,%ymm5,%ymm12
+
+	vpmuludq	%ymm0,%ymm8,%ymm6
+	vpmuludq	%ymm1,%ymm8,%ymm2
+	vpaddq	%ymm6,%ymm12,%ymm12
+	vpaddq	%ymm2,%ymm13,%ymm13
+	vpmuludq	%ymm3,%ymm8,%ymm6
+	vpmuludq	64(%rsp),%ymm4,%ymm2
+	vpaddq	%ymm6,%ymm15,%ymm15
+	vpaddq	%ymm2,%ymm11,%ymm11
+	vmovdqa	-16(%rax),%ymm8
+
+	vpmuludq	%ymm0,%ymm7,%ymm6
+	vpmuludq	%ymm1,%ymm7,%ymm2
+	vpaddq	%ymm6,%ymm11,%ymm11
+	vpaddq	%ymm2,%ymm12,%ymm12
+	vpmuludq	%ymm3,%ymm7,%ymm6
+	vpmuludq	%ymm4,%ymm7,%ymm2
+	vmovdqu	0(%rsi),%xmm7
+	vpaddq	%ymm6,%ymm14,%ymm14
+	vpaddq	%ymm2,%ymm15,%ymm15
+	vinserti128	$1,32(%rsi),%ymm7,%ymm7
+
+	vpmuludq	%ymm3,%ymm8,%ymm6
+	vpmuludq	%ymm4,%ymm8,%ymm2
+	vmovdqu	16(%rsi),%xmm8
+	vpaddq	%ymm6,%ymm11,%ymm11
+	vpaddq	%ymm2,%ymm12,%ymm12
+	vmovdqa	16(%rax),%ymm2
+	vpmuludq	%ymm1,%ymm9,%ymm6
+	vpmuludq	%ymm0,%ymm9,%ymm9
+	vpaddq	%ymm6,%ymm14,%ymm14
+	vpaddq	%ymm9,%ymm13,%ymm13
+	vinserti128	$1,48(%rsi),%ymm8,%ymm8
+	leaq	64(%rsi),%rsi
+
+	vpmuludq	%ymm1,%ymm2,%ymm6
+	vpmuludq	%ymm0,%ymm2,%ymm2
+	vpsrldq	$6,%ymm7,%ymm9
+	vpaddq	%ymm6,%ymm15,%ymm15
+	vpaddq	%ymm2,%ymm14,%ymm14
+	vpmuludq	%ymm3,%ymm10,%ymm6
+	vpmuludq	%ymm4,%ymm10,%ymm2
+	vpsrldq	$6,%ymm8,%ymm10
+	vpaddq	%ymm6,%ymm12,%ymm12
+	vpaddq	%ymm2,%ymm13,%ymm13
+	vpunpckhqdq	%ymm8,%ymm7,%ymm6
+
+	vpmuludq	%ymm3,%ymm5,%ymm3
+	vpmuludq	%ymm4,%ymm5,%ymm4
+	vpunpcklqdq	%ymm8,%ymm7,%ymm7
+	vpaddq	%ymm3,%ymm13,%ymm2
+	vpaddq	%ymm4,%ymm14,%ymm3
+	vpunpcklqdq	%ymm10,%ymm9,%ymm10
+	vpmuludq	80(%rax),%ymm0,%ymm4
+	vpmuludq	%ymm1,%ymm5,%ymm0
+	vmovdqa	64(%rcx),%ymm5
+	vpaddq	%ymm4,%ymm15,%ymm4
+	vpaddq	%ymm0,%ymm11,%ymm0
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm11,%ymm12,%ymm1
+
+	vpsrlq	$26,%ymm4,%ymm15
+	vpand	%ymm5,%ymm4,%ymm4
+
+	vpsrlq	$4,%ymm10,%ymm9
+
+	vpsrlq	$26,%ymm1,%ymm12
+	vpand	%ymm5,%ymm1,%ymm1
+	vpaddq	%ymm12,%ymm2,%ymm2
+
+	vpaddq	%ymm15,%ymm0,%ymm0
+	vpsllq	$2,%ymm15,%ymm15
+	vpaddq	%ymm15,%ymm0,%ymm0
+
+	vpand	%ymm5,%ymm9,%ymm9
+	vpsrlq	$26,%ymm7,%ymm8
+
+	vpsrlq	$26,%ymm2,%ymm13
+	vpand	%ymm5,%ymm2,%ymm2
+	vpaddq	%ymm13,%ymm3,%ymm3
+
+	vpaddq	%ymm9,%ymm2,%ymm2
+	vpsrlq	$30,%ymm10,%ymm10
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm11,%ymm1,%ymm1
+
+	vpsrlq	$40,%ymm6,%ymm6
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vpand	%ymm5,%ymm7,%ymm7
+	vpand	%ymm5,%ymm8,%ymm8
+	vpand	%ymm5,%ymm10,%ymm10
+	vpor	32(%rcx),%ymm6,%ymm6
+
+	subq	$64,%rdx
+	jnz	.Loop_avx2
+
+.byte	0x66,0x90
+.Ltail_avx2:
+
+	vpaddq	%ymm0,%ymm7,%ymm0
+	vmovdqu	4(%rsp),%ymm7
+	vpaddq	%ymm1,%ymm8,%ymm1
+	vmovdqu	36(%rsp),%ymm8
+	vpaddq	%ymm3,%ymm10,%ymm3
+	vmovdqu	100(%rsp),%ymm9
+	vpaddq	%ymm4,%ymm6,%ymm4
+	vmovdqu	52(%rax),%ymm10
+	vmovdqu	116(%rax),%ymm5
+
+	vpmuludq	%ymm2,%ymm7,%ymm13
+	vpmuludq	%ymm2,%ymm8,%ymm14
+	vpmuludq	%ymm2,%ymm9,%ymm15
+	vpmuludq	%ymm2,%ymm10,%ymm11
+	vpmuludq	%ymm2,%ymm5,%ymm12
+
+	vpmuludq	%ymm0,%ymm8,%ymm6
+	vpmuludq	%ymm1,%ymm8,%ymm2
+	vpaddq	%ymm6,%ymm12,%ymm12
+	vpaddq	%ymm2,%ymm13,%ymm13
+	vpmuludq	%ymm3,%ymm8,%ymm6
+	vpmuludq	68(%rsp),%ymm4,%ymm2
+	vpaddq	%ymm6,%ymm15,%ymm15
+	vpaddq	%ymm2,%ymm11,%ymm11
+
+	vpmuludq	%ymm0,%ymm7,%ymm6
+	vpmuludq	%ymm1,%ymm7,%ymm2
+	vpaddq	%ymm6,%ymm11,%ymm11
+	vmovdqu	-12(%rax),%ymm8
+	vpaddq	%ymm2,%ymm12,%ymm12
+	vpmuludq	%ymm3,%ymm7,%ymm6
+	vpmuludq	%ymm4,%ymm7,%ymm2
+	vpaddq	%ymm6,%ymm14,%ymm14
+	vpaddq	%ymm2,%ymm15,%ymm15
+
+	vpmuludq	%ymm3,%ymm8,%ymm6
+	vpmuludq	%ymm4,%ymm8,%ymm2
+	vpaddq	%ymm6,%ymm11,%ymm11
+	vpaddq	%ymm2,%ymm12,%ymm12
+	vmovdqu	20(%rax),%ymm2
+	vpmuludq	%ymm1,%ymm9,%ymm6
+	vpmuludq	%ymm0,%ymm9,%ymm9
+	vpaddq	%ymm6,%ymm14,%ymm14
+	vpaddq	%ymm9,%ymm13,%ymm13
+
+	vpmuludq	%ymm1,%ymm2,%ymm6
+	vpmuludq	%ymm0,%ymm2,%ymm2
+	vpaddq	%ymm6,%ymm15,%ymm15
+	vpaddq	%ymm2,%ymm14,%ymm14
+	vpmuludq	%ymm3,%ymm10,%ymm6
+	vpmuludq	%ymm4,%ymm10,%ymm2
+	vpaddq	%ymm6,%ymm12,%ymm12
+	vpaddq	%ymm2,%ymm13,%ymm13
+
+	vpmuludq	%ymm3,%ymm5,%ymm3
+	vpmuludq	%ymm4,%ymm5,%ymm4
+	vpaddq	%ymm3,%ymm13,%ymm2
+	vpaddq	%ymm4,%ymm14,%ymm3
+	vpmuludq	84(%rax),%ymm0,%ymm4
+	vpmuludq	%ymm1,%ymm5,%ymm0
+	vmovdqa	64(%rcx),%ymm5
+	vpaddq	%ymm4,%ymm15,%ymm4
+	vpaddq	%ymm0,%ymm11,%ymm0
+
+	vpsrldq	$8,%ymm12,%ymm8
+	vpsrldq	$8,%ymm2,%ymm9
+	vpsrldq	$8,%ymm3,%ymm10
+	vpsrldq	$8,%ymm4,%ymm6
+	vpsrldq	$8,%ymm0,%ymm7
+	vpaddq	%ymm8,%ymm12,%ymm12
+	vpaddq	%ymm9,%ymm2,%ymm2
+	vpaddq	%ymm10,%ymm3,%ymm3
+	vpaddq	%ymm6,%ymm4,%ymm4
+	vpaddq	%ymm7,%ymm0,%ymm0
+
+	vpermq	$0x2,%ymm3,%ymm10
+	vpermq	$0x2,%ymm4,%ymm6
+	vpermq	$0x2,%ymm0,%ymm7
+	vpermq	$0x2,%ymm12,%ymm8
+	vpermq	$0x2,%ymm2,%ymm9
+	vpaddq	%ymm10,%ymm3,%ymm3
+	vpaddq	%ymm6,%ymm4,%ymm4
+	vpaddq	%ymm7,%ymm0,%ymm0
+	vpaddq	%ymm8,%ymm12,%ymm12
+	vpaddq	%ymm9,%ymm2,%ymm2
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm11,%ymm12,%ymm1
+
+	vpsrlq	$26,%ymm4,%ymm15
+	vpand	%ymm5,%ymm4,%ymm4
+
+	vpsrlq	$26,%ymm1,%ymm12
+	vpand	%ymm5,%ymm1,%ymm1
+	vpaddq	%ymm12,%ymm2,%ymm2
+
+	vpaddq	%ymm15,%ymm0,%ymm0
+	vpsllq	$2,%ymm15,%ymm15
+	vpaddq	%ymm15,%ymm0,%ymm0
+
+	vpsrlq	$26,%ymm2,%ymm13
+	vpand	%ymm5,%ymm2,%ymm2
+	vpaddq	%ymm13,%ymm3,%ymm3
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm11,%ymm1,%ymm1
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vmovd	%xmm0,-112(%rdi)
+	vmovd	%xmm1,-108(%rdi)
+	vmovd	%xmm2,-104(%rdi)
+	vmovd	%xmm3,-100(%rdi)
+	vmovd	%xmm4,-96(%rdi)
+	leaq	-8(%r10),%rsp
+
+	vzeroupper
+	ret
+
+ENDPROC(poly1305_blocks_avx2)
+#endif /* CONFIG_AS_AVX2 */
+
+#ifdef CONFIG_AS_AVX512
+.align	32
+ENTRY(poly1305_blocks_avx512)
+
+	movl	20(%rdi),%r8d
+	cmpq	$128,%rdx
+	jae	.Lblocks_avx2_512
+	testl	%r8d,%r8d
+	jz	.Lblocks
+
+.Lblocks_avx2_512:
+	andq	$-16,%rdx
+	jz	.Lno_data_avx2_512
+
+	vzeroupper
+
+	testl	%r8d,%r8d
+	jz	.Lbase2_64_avx2_512
+
+	testq	$63,%rdx
+	jz	.Leven_avx2_512
+
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+	pushq	%rdi
+
+.Lblocks_avx2_body_512:
+
+	movq	%rdx,%r15
+
+	movq	0(%rdi),%r8
+	movq	8(%rdi),%r9
+	movl	16(%rdi),%r10d
+
+	movq	24(%rdi),%r11
+	movq	32(%rdi),%r13
+
+
+	movl	%r8d,%r14d
+	andq	$-2147483648,%r8
+	movq	%r9,%r12
+	movl	%r9d,%ebx
+	andq	$-2147483648,%r9
+
+	shrq	$6,%r8
+	shlq	$52,%r12
+	addq	%r8,%r14
+	shrq	$12,%rbx
+	shrq	$18,%r9
+	addq	%r12,%r14
+	adcq	%r9,%rbx
+
+	movq	%r10,%r8
+	shlq	$40,%r8
+	shrq	$24,%r10
+	addq	%r8,%rbx
+	adcq	$0,%r10
+
+	movq	$-4,%r9
+	movq	%r10,%r8
+	andq	%r10,%r9
+	shrq	$2,%r8
+	andq	$3,%r10
+	addq	%r9,%r8
+	addq	%r8,%r14
+	adcq	$0,%rbx
+	adcq	$0,%r10
+
+	movq	%r13,%r12
+	movq	%r13,%rax
+	shrq	$2,%r13
+	addq	%r12,%r13
+
+.Lbase2_26_pre_avx2_512:
+	addq	0(%rsi),%r14
+	adcq	8(%rsi),%rbx
+	leaq	16(%rsi),%rsi
+	adcq	%rcx,%r10
+	subq	$16,%r15
+
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+	movq	%r12,%rax
+
+	testq	$63,%r15
+	jnz	.Lbase2_26_pre_avx2_512
+
+	testq	%rcx,%rcx
+	jz	.Lstore_base2_64_avx2_512
+
+
+	movq	%r14,%rax
+	movq	%r14,%rdx
+	shrq	$52,%r14
+	movq	%rbx,%r11
+	movq	%rbx,%r12
+	shrq	$26,%rdx
+	andq	$0x3ffffff,%rax
+	shlq	$12,%r11
+	andq	$0x3ffffff,%rdx
+	shrq	$14,%rbx
+	orq	%r11,%r14
+	shlq	$24,%r10
+	andq	$0x3ffffff,%r14
+	shrq	$40,%r12
+	andq	$0x3ffffff,%rbx
+	orq	%r12,%r10
+
+	testq	%r15,%r15
+	jz	.Lstore_base2_26_avx2_512
+
+	vmovd	%eax,%xmm0
+	vmovd	%edx,%xmm1
+	vmovd	%r14d,%xmm2
+	vmovd	%ebx,%xmm3
+	vmovd	%r10d,%xmm4
+	jmp	.Lproceed_avx2_512
+
+.align	32
+.Lstore_base2_64_avx2_512:
+	movq	%r14,0(%rdi)
+	movq	%rbx,8(%rdi)
+	movq	%r10,16(%rdi)
+	jmp	.Ldone_avx2_512
+
+.align	16
+.Lstore_base2_26_avx2_512:
+	movl	%eax,0(%rdi)
+	movl	%edx,4(%rdi)
+	movl	%r14d,8(%rdi)
+	movl	%ebx,12(%rdi)
+	movl	%r10d,16(%rdi)
+.align	16
+.Ldone_avx2_512:
+	movq	8(%rsp),%r15
+	movq	16(%rsp),%r14
+	movq	24(%rsp),%r13
+	movq	32(%rsp),%r12
+	movq	40(%rsp),%rbx
+	leaq	48(%rsp),%rsp
+
+.Lno_data_avx2_512:
+.Lblocks_avx2_epilogue_512:
+	ret
+
+
+.align	32
+.Lbase2_64_avx2_512:
+
+	pushq	%rbx
+	pushq	%r12
+	pushq	%r13
+	pushq	%r14
+	pushq	%r15
+	pushq	%rdi
+
+.Lbase2_64_avx2_body_512:
+
+	movq	%rdx,%r15
+
+	movq	24(%rdi),%r11
+	movq	32(%rdi),%r13
+
+	movq	0(%rdi),%r14
+	movq	8(%rdi),%rbx
+	movl	16(%rdi),%r10d
+
+	movq	%r13,%r12
+	movq	%r13,%rax
+	shrq	$2,%r13
+	addq	%r12,%r13
+
+	testq	$63,%rdx
+	jz	.Linit_avx2_512
+
+.Lbase2_64_pre_avx2_512:
+	addq	0(%rsi),%r14
+	adcq	8(%rsi),%rbx
+	leaq	16(%rsi),%rsi
+	adcq	%rcx,%r10
+	subq	$16,%r15
+
+	movq	%rdi,0(%rsp)
+	__poly1305_block
+	movq	0(%rsp),%rdi
+	movq	%r12,%rax
+
+	testq	$63,%r15
+	jnz	.Lbase2_64_pre_avx2_512
+
+.Linit_avx2_512:
+
+	movq	%r14,%rax
+	movq	%r14,%rdx
+	shrq	$52,%r14
+	movq	%rbx,%r8
+	movq	%rbx,%r9
+	shrq	$26,%rdx
+	andq	$0x3ffffff,%rax
+	shlq	$12,%r8
+	andq	$0x3ffffff,%rdx
+	shrq	$14,%rbx
+	orq	%r8,%r14
+	shlq	$24,%r10
+	andq	$0x3ffffff,%r14
+	shrq	$40,%r9
+	andq	$0x3ffffff,%rbx
+	orq	%r9,%r10
+
+	vmovd	%eax,%xmm0
+	vmovd	%edx,%xmm1
+	vmovd	%r14d,%xmm2
+	vmovd	%ebx,%xmm3
+	vmovd	%r10d,%xmm4
+	movl	$1,20(%rdi)
+
+	__poly1305_init_avx
+
+.Lproceed_avx2_512:
+	movq	%r15,%rdx
+
+	movq	8(%rsp),%r15
+	movq	16(%rsp),%r14
+	movq	24(%rsp),%r13
+	movq	32(%rsp),%r12
+	movq	40(%rsp),%rbx
+	leaq	48(%rsp),%rax
+	leaq	48(%rsp),%rsp
+
+.Lbase2_64_avx2_epilogue_512:
+	jmp	.Ldo_avx2_512
+
+
+.align	32
+.Leven_avx2_512:
+
+	vmovd	0(%rdi),%xmm0
+	vmovd	4(%rdi),%xmm1
+	vmovd	8(%rdi),%xmm2
+	vmovd	12(%rdi),%xmm3
+	vmovd	16(%rdi),%xmm4
+
+.Ldo_avx2_512:
+	cmpq	$512,%rdx
+	jae	.Lblocks_avx512
+.Lskip_avx512:
+	leaq	8(%rsp),%r10
+
+	subq	$0x128,%rsp
+	leaq	.Lconst(%rip),%rcx
+	leaq	48+64(%rdi),%rdi
+	vmovdqa	96(%rcx),%ymm7
+
+
+	vmovdqu	-64(%rdi),%xmm9
+	andq	$-512,%rsp
+	vmovdqu	-48(%rdi),%xmm10
+	vmovdqu	-32(%rdi),%xmm6
+	vmovdqu	-16(%rdi),%xmm11
+	vmovdqu	0(%rdi),%xmm12
+	vmovdqu	16(%rdi),%xmm13
+	leaq	144(%rsp),%rax
+	vmovdqu	32(%rdi),%xmm14
+	vpermd	%ymm9,%ymm7,%ymm9
+	vmovdqu	48(%rdi),%xmm15
+	vpermd	%ymm10,%ymm7,%ymm10
+	vmovdqu	64(%rdi),%xmm5
+	vpermd	%ymm6,%ymm7,%ymm6
+	vmovdqa	%ymm9,0(%rsp)
+	vpermd	%ymm11,%ymm7,%ymm11
+	vmovdqa	%ymm10,32-144(%rax)
+	vpermd	%ymm12,%ymm7,%ymm12
+	vmovdqa	%ymm6,64-144(%rax)
+	vpermd	%ymm13,%ymm7,%ymm13
+	vmovdqa	%ymm11,96-144(%rax)
+	vpermd	%ymm14,%ymm7,%ymm14
+	vmovdqa	%ymm12,128-144(%rax)
+	vpermd	%ymm15,%ymm7,%ymm15
+	vmovdqa	%ymm13,160-144(%rax)
+	vpermd	%ymm5,%ymm7,%ymm5
+	vmovdqa	%ymm14,192-144(%rax)
+	vmovdqa	%ymm15,224-144(%rax)
+	vmovdqa	%ymm5,256-144(%rax)
+	vmovdqa	64(%rcx),%ymm5
+
+
+
+	vmovdqu	0(%rsi),%xmm7
+	vmovdqu	16(%rsi),%xmm8
+	vinserti128	$1,32(%rsi),%ymm7,%ymm7
+	vinserti128	$1,48(%rsi),%ymm8,%ymm8
+	leaq	64(%rsi),%rsi
+
+	vpsrldq	$6,%ymm7,%ymm9
+	vpsrldq	$6,%ymm8,%ymm10
+	vpunpckhqdq	%ymm8,%ymm7,%ymm6
+	vpunpcklqdq	%ymm10,%ymm9,%ymm9
+	vpunpcklqdq	%ymm8,%ymm7,%ymm7
+
+	vpsrlq	$30,%ymm9,%ymm10
+	vpsrlq	$4,%ymm9,%ymm9
+	vpsrlq	$26,%ymm7,%ymm8
+	vpsrlq	$40,%ymm6,%ymm6
+	vpand	%ymm5,%ymm9,%ymm9
+	vpand	%ymm5,%ymm7,%ymm7
+	vpand	%ymm5,%ymm8,%ymm8
+	vpand	%ymm5,%ymm10,%ymm10
+	vpor	32(%rcx),%ymm6,%ymm6
+
+	vpaddq	%ymm2,%ymm9,%ymm2
+	subq	$64,%rdx
+	jz	.Ltail_avx2_512
+	jmp	.Loop_avx2_512
+
+.align	32
+.Loop_avx2_512:
+
+	vpaddq	%ymm0,%ymm7,%ymm0
+	vmovdqa	0(%rsp),%ymm7
+	vpaddq	%ymm1,%ymm8,%ymm1
+	vmovdqa	32(%rsp),%ymm8
+	vpaddq	%ymm3,%ymm10,%ymm3
+	vmovdqa	96(%rsp),%ymm9
+	vpaddq	%ymm4,%ymm6,%ymm4
+	vmovdqa	48(%rax),%ymm10
+	vmovdqa	112(%rax),%ymm5
+
+	vpmuludq	%ymm2,%ymm7,%ymm13
+	vpmuludq	%ymm2,%ymm8,%ymm14
+	vpmuludq	%ymm2,%ymm9,%ymm15
+	vpmuludq	%ymm2,%ymm10,%ymm11
+	vpmuludq	%ymm2,%ymm5,%ymm12
+
+	vpmuludq	%ymm0,%ymm8,%ymm6
+	vpmuludq	%ymm1,%ymm8,%ymm2
+	vpaddq	%ymm6,%ymm12,%ymm12
+	vpaddq	%ymm2,%ymm13,%ymm13
+	vpmuludq	%ymm3,%ymm8,%ymm6
+	vpmuludq	64(%rsp),%ymm4,%ymm2
+	vpaddq	%ymm6,%ymm15,%ymm15
+	vpaddq	%ymm2,%ymm11,%ymm11
+	vmovdqa	-16(%rax),%ymm8
+
+	vpmuludq	%ymm0,%ymm7,%ymm6
+	vpmuludq	%ymm1,%ymm7,%ymm2
+	vpaddq	%ymm6,%ymm11,%ymm11
+	vpaddq	%ymm2,%ymm12,%ymm12
+	vpmuludq	%ymm3,%ymm7,%ymm6
+	vpmuludq	%ymm4,%ymm7,%ymm2
+	vmovdqu	0(%rsi),%xmm7
+	vpaddq	%ymm6,%ymm14,%ymm14
+	vpaddq	%ymm2,%ymm15,%ymm15
+	vinserti128	$1,32(%rsi),%ymm7,%ymm7
+
+	vpmuludq	%ymm3,%ymm8,%ymm6
+	vpmuludq	%ymm4,%ymm8,%ymm2
+	vmovdqu	16(%rsi),%xmm8
+	vpaddq	%ymm6,%ymm11,%ymm11
+	vpaddq	%ymm2,%ymm12,%ymm12
+	vmovdqa	16(%rax),%ymm2
+	vpmuludq	%ymm1,%ymm9,%ymm6
+	vpmuludq	%ymm0,%ymm9,%ymm9
+	vpaddq	%ymm6,%ymm14,%ymm14
+	vpaddq	%ymm9,%ymm13,%ymm13
+	vinserti128	$1,48(%rsi),%ymm8,%ymm8
+	leaq	64(%rsi),%rsi
+
+	vpmuludq	%ymm1,%ymm2,%ymm6
+	vpmuludq	%ymm0,%ymm2,%ymm2
+	vpsrldq	$6,%ymm7,%ymm9
+	vpaddq	%ymm6,%ymm15,%ymm15
+	vpaddq	%ymm2,%ymm14,%ymm14
+	vpmuludq	%ymm3,%ymm10,%ymm6
+	vpmuludq	%ymm4,%ymm10,%ymm2
+	vpsrldq	$6,%ymm8,%ymm10
+	vpaddq	%ymm6,%ymm12,%ymm12
+	vpaddq	%ymm2,%ymm13,%ymm13
+	vpunpckhqdq	%ymm8,%ymm7,%ymm6
+
+	vpmuludq	%ymm3,%ymm5,%ymm3
+	vpmuludq	%ymm4,%ymm5,%ymm4
+	vpunpcklqdq	%ymm8,%ymm7,%ymm7
+	vpaddq	%ymm3,%ymm13,%ymm2
+	vpaddq	%ymm4,%ymm14,%ymm3
+	vpunpcklqdq	%ymm10,%ymm9,%ymm10
+	vpmuludq	80(%rax),%ymm0,%ymm4
+	vpmuludq	%ymm1,%ymm5,%ymm0
+	vmovdqa	64(%rcx),%ymm5
+	vpaddq	%ymm4,%ymm15,%ymm4
+	vpaddq	%ymm0,%ymm11,%ymm0
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm11,%ymm12,%ymm1
+
+	vpsrlq	$26,%ymm4,%ymm15
+	vpand	%ymm5,%ymm4,%ymm4
+
+	vpsrlq	$4,%ymm10,%ymm9
+
+	vpsrlq	$26,%ymm1,%ymm12
+	vpand	%ymm5,%ymm1,%ymm1
+	vpaddq	%ymm12,%ymm2,%ymm2
+
+	vpaddq	%ymm15,%ymm0,%ymm0
+	vpsllq	$2,%ymm15,%ymm15
+	vpaddq	%ymm15,%ymm0,%ymm0
+
+	vpand	%ymm5,%ymm9,%ymm9
+	vpsrlq	$26,%ymm7,%ymm8
+
+	vpsrlq	$26,%ymm2,%ymm13
+	vpand	%ymm5,%ymm2,%ymm2
+	vpaddq	%ymm13,%ymm3,%ymm3
+
+	vpaddq	%ymm9,%ymm2,%ymm2
+	vpsrlq	$30,%ymm10,%ymm10
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm11,%ymm1,%ymm1
+
+	vpsrlq	$40,%ymm6,%ymm6
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vpand	%ymm5,%ymm7,%ymm7
+	vpand	%ymm5,%ymm8,%ymm8
+	vpand	%ymm5,%ymm10,%ymm10
+	vpor	32(%rcx),%ymm6,%ymm6
+
+	subq	$64,%rdx
+	jnz	.Loop_avx2_512
+
+.byte	0x66,0x90
+.Ltail_avx2_512:
+
+	vpaddq	%ymm0,%ymm7,%ymm0
+	vmovdqu	4(%rsp),%ymm7
+	vpaddq	%ymm1,%ymm8,%ymm1
+	vmovdqu	36(%rsp),%ymm8
+	vpaddq	%ymm3,%ymm10,%ymm3
+	vmovdqu	100(%rsp),%ymm9
+	vpaddq	%ymm4,%ymm6,%ymm4
+	vmovdqu	52(%rax),%ymm10
+	vmovdqu	116(%rax),%ymm5
+
+	vpmuludq	%ymm2,%ymm7,%ymm13
+	vpmuludq	%ymm2,%ymm8,%ymm14
+	vpmuludq	%ymm2,%ymm9,%ymm15
+	vpmuludq	%ymm2,%ymm10,%ymm11
+	vpmuludq	%ymm2,%ymm5,%ymm12
+
+	vpmuludq	%ymm0,%ymm8,%ymm6
+	vpmuludq	%ymm1,%ymm8,%ymm2
+	vpaddq	%ymm6,%ymm12,%ymm12
+	vpaddq	%ymm2,%ymm13,%ymm13
+	vpmuludq	%ymm3,%ymm8,%ymm6
+	vpmuludq	68(%rsp),%ymm4,%ymm2
+	vpaddq	%ymm6,%ymm15,%ymm15
+	vpaddq	%ymm2,%ymm11,%ymm11
+
+	vpmuludq	%ymm0,%ymm7,%ymm6
+	vpmuludq	%ymm1,%ymm7,%ymm2
+	vpaddq	%ymm6,%ymm11,%ymm11
+	vmovdqu	-12(%rax),%ymm8
+	vpaddq	%ymm2,%ymm12,%ymm12
+	vpmuludq	%ymm3,%ymm7,%ymm6
+	vpmuludq	%ymm4,%ymm7,%ymm2
+	vpaddq	%ymm6,%ymm14,%ymm14
+	vpaddq	%ymm2,%ymm15,%ymm15
+
+	vpmuludq	%ymm3,%ymm8,%ymm6
+	vpmuludq	%ymm4,%ymm8,%ymm2
+	vpaddq	%ymm6,%ymm11,%ymm11
+	vpaddq	%ymm2,%ymm12,%ymm12
+	vmovdqu	20(%rax),%ymm2
+	vpmuludq	%ymm1,%ymm9,%ymm6
+	vpmuludq	%ymm0,%ymm9,%ymm9
+	vpaddq	%ymm6,%ymm14,%ymm14
+	vpaddq	%ymm9,%ymm13,%ymm13
+
+	vpmuludq	%ymm1,%ymm2,%ymm6
+	vpmuludq	%ymm0,%ymm2,%ymm2
+	vpaddq	%ymm6,%ymm15,%ymm15
+	vpaddq	%ymm2,%ymm14,%ymm14
+	vpmuludq	%ymm3,%ymm10,%ymm6
+	vpmuludq	%ymm4,%ymm10,%ymm2
+	vpaddq	%ymm6,%ymm12,%ymm12
+	vpaddq	%ymm2,%ymm13,%ymm13
+
+	vpmuludq	%ymm3,%ymm5,%ymm3
+	vpmuludq	%ymm4,%ymm5,%ymm4
+	vpaddq	%ymm3,%ymm13,%ymm2
+	vpaddq	%ymm4,%ymm14,%ymm3
+	vpmuludq	84(%rax),%ymm0,%ymm4
+	vpmuludq	%ymm1,%ymm5,%ymm0
+	vmovdqa	64(%rcx),%ymm5
+	vpaddq	%ymm4,%ymm15,%ymm4
+	vpaddq	%ymm0,%ymm11,%ymm0
+
+	vpsrldq	$8,%ymm12,%ymm8
+	vpsrldq	$8,%ymm2,%ymm9
+	vpsrldq	$8,%ymm3,%ymm10
+	vpsrldq	$8,%ymm4,%ymm6
+	vpsrldq	$8,%ymm0,%ymm7
+	vpaddq	%ymm8,%ymm12,%ymm12
+	vpaddq	%ymm9,%ymm2,%ymm2
+	vpaddq	%ymm10,%ymm3,%ymm3
+	vpaddq	%ymm6,%ymm4,%ymm4
+	vpaddq	%ymm7,%ymm0,%ymm0
+
+	vpermq	$0x2,%ymm3,%ymm10
+	vpermq	$0x2,%ymm4,%ymm6
+	vpermq	$0x2,%ymm0,%ymm7
+	vpermq	$0x2,%ymm12,%ymm8
+	vpermq	$0x2,%ymm2,%ymm9
+	vpaddq	%ymm10,%ymm3,%ymm3
+	vpaddq	%ymm6,%ymm4,%ymm4
+	vpaddq	%ymm7,%ymm0,%ymm0
+	vpaddq	%ymm8,%ymm12,%ymm12
+	vpaddq	%ymm9,%ymm2,%ymm2
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm11,%ymm12,%ymm1
+
+	vpsrlq	$26,%ymm4,%ymm15
+	vpand	%ymm5,%ymm4,%ymm4
+
+	vpsrlq	$26,%ymm1,%ymm12
+	vpand	%ymm5,%ymm1,%ymm1
+	vpaddq	%ymm12,%ymm2,%ymm2
+
+	vpaddq	%ymm15,%ymm0,%ymm0
+	vpsllq	$2,%ymm15,%ymm15
+	vpaddq	%ymm15,%ymm0,%ymm0
+
+	vpsrlq	$26,%ymm2,%ymm13
+	vpand	%ymm5,%ymm2,%ymm2
+	vpaddq	%ymm13,%ymm3,%ymm3
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm11,%ymm1,%ymm1
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vmovd	%xmm0,-112(%rdi)
+	vmovd	%xmm1,-108(%rdi)
+	vmovd	%xmm2,-104(%rdi)
+	vmovd	%xmm3,-100(%rdi)
+	vmovd	%xmm4,-96(%rdi)
+	leaq	-8(%r10),%rsp
+
+	vzeroupper
+	ret
+
+.Lblocks_avx512:
+
+	movl	$15,%eax
+	kmovw	%eax,%k2
+	leaq	8(%rsp),%r10
+
+	subq	$0x128,%rsp
+	leaq	.Lconst(%rip),%rcx
+	leaq	48+64(%rdi),%rdi
+	vmovdqa	96(%rcx),%ymm9
+
+	vmovdqu32	-64(%rdi),%zmm16{%k2}{z}
+	andq	$-512,%rsp
+	vmovdqu32	-48(%rdi),%zmm17{%k2}{z}
+	movq	$0x20,%rax
+	vmovdqu32	-32(%rdi),%zmm21{%k2}{z}
+	vmovdqu32	-16(%rdi),%zmm18{%k2}{z}
+	vmovdqu32	0(%rdi),%zmm22{%k2}{z}
+	vmovdqu32	16(%rdi),%zmm19{%k2}{z}
+	vmovdqu32	32(%rdi),%zmm23{%k2}{z}
+	vmovdqu32	48(%rdi),%zmm20{%k2}{z}
+	vmovdqu32	64(%rdi),%zmm24{%k2}{z}
+	vpermd	%zmm16,%zmm9,%zmm16
+	vpbroadcastq	64(%rcx),%zmm5
+	vpermd	%zmm17,%zmm9,%zmm17
+	vpermd	%zmm21,%zmm9,%zmm21
+	vpermd	%zmm18,%zmm9,%zmm18
+	vmovdqa64	%zmm16,0(%rsp){%k2}
+	vpsrlq	$32,%zmm16,%zmm7
+	vpermd	%zmm22,%zmm9,%zmm22
+	vmovdqu64	%zmm17,0(%rsp,%rax,1){%k2}
+	vpsrlq	$32,%zmm17,%zmm8
+	vpermd	%zmm19,%zmm9,%zmm19
+	vmovdqa64	%zmm21,64(%rsp){%k2}
+	vpermd	%zmm23,%zmm9,%zmm23
+	vpermd	%zmm20,%zmm9,%zmm20
+	vmovdqu64	%zmm18,64(%rsp,%rax,1){%k2}
+	vpermd	%zmm24,%zmm9,%zmm24
+	vmovdqa64	%zmm22,128(%rsp){%k2}
+	vmovdqu64	%zmm19,128(%rsp,%rax,1){%k2}
+	vmovdqa64	%zmm23,192(%rsp){%k2}
+	vmovdqu64	%zmm20,192(%rsp,%rax,1){%k2}
+	vmovdqa64	%zmm24,256(%rsp){%k2}
+
+	vpmuludq	%zmm7,%zmm16,%zmm11
+	vpmuludq	%zmm7,%zmm17,%zmm12
+	vpmuludq	%zmm7,%zmm18,%zmm13
+	vpmuludq	%zmm7,%zmm19,%zmm14
+	vpmuludq	%zmm7,%zmm20,%zmm15
+	vpsrlq	$32,%zmm18,%zmm9
+
+	vpmuludq	%zmm8,%zmm24,%zmm25
+	vpmuludq	%zmm8,%zmm16,%zmm26
+	vpmuludq	%zmm8,%zmm17,%zmm27
+	vpmuludq	%zmm8,%zmm18,%zmm28
+	vpmuludq	%zmm8,%zmm19,%zmm29
+	vpsrlq	$32,%zmm19,%zmm10
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm26,%zmm12,%zmm12
+	vpaddq	%zmm27,%zmm13,%zmm13
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+
+	vpmuludq	%zmm9,%zmm23,%zmm25
+	vpmuludq	%zmm9,%zmm24,%zmm26
+	vpmuludq	%zmm9,%zmm17,%zmm28
+	vpmuludq	%zmm9,%zmm18,%zmm29
+	vpmuludq	%zmm9,%zmm16,%zmm27
+	vpsrlq	$32,%zmm20,%zmm6
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm26,%zmm12,%zmm12
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vpmuludq	%zmm10,%zmm22,%zmm25
+	vpmuludq	%zmm10,%zmm16,%zmm28
+	vpmuludq	%zmm10,%zmm17,%zmm29
+	vpmuludq	%zmm10,%zmm23,%zmm26
+	vpmuludq	%zmm10,%zmm24,%zmm27
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm26,%zmm12,%zmm12
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vpmuludq	%zmm6,%zmm24,%zmm28
+	vpmuludq	%zmm6,%zmm16,%zmm29
+	vpmuludq	%zmm6,%zmm21,%zmm25
+	vpmuludq	%zmm6,%zmm22,%zmm26
+	vpmuludq	%zmm6,%zmm23,%zmm27
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm26,%zmm12,%zmm12
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vmovdqu64	0(%rsi),%zmm10
+	vmovdqu64	64(%rsi),%zmm6
+	leaq	128(%rsi),%rsi
+
+	vpsrlq	$26,%zmm14,%zmm28
+	vpandq	%zmm5,%zmm14,%zmm14
+	vpaddq	%zmm28,%zmm15,%zmm15
+
+	vpsrlq	$26,%zmm11,%zmm25
+	vpandq	%zmm5,%zmm11,%zmm11
+	vpaddq	%zmm25,%zmm12,%zmm12
+
+	vpsrlq	$26,%zmm15,%zmm29
+	vpandq	%zmm5,%zmm15,%zmm15
+
+	vpsrlq	$26,%zmm12,%zmm26
+	vpandq	%zmm5,%zmm12,%zmm12
+	vpaddq	%zmm26,%zmm13,%zmm13
+
+	vpaddq	%zmm29,%zmm11,%zmm11
+	vpsllq	$2,%zmm29,%zmm29
+	vpaddq	%zmm29,%zmm11,%zmm11
+
+	vpsrlq	$26,%zmm13,%zmm27
+	vpandq	%zmm5,%zmm13,%zmm13
+	vpaddq	%zmm27,%zmm14,%zmm14
+
+	vpsrlq	$26,%zmm11,%zmm25
+	vpandq	%zmm5,%zmm11,%zmm11
+	vpaddq	%zmm25,%zmm12,%zmm12
+
+	vpsrlq	$26,%zmm14,%zmm28
+	vpandq	%zmm5,%zmm14,%zmm14
+	vpaddq	%zmm28,%zmm15,%zmm15
+
+	vpunpcklqdq	%zmm6,%zmm10,%zmm7
+	vpunpckhqdq	%zmm6,%zmm10,%zmm6
+
+	vmovdqa32	128(%rcx),%zmm25
+	movl	$0x7777,%eax
+	kmovw	%eax,%k1
+
+	vpermd	%zmm16,%zmm25,%zmm16
+	vpermd	%zmm17,%zmm25,%zmm17
+	vpermd	%zmm18,%zmm25,%zmm18
+	vpermd	%zmm19,%zmm25,%zmm19
+	vpermd	%zmm20,%zmm25,%zmm20
+
+	vpermd	%zmm11,%zmm25,%zmm16{%k1}
+	vpermd	%zmm12,%zmm25,%zmm17{%k1}
+	vpermd	%zmm13,%zmm25,%zmm18{%k1}
+	vpermd	%zmm14,%zmm25,%zmm19{%k1}
+	vpermd	%zmm15,%zmm25,%zmm20{%k1}
+
+	vpslld	$2,%zmm17,%zmm21
+	vpslld	$2,%zmm18,%zmm22
+	vpslld	$2,%zmm19,%zmm23
+	vpslld	$2,%zmm20,%zmm24
+	vpaddd	%zmm17,%zmm21,%zmm21
+	vpaddd	%zmm18,%zmm22,%zmm22
+	vpaddd	%zmm19,%zmm23,%zmm23
+	vpaddd	%zmm20,%zmm24,%zmm24
+
+	vpbroadcastq	32(%rcx),%zmm30
+
+	vpsrlq	$52,%zmm7,%zmm9
+	vpsllq	$12,%zmm6,%zmm10
+	vporq	%zmm10,%zmm9,%zmm9
+	vpsrlq	$26,%zmm7,%zmm8
+	vpsrlq	$14,%zmm6,%zmm10
+	vpsrlq	$40,%zmm6,%zmm6
+	vpandq	%zmm5,%zmm9,%zmm9
+	vpandq	%zmm5,%zmm7,%zmm7
+
+	vpaddq	%zmm2,%zmm9,%zmm2
+	subq	$192,%rdx
+	jbe	.Ltail_avx512
+	jmp	.Loop_avx512
+
+.align	32
+.Loop_avx512:
+
+	vpmuludq	%zmm2,%zmm17,%zmm14
+	vpaddq	%zmm0,%zmm7,%zmm0
+	vpmuludq	%zmm2,%zmm18,%zmm15
+	vpandq	%zmm5,%zmm8,%zmm8
+	vpmuludq	%zmm2,%zmm23,%zmm11
+	vpandq	%zmm5,%zmm10,%zmm10
+	vpmuludq	%zmm2,%zmm24,%zmm12
+	vporq	%zmm30,%zmm6,%zmm6
+	vpmuludq	%zmm2,%zmm16,%zmm13
+	vpaddq	%zmm1,%zmm8,%zmm1
+	vpaddq	%zmm3,%zmm10,%zmm3
+	vpaddq	%zmm4,%zmm6,%zmm4
+
+	vmovdqu64	0(%rsi),%zmm10
+	vmovdqu64	64(%rsi),%zmm6
+	leaq	128(%rsi),%rsi
+	vpmuludq	%zmm0,%zmm19,%zmm28
+	vpmuludq	%zmm0,%zmm20,%zmm29
+	vpmuludq	%zmm0,%zmm16,%zmm25
+	vpmuludq	%zmm0,%zmm17,%zmm26
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm26,%zmm12,%zmm12
+
+	vpmuludq	%zmm1,%zmm18,%zmm28
+	vpmuludq	%zmm1,%zmm19,%zmm29
+	vpmuludq	%zmm1,%zmm24,%zmm25
+	vpmuludq	%zmm0,%zmm18,%zmm27
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vpunpcklqdq	%zmm6,%zmm10,%zmm7
+	vpunpckhqdq	%zmm6,%zmm10,%zmm6
+
+	vpmuludq	%zmm3,%zmm16,%zmm28
+	vpmuludq	%zmm3,%zmm17,%zmm29
+	vpmuludq	%zmm1,%zmm16,%zmm26
+	vpmuludq	%zmm1,%zmm17,%zmm27
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm26,%zmm12,%zmm12
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vpmuludq	%zmm4,%zmm24,%zmm28
+	vpmuludq	%zmm4,%zmm16,%zmm29
+	vpmuludq	%zmm3,%zmm22,%zmm25
+	vpmuludq	%zmm3,%zmm23,%zmm26
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpmuludq	%zmm3,%zmm24,%zmm27
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm26,%zmm12,%zmm12
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vpmuludq	%zmm4,%zmm21,%zmm25
+	vpmuludq	%zmm4,%zmm22,%zmm26
+	vpmuludq	%zmm4,%zmm23,%zmm27
+	vpaddq	%zmm25,%zmm11,%zmm0
+	vpaddq	%zmm26,%zmm12,%zmm1
+	vpaddq	%zmm27,%zmm13,%zmm2
+
+	vpsrlq	$52,%zmm7,%zmm9
+	vpsllq	$12,%zmm6,%zmm10
+
+	vpsrlq	$26,%zmm14,%zmm3
+	vpandq	%zmm5,%zmm14,%zmm14
+	vpaddq	%zmm3,%zmm15,%zmm4
+
+	vporq	%zmm10,%zmm9,%zmm9
+
+	vpsrlq	$26,%zmm0,%zmm11
+	vpandq	%zmm5,%zmm0,%zmm0
+	vpaddq	%zmm11,%zmm1,%zmm1
+
+	vpandq	%zmm5,%zmm9,%zmm9
+
+	vpsrlq	$26,%zmm4,%zmm15
+	vpandq	%zmm5,%zmm4,%zmm4
+
+	vpsrlq	$26,%zmm1,%zmm12
+	vpandq	%zmm5,%zmm1,%zmm1
+	vpaddq	%zmm12,%zmm2,%zmm2
+
+	vpaddq	%zmm15,%zmm0,%zmm0
+	vpsllq	$2,%zmm15,%zmm15
+	vpaddq	%zmm15,%zmm0,%zmm0
+
+	vpaddq	%zmm9,%zmm2,%zmm2
+	vpsrlq	$26,%zmm7,%zmm8
+
+	vpsrlq	$26,%zmm2,%zmm13
+	vpandq	%zmm5,%zmm2,%zmm2
+	vpaddq	%zmm13,%zmm14,%zmm3
+
+	vpsrlq	$14,%zmm6,%zmm10
+
+	vpsrlq	$26,%zmm0,%zmm11
+	vpandq	%zmm5,%zmm0,%zmm0
+	vpaddq	%zmm11,%zmm1,%zmm1
+
+	vpsrlq	$40,%zmm6,%zmm6
+
+	vpsrlq	$26,%zmm3,%zmm14
+	vpandq	%zmm5,%zmm3,%zmm3
+	vpaddq	%zmm14,%zmm4,%zmm4
+
+	vpandq	%zmm5,%zmm7,%zmm7
+
+	subq	$128,%rdx
+	ja	.Loop_avx512
+
+.Ltail_avx512:
+
+	vpsrlq	$32,%zmm16,%zmm16
+	vpsrlq	$32,%zmm17,%zmm17
+	vpsrlq	$32,%zmm18,%zmm18
+	vpsrlq	$32,%zmm23,%zmm23
+	vpsrlq	$32,%zmm24,%zmm24
+	vpsrlq	$32,%zmm19,%zmm19
+	vpsrlq	$32,%zmm20,%zmm20
+	vpsrlq	$32,%zmm21,%zmm21
+	vpsrlq	$32,%zmm22,%zmm22
+
+	leaq	(%rsi,%rdx,1),%rsi
+
+	vpaddq	%zmm0,%zmm7,%zmm0
+
+	vpmuludq	%zmm2,%zmm17,%zmm14
+	vpmuludq	%zmm2,%zmm18,%zmm15
+	vpmuludq	%zmm2,%zmm23,%zmm11
+	vpandq	%zmm5,%zmm8,%zmm8
+	vpmuludq	%zmm2,%zmm24,%zmm12
+	vpandq	%zmm5,%zmm10,%zmm10
+	vpmuludq	%zmm2,%zmm16,%zmm13
+	vporq	%zmm30,%zmm6,%zmm6
+	vpaddq	%zmm1,%zmm8,%zmm1
+	vpaddq	%zmm3,%zmm10,%zmm3
+	vpaddq	%zmm4,%zmm6,%zmm4
+
+	vmovdqu	0(%rsi),%xmm7
+	vpmuludq	%zmm0,%zmm19,%zmm28
+	vpmuludq	%zmm0,%zmm20,%zmm29
+	vpmuludq	%zmm0,%zmm16,%zmm25
+	vpmuludq	%zmm0,%zmm17,%zmm26
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm26,%zmm12,%zmm12
+
+	vmovdqu	16(%rsi),%xmm8
+	vpmuludq	%zmm1,%zmm18,%zmm28
+	vpmuludq	%zmm1,%zmm19,%zmm29
+	vpmuludq	%zmm1,%zmm24,%zmm25
+	vpmuludq	%zmm0,%zmm18,%zmm27
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vinserti128	$1,32(%rsi),%ymm7,%ymm7
+	vpmuludq	%zmm3,%zmm16,%zmm28
+	vpmuludq	%zmm3,%zmm17,%zmm29
+	vpmuludq	%zmm1,%zmm16,%zmm26
+	vpmuludq	%zmm1,%zmm17,%zmm27
+	vpaddq	%zmm28,%zmm14,%zmm14
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm26,%zmm12,%zmm12
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vinserti128	$1,48(%rsi),%ymm8,%ymm8
+	vpmuludq	%zmm4,%zmm24,%zmm28
+	vpmuludq	%zmm4,%zmm16,%zmm29
+	vpmuludq	%zmm3,%zmm22,%zmm25
+	vpmuludq	%zmm3,%zmm23,%zmm26
+	vpmuludq	%zmm3,%zmm24,%zmm27
+	vpaddq	%zmm28,%zmm14,%zmm3
+	vpaddq	%zmm29,%zmm15,%zmm15
+	vpaddq	%zmm25,%zmm11,%zmm11
+	vpaddq	%zmm26,%zmm12,%zmm12
+	vpaddq	%zmm27,%zmm13,%zmm13
+
+	vpmuludq	%zmm4,%zmm21,%zmm25
+	vpmuludq	%zmm4,%zmm22,%zmm26
+	vpmuludq	%zmm4,%zmm23,%zmm27
+	vpaddq	%zmm25,%zmm11,%zmm0
+	vpaddq	%zmm26,%zmm12,%zmm1
+	vpaddq	%zmm27,%zmm13,%zmm2
+
+	movl	$1,%eax
+	vpermq	$0xb1,%zmm3,%zmm14
+	vpermq	$0xb1,%zmm15,%zmm4
+	vpermq	$0xb1,%zmm0,%zmm11
+	vpermq	$0xb1,%zmm1,%zmm12
+	vpermq	$0xb1,%zmm2,%zmm13
+	vpaddq	%zmm14,%zmm3,%zmm3
+	vpaddq	%zmm15,%zmm4,%zmm4
+	vpaddq	%zmm11,%zmm0,%zmm0
+	vpaddq	%zmm12,%zmm1,%zmm1
+	vpaddq	%zmm13,%zmm2,%zmm2
+
+	kmovw	%eax,%k3
+	vpermq	$0x2,%zmm3,%zmm14
+	vpermq	$0x2,%zmm4,%zmm15
+	vpermq	$0x2,%zmm0,%zmm11
+	vpermq	$0x2,%zmm1,%zmm12
+	vpermq	$0x2,%zmm2,%zmm13
+	vpaddq	%zmm14,%zmm3,%zmm3
+	vpaddq	%zmm15,%zmm4,%zmm4
+	vpaddq	%zmm11,%zmm0,%zmm0
+	vpaddq	%zmm12,%zmm1,%zmm1
+	vpaddq	%zmm13,%zmm2,%zmm2
+
+	vextracti64x4	$0x1,%zmm3,%ymm14
+	vextracti64x4	$0x1,%zmm4,%ymm15
+	vextracti64x4	$0x1,%zmm0,%ymm11
+	vextracti64x4	$0x1,%zmm1,%ymm12
+	vextracti64x4	$0x1,%zmm2,%ymm13
+	vpaddq	%zmm14,%zmm3,%zmm3{%k3}{z}
+	vpaddq	%zmm15,%zmm4,%zmm4{%k3}{z}
+	vpaddq	%zmm11,%zmm0,%zmm0{%k3}{z}
+	vpaddq	%zmm12,%zmm1,%zmm1{%k3}{z}
+	vpaddq	%zmm13,%zmm2,%zmm2{%k3}{z}
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpsrldq	$6,%ymm7,%ymm9
+	vpsrldq	$6,%ymm8,%ymm10
+	vpunpckhqdq	%ymm8,%ymm7,%ymm6
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpunpcklqdq	%ymm10,%ymm9,%ymm9
+	vpunpcklqdq	%ymm8,%ymm7,%ymm7
+	vpaddq	%ymm11,%ymm1,%ymm1
+
+	vpsrlq	$26,%ymm4,%ymm15
+	vpand	%ymm5,%ymm4,%ymm4
+
+	vpsrlq	$26,%ymm1,%ymm12
+	vpand	%ymm5,%ymm1,%ymm1
+	vpsrlq	$30,%ymm9,%ymm10
+	vpsrlq	$4,%ymm9,%ymm9
+	vpaddq	%ymm12,%ymm2,%ymm2
+
+	vpaddq	%ymm15,%ymm0,%ymm0
+	vpsllq	$2,%ymm15,%ymm15
+	vpsrlq	$26,%ymm7,%ymm8
+	vpsrlq	$40,%ymm6,%ymm6
+	vpaddq	%ymm15,%ymm0,%ymm0
+
+	vpsrlq	$26,%ymm2,%ymm13
+	vpand	%ymm5,%ymm2,%ymm2
+	vpand	%ymm5,%ymm9,%ymm9
+	vpand	%ymm5,%ymm7,%ymm7
+	vpaddq	%ymm13,%ymm3,%ymm3
+
+	vpsrlq	$26,%ymm0,%ymm11
+	vpand	%ymm5,%ymm0,%ymm0
+	vpaddq	%ymm2,%ymm9,%ymm2
+	vpand	%ymm5,%ymm8,%ymm8
+	vpaddq	%ymm11,%ymm1,%ymm1
+
+	vpsrlq	$26,%ymm3,%ymm14
+	vpand	%ymm5,%ymm3,%ymm3
+	vpand	%ymm5,%ymm10,%ymm10
+	vpor	32(%rcx),%ymm6,%ymm6
+	vpaddq	%ymm14,%ymm4,%ymm4
+
+	leaq	144(%rsp),%rax
+	addq	$64,%rdx
+	jnz	.Ltail_avx2_512
+
+	vpsubq	%ymm9,%ymm2,%ymm2
+	vmovd	%xmm0,-112(%rdi)
+	vmovd	%xmm1,-108(%rdi)
+	vmovd	%xmm2,-104(%rdi)
+	vmovd	%xmm3,-100(%rdi)
+	vmovd	%xmm4,-96(%rdi)
+	vzeroall
+	leaq	-8(%r10),%rsp
+
+	ret
+
+ENDPROC(poly1305_blocks_avx512)
+#endif /* CONFIG_AS_AVX512 */
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 10/20] zinc: Poly1305 MIPS32r2 and MIPS64 implementations
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (8 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 09/20] zinc: Poly1305 x86_64 implementation Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 11/20] zinc: ChaCha20Poly1305 construction and selftest Jason A. Donenfeld
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, René van Dorst, Samuel Neves,
	Andy Lutomirski, Jean-Philippe Aumasson, Andy Polyakov,
	Ralf Baechle, Paul Burton, James Hogan, linux-mips

This MIPS32r2 implementation comes from René van Dorst and me and
results in a nice speedup on the usual OpenWRT targets. The MIPS64
implementation comes from Andy Polyakov results in a nice speedup on
commodity Octeon hardware, and has been modified slightly from the
original:

- The function names have been renamed to fit kernel conventions.
- A comment has been added.

No changes have been made to the actual instructions.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: René van Dorst <opensource@vdorst.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Andy Polyakov <appro@openssl.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
---
 lib/zinc/Makefile                      |   8 +
 lib/zinc/poly1305/poly1305-mips-glue.h |  40 +++
 lib/zinc/poly1305/poly1305-mips.S      | 417 +++++++++++++++++++++++++
 lib/zinc/poly1305/poly1305-mips64.S    | 359 +++++++++++++++++++++
 4 files changed, 824 insertions(+)
 create mode 100644 lib/zinc/poly1305/poly1305-mips-glue.h
 create mode 100644 lib/zinc/poly1305/poly1305-mips.S
 create mode 100644 lib/zinc/poly1305/poly1305-mips64.S

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 72112f8ffba1..c8c49d59794b 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -37,6 +37,14 @@ ifeq ($(CONFIG_ZINC_ARCH_ARM64),y)
 zinc-y += poly1305/poly1305-arm64.o
 CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-arm-glue.h
 endif
+ifeq ($(CONFIG_ZINC_ARCH_MIPS)$(CONFIG_CPU_MIPS32_R2),yy)
+zinc-y += poly1305/poly1305-mips.o
+CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-mips-glue.h
+endif
+ifeq ($(CONFIG_ZINC_ARCH_MIPS64),y)
+zinc-y += poly1305/poly1305-mips64.o
+CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-mips-glue.h
+endif
 endif
 
 zinc-y += main.o
diff --git a/lib/zinc/poly1305/poly1305-mips-glue.h b/lib/zinc/poly1305/poly1305-mips-glue.h
new file mode 100644
index 000000000000..e29f85915eec
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-mips-glue.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/poly1305.h>
+
+asmlinkage void poly1305_init_mips(void *ctx, const u8 key[16]);
+asmlinkage void poly1305_blocks_mips(void *ctx, const u8 *inp, const size_t len,
+				     const u32 padbit);
+asmlinkage void poly1305_emit_mips(void *ctx, u8 mac[16], const u32 nonce[4]);
+void __init poly1305_fpu_init(void)
+{
+}
+
+static inline bool poly1305_init_arch(void *ctx,
+				      const u8 key[POLY1305_KEY_SIZE],
+				      simd_context_t simd_context)
+{
+	poly1305_init_mips(ctx, key);
+	return true;
+}
+
+static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
+					const size_t len, const u32 padbit,
+					simd_context_t simd_context)
+{
+	poly1305_blocks_mips(ctx, inp, len, padbit);
+	return true;
+}
+
+static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
+				      const u32 nonce[4],
+				      simd_context_t simd_context)
+{
+	poly1305_emit_mips(ctx, mac, nonce);
+	return true;
+}
+
+#define HAVE_POLY1305_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/poly1305/poly1305-mips.S b/lib/zinc/poly1305/poly1305-mips.S
new file mode 100644
index 000000000000..32d8558d8601
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-mips.S
@@ -0,0 +1,417 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2016-2018 René van Dorst <opensource@vdorst.com> All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+#define MSB 0
+#define LSB 3
+#else
+#define MSB 3
+#define LSB 0
+#endif
+
+#define POLY1305_BLOCK_SIZE 16
+.text
+#define H0 $t0
+#define H1 $t1
+#define H2 $t2
+#define H3 $t3
+#define H4 $t4
+
+#define R0 $t5
+#define R1 $t6
+#define R2 $t7
+#define R3 $t8
+
+#define O0 $s0
+#define O1 $s4
+#define O2 $v1
+#define O3 $t9
+#define O4 $s5
+
+#define S1 $s1
+#define S2 $s2
+#define S3 $s3
+
+#define SC $at
+#define CA $v0
+
+/* Input arguments */
+#define poly	$a0
+#define src	$a1
+#define srclen	$a2
+#define hibit	$a3
+
+/* Location in the opaque buffer
+ * R[0..3], CA, H[0..4]
+ */
+#define PTR_POLY1305_R(n) ( 0 + (n*4)) ## ($a0)
+#define PTR_POLY1305_CA   (16        ) ## ($a0)
+#define PTR_POLY1305_H(n) (20 + (n*4)) ## ($a0)
+
+#define POLY1305_BLOCK_SIZE 16
+#define POLY1305_STACK_SIZE 8 * 4
+
+.set reorder
+.set noat
+.align 4
+.globl poly1305_blocks_mips
+.ent poly1305_blocks_mips
+poly1305_blocks_mips:
+	.frame  $sp,POLY1305_STACK_SIZE,$31
+	/* srclen &= 0xFFFFFFF0 */
+	ins	srclen, $zero, 0, 4
+
+	.set noreorder
+	/* check srclen >= 16 bytes */
+	beqz	srclen, .Lpoly1305_blocks_mips_end
+	addiu	$sp, -(POLY1305_STACK_SIZE)
+	.set reorder
+
+	/* Calculate last round based on src address pointer.
+	 * last round src ptr (srclen) = src + (srclen & 0xFFFFFFF0)
+	 */
+	addu	srclen, src
+
+	lw	R0, PTR_POLY1305_R(0)
+	lw	R1, PTR_POLY1305_R(1)
+	lw	R2, PTR_POLY1305_R(2)
+	lw	R3, PTR_POLY1305_R(3)
+
+	/* store the used save registers. */
+	sw	$s0, 0($sp)
+	sw	$s1, 4($sp)
+	sw	$s2, 8($sp)
+	sw	$s3, 12($sp)
+	sw	$s4, 16($sp)
+	sw	$s5, 20($sp)
+
+	/* load Hx and Carry */
+	lw	CA, PTR_POLY1305_CA
+	lw	H0, PTR_POLY1305_H(0)
+	lw	H1, PTR_POLY1305_H(1)
+	lw	H2, PTR_POLY1305_H(2)
+	lw	H3, PTR_POLY1305_H(3)
+	lw	H4, PTR_POLY1305_H(4)
+
+	/* Sx = Rx + (Rx >> 2) */
+	srl	S1, R1, 2
+	srl	S2, R2, 2
+	srl	S3, R3, 2
+	addu	S1, R1
+	addu	S2, R2
+	addu	S3, R3
+
+	addiu	SC, $zero, 1
+
+.Lpoly1305_loop:
+	lwl	O0, 0+MSB(src)
+	lwl	O1, 4+MSB(src)
+	lwl	O2, 8+MSB(src)
+	lwl	O3,12+MSB(src)
+	lwr	O0, 0+LSB(src)
+	lwr	O1, 4+LSB(src)
+	lwr	O2, 8+LSB(src)
+	lwr	O3,12+LSB(src)
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	wsbh	O0
+	wsbh	O1
+	wsbh	O2
+	wsbh	O3
+	rotr	O0, 16
+	rotr	O1, 16
+	rotr	O2, 16
+	rotr	O3, 16
+#endif
+
+	/* h0 = (u32)(d0 = (u64)h0 + inp[0] + c 'Carry_previous cycle'); */
+	addu	H0, CA
+	sltu	CA, H0, CA
+	addu	O0, H0
+	sltu	H0, O0, H0
+	addu	CA, H0
+
+	/* h1 = (u32)(d1 = (u64)h1 + (d0 >> 32) + inp[4]); */
+	addu	H1, CA
+	sltu	CA, H1, CA
+	addu	O1, H1
+	sltu	H1, O1, H1
+	addu	CA, H1
+
+	/* h2 = (u32)(d2 = (u64)h2 + (d1 >> 32) + inp[8]); */
+	addu	H2, CA
+	sltu	CA, H2, CA
+	addu	O2, H2
+	sltu	H2, O2, H2
+	addu	CA, H2
+
+	/* h3 = (u32)(d3 = (u64)h3 + (d2 >> 32) + inp[12]); */
+	addu	H3, CA
+	sltu	CA, H3, CA
+	addu	O3, H3
+	sltu	H3, O3, H3
+	addu	CA, H3
+
+	/* h4 += (u32)(d3 >> 32) + padbit; */
+	addu	H4, hibit
+	addu	O4, H4, CA
+
+	/* D0 */
+	multu	O0, R0
+	maddu	O1, S3
+	maddu	O2, S2
+	maddu	O3, S1
+	mfhi	CA
+	mflo	H0
+
+	/* D1 */
+	multu	O0, R1
+	maddu	O1, R0
+	maddu	O2, S3
+	maddu	O3, S2
+	maddu	O4, S1
+	maddu	CA, SC
+	mfhi	CA
+	mflo	H1
+
+	/* D2 */
+	multu	O0, R2
+	maddu	O1, R1
+	maddu	O2, R0
+	maddu	O3, S3
+	maddu	O4, S2
+	maddu	CA, SC
+	mfhi	CA
+	mflo	H2
+
+	/* D4 */
+	mul	H4, O4, R0
+
+	/* D3 */
+	multu	O0, R3
+	maddu	O1, R2
+	maddu	O2, R1
+	maddu	O3, R0
+	maddu	O4, S3
+	maddu	CA, SC
+	mfhi	CA
+	mflo	H3
+
+	addiu	src, POLY1305_BLOCK_SIZE
+
+	/* h4 += (u32)(d3 >> 32); */
+	addu	O4, H4, CA
+	/* h4 &= 3 */
+	andi	H4, O4, 3
+	/* c = (h4 >> 2) + (h4 & ~3U); */
+	srl	CA, O4, 2
+	ins	O4, $zero, 0, 2
+
+	/* able to do a 16 byte block. */
+	.set noreorder
+	bne	src, srclen, .Lpoly1305_loop
+	/* Delay slot is always executed. */
+	addu	CA, O4
+	.set reorder
+
+	/* restore the used save registers. */
+	lw	$s0, 0($sp)
+	lw	$s1, 4($sp)
+	lw	$s2, 8($sp)
+	lw	$s3, 12($sp)
+	lw	$s4, 16($sp)
+	lw	$s5, 20($sp)
+
+	/* store Hx and Carry */
+	sw	CA, PTR_POLY1305_CA
+	sw	H0, PTR_POLY1305_H(0)
+	sw	H1, PTR_POLY1305_H(1)
+	sw	H2, PTR_POLY1305_H(2)
+	sw	H3, PTR_POLY1305_H(3)
+	sw	H4, PTR_POLY1305_H(4)
+
+.Lpoly1305_blocks_mips_end:
+	/* Jump Back */
+	.set noreorder
+	jr	$ra
+	addiu	$sp, POLY1305_STACK_SIZE
+	.set reorder
+.end poly1305_blocks_mips
+.set at
+.set reorder
+
+/* Input arguments CTX=$a0, MAC=$a1, NONCE=$a2 */
+#define MAC	$a1
+#define NONCE	$a2
+
+#define G0	$t5
+#define G1	$t6
+#define G2	$t7
+#define G3	$t8
+#define G4	$t9
+
+.set reorder
+.set noat
+.align 4
+.globl poly1305_emit_mips
+.ent poly1305_emit_mips
+poly1305_emit_mips:
+	/* load Hx and Carry */
+	lw	CA, PTR_POLY1305_CA
+	lw	H0, PTR_POLY1305_H(0)
+	lw	H1, PTR_POLY1305_H(1)
+	lw	H2, PTR_POLY1305_H(2)
+	lw	H3, PTR_POLY1305_H(3)
+	lw	H4, PTR_POLY1305_H(4)
+
+	/* Add left over carry */
+	addu	H0, CA
+	sltu	CA, H0, CA
+	addu	H1, CA
+	sltu	CA, H1, CA
+	addu	H2, CA
+	sltu	CA, H2, CA
+	addu	H3, CA
+	sltu	CA, H3, CA
+	addu	H4, CA
+
+	/* compare to modulus by computing h + -p */
+	addiu	G0, H0, 5
+	sltu	CA, G0, H0
+	addu	G1, H1, CA
+	sltu	CA, G1, H1
+	addu	G2, H2, CA
+	sltu	CA, G2, H2
+	addu	G3, H3, CA
+	sltu	CA, G3, H3
+	addu	G4, H4, CA
+
+	srl	SC, G4, 2
+
+	/* if there was carry into 131st bit, h3:h0 = g3:g0 */
+	movn	H0, G0, SC
+	movn	H1, G1, SC
+	movn	H2, G2, SC
+	movn	H3, G3, SC
+
+	lwl	G0, 0+MSB(NONCE)
+	lwl	G1, 4+MSB(NONCE)
+	lwl	G2, 8+MSB(NONCE)
+	lwl	G3,12+MSB(NONCE)
+	lwr	G0, 0+LSB(NONCE)
+	lwr	G1, 4+LSB(NONCE)
+	lwr	G2, 8+LSB(NONCE)
+	lwr	G3,12+LSB(NONCE)
+
+	/* mac = (h + nonce) % (2^128) */
+	addu	H0, G0
+	sltu	CA, H0, G0
+
+	/* H1 */
+	addu	H1, CA
+	sltu	CA, H1, CA
+	addu	H1, G1
+	sltu	G1, H1, G1
+	addu	CA, G1
+
+	/* H2 */
+	addu	H2, CA
+	sltu	CA, H2, CA
+	addu	H2, G2
+	sltu	G2, H2, G2
+	addu	CA, G2
+
+	/* H3 */
+	addu	H3, CA
+	addu	H3, G3
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	wsbh	H0
+	wsbh	H1
+	wsbh	H2
+	wsbh	H3
+	rotr	H0, 16
+	rotr	H1, 16
+	rotr	H2, 16
+	rotr	H3, 16
+#endif
+
+	/* store MAC */
+	swl	H0, 0+MSB(MAC)
+	swl	H1, 4+MSB(MAC)
+	swl	H2, 8+MSB(MAC)
+	swl	H3,12+MSB(MAC)
+	swr	H0, 0+LSB(MAC)
+	swr	H1, 4+LSB(MAC)
+	swr	H2, 8+LSB(MAC)
+	.set noreorder
+	jr	$ra
+	swr	H3,12+LSB(MAC)
+	.set reorder
+.end poly1305_emit_mips
+
+#define PR0 $t0
+#define PR1 $t1
+#define PR2 $t2
+#define PR3 $t3
+#define PT0 $t4
+
+/* Input arguments CTX=$a0, KEY=$a1 */
+
+.align 4
+.globl poly1305_init_mips
+.ent poly1305_init_mips
+poly1305_init_mips:
+	lwl	PR0, 0+MSB($a1)
+	lwl	PR1, 4+MSB($a1)
+	lwl	PR2, 8+MSB($a1)
+	lwl	PR3,12+MSB($a1)
+	lwr	PR0, 0+LSB($a1)
+	lwr	PR1, 4+LSB($a1)
+	lwr	PR2, 8+LSB($a1)
+	lwr	PR3,12+LSB($a1)
+
+	/* store Hx and Carry */
+	sw	$zero, PTR_POLY1305_CA
+	sw	$zero, PTR_POLY1305_H(0)
+	sw	$zero, PTR_POLY1305_H(1)
+	sw	$zero, PTR_POLY1305_H(2)
+	sw	$zero, PTR_POLY1305_H(3)
+	sw	$zero, PTR_POLY1305_H(4)
+
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+	wsbh	PR0
+	wsbh	PR1
+	wsbh	PR2
+	wsbh	PR3
+	rotr	PR0, 16
+	rotr	PR1, 16
+	rotr	PR2, 16
+	rotr	PR3, 16
+#endif
+
+	lui	PT0, 0x0FFF
+	ori	PT0, 0xFFFC
+
+	/* AND 0x0fffffff; */
+	ext	PR0, PR0, 0, (32-4)
+
+	/* AND 0x0ffffffc; */
+	and	PR1, PT0
+	and	PR2, PT0
+	and	PR3, PT0
+
+	/* store Rx */
+	sw	PR0, PTR_POLY1305_R(0)
+	sw	PR1, PTR_POLY1305_R(1)
+	sw	PR2, PTR_POLY1305_R(2)
+
+	.set noreorder
+	/* Jump Back  */
+	jr	$ra
+	sw	PR3, PTR_POLY1305_R(3)
+	.set reorder
+.end poly1305_init_mips
diff --git a/lib/zinc/poly1305/poly1305-mips64.S b/lib/zinc/poly1305/poly1305-mips64.S
new file mode 100644
index 000000000000..66b0aa381570
--- /dev/null
+++ b/lib/zinc/poly1305/poly1305-mips64.S
@@ -0,0 +1,359 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
+ *
+ * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
+ */
+
+#if !defined(CONFIG_64BIT)
+#error "This is only for 64-bit kernels."
+#endif
+
+#ifdef __MIPSEB__
+#define MSB 0
+#define LSB 7
+#else
+#define MSB 7
+#define LSB 0
+#endif
+
+#if defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+#define dmultu(rs,rt)
+#define mflo(rd,rs,rt)	dmulu	rd,rs,rt
+#define mfhi(rd,rs,rt)	dmuhu	rd,rs,rt
+#else
+#define dmultu(rs,rt)		dmultu	rs,rt
+#define multu(rs,rt)		multu	rs,rt
+#define mflo(rd,rs,rt)	mflo	rd
+#define mfhi(rd,rs,rt)	mfhi	rd
+#endif
+
+.text
+.set	noat
+.set	noreorder
+
+/* While most of the assembly in the kernel prefers ENTRY() and ENDPROC(),
+ * there is no existing MIPS assembly that uses it, and MIPS assembler seems
+ * to like its own .ent/.end notation, which the MIPS include files don't
+ * provide in a MIPS-specific ENTRY/ENDPROC definition. So, we skip these
+ * for now, until somebody complains. */
+
+.align	5
+.globl	poly1305_init_mips
+.ent	poly1305_init_mips
+poly1305_init_mips:
+	.frame	$29,0,$31
+	.set	reorder
+
+	sd	$0,0($4)
+	sd	$0,8($4)
+	sd	$0,16($4)
+
+	beqz	$5,.Lno_key
+
+#if defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+	ld	$8,0($5)
+	ld	$9,8($5)
+#else
+	ldl	$8,0+MSB($5)
+	ldl	$9,8+MSB($5)
+	ldr	$8,0+LSB($5)
+	ldr	$9,8+LSB($5)
+#endif
+#ifdef	__MIPSEB__
+#if defined(CONFIG_CPU_MIPS64_R2) || defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+	dsbh	$8,$8		# byte swap
+	 dsbh	$9,$9
+	dshd	$8,$8
+	 dshd	$9,$9
+#else
+	ori	$10,$0,0xFF
+	dsll	$1,$10,32
+	or	$10,$1		# 0x000000FF000000FF
+
+	and	$11,$8,$10	# byte swap
+	 and	$2,$9,$10
+	dsrl	$1,$8,24
+	 dsrl	$24,$9,24
+	dsll	$11,24
+	 dsll	$2,24
+	and	$1,$10
+	 and	$24,$10
+	dsll	$10,8			# 0x0000FF000000FF00
+	or	$11,$1
+	 or	$2,$24
+	and	$1,$8,$10
+	 and	$24,$9,$10
+	dsrl	$8,8
+	 dsrl	$9,8
+	dsll	$1,8
+	 dsll	$24,8
+	and	$8,$10
+	 and	$9,$10
+	or	$11,$1
+	 or	$2,$24
+	or	$8,$11
+	 or	$9,$2
+	dsrl	$11,$8,32
+	 dsrl	$2,$9,32
+	dsll	$8,32
+	 dsll	$9,32
+	or	$8,$11
+	 or	$9,$2
+#endif
+#endif
+	li	$10,1
+	dsll	$10,32
+	daddiu	$10,-63
+	dsll	$10,28
+	daddiu	$10,-1		# 0ffffffc0fffffff
+
+	and	$8,$10
+	daddiu	$10,-3		# 0ffffffc0ffffffc
+	and	$9,$10
+
+	sd	$8,24($4)
+	dsrl	$10,$9,2
+	sd	$9,32($4)
+	daddu	$10,$9		# s1 = r1 + (r1 >> 2)
+	sd	$10,40($4)
+
+.Lno_key:
+	li	$2,0			# return 0
+	jr	$31
+.end	poly1305_init_mips
+
+.align	5
+.globl	poly1305_blocks_mips
+.ent	poly1305_blocks_mips
+poly1305_blocks_mips:
+	.set	noreorder
+	dsrl	$6,4			# number of complete blocks
+	bnez	$6,poly1305_blocks_internal
+	nop
+	jr	$31
+	nop
+.end	poly1305_blocks_mips
+
+.align	5
+.ent	poly1305_blocks_internal
+poly1305_blocks_internal:
+	.frame	$29,6*8,$31
+	.mask	0x00030000,-8
+	.set	noreorder
+	dsubu	$29,6*8
+	sd	$17,40($29)
+	sd	$16,32($29)
+	.set	reorder
+
+	ld	$12,0($4)		# load hash value
+	ld	$13,8($4)
+	ld	$14,16($4)
+
+	ld	$15,24($4)		# load key
+	ld	$16,32($4)
+	ld	$17,40($4)
+
+.Loop:
+#if defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+	ld	$8,0($5)		# load input
+	ld	$9,8($5)
+#else
+	ldl	$8,0+MSB($5)	# load input
+	ldl	$9,8+MSB($5)
+	ldr	$8,0+LSB($5)
+	ldr	$9,8+LSB($5)
+#endif
+	daddiu	$6,-1
+	daddiu	$5,16
+#ifdef	__MIPSEB__
+#if defined(CONFIG_CPU_MIPS64_R2) || defined(CONFIG_CPU_MIPSR2) || defined(CONFIG_CPU_MIPS64_R6) || defined(CONFIG_CPU_MIPSR6)
+	dsbh	$8,$8		# byte swap
+	 dsbh	$9,$9
+	dshd	$8,$8
+	 dshd	$9,$9
+#else
+	ori	$10,$0,0xFF
+	dsll	$1,$10,32
+	or	$10,$1		# 0x000000FF000000FF
+
+	and	$11,$8,$10	# byte swap
+	 and	$2,$9,$10
+	dsrl	$1,$8,24
+	 dsrl	$24,$9,24
+	dsll	$11,24
+	 dsll	$2,24
+	and	$1,$10
+	 and	$24,$10
+	dsll	$10,8			# 0x0000FF000000FF00
+	or	$11,$1
+	 or	$2,$24
+	and	$1,$8,$10
+	 and	$24,$9,$10
+	dsrl	$8,8
+	 dsrl	$9,8
+	dsll	$1,8
+	 dsll	$24,8
+	and	$8,$10
+	 and	$9,$10
+	or	$11,$1
+	 or	$2,$24
+	or	$8,$11
+	 or	$9,$2
+	dsrl	$11,$8,32
+	 dsrl	$2,$9,32
+	dsll	$8,32
+	 dsll	$9,32
+	or	$8,$11
+	 or	$9,$2
+#endif
+#endif
+	daddu	$12,$8		# accumulate input
+	daddu	$13,$9
+	sltu	$10,$12,$8
+	sltu	$11,$13,$9
+	daddu	$13,$10
+
+	dmultu	($15,$12)		# h0*r0
+	 daddu	$14,$7
+	 sltu	$10,$13,$10
+	mflo	($8,$15,$12)
+	mfhi	($9,$15,$12)
+
+	dmultu	($17,$13)		# h1*5*r1
+	 daddu	$10,$11
+	 daddu	$14,$10
+	mflo	($10,$17,$13)
+	mfhi	($11,$17,$13)
+
+	dmultu	($16,$12)		# h0*r1
+	 daddu	$8,$10
+	 daddu	$9,$11
+	mflo	($1,$16,$12)
+	mfhi	($25,$16,$12)
+	 sltu	$10,$8,$10
+	 daddu	$9,$10
+
+	dmultu	($15,$13)		# h1*r0
+	 daddu	$9,$1
+	 sltu	$1,$9,$1
+	mflo	($10,$15,$13)
+	mfhi	($11,$15,$13)
+	 daddu	$25,$1
+
+	dmultu	($17,$14)		# h2*5*r1
+	 daddu	$9,$10
+	 daddu	$25,$11
+	mflo	($1,$17,$14)
+
+	dmultu	($15,$14)		# h2*r0
+	 sltu	$10,$9,$10
+	 daddu	$25,$10
+	mflo	($2,$15,$14)
+
+	daddu	$9,$1
+	daddu	$25,$2
+	sltu	$1,$9,$1
+	daddu	$25,$1
+
+	li	$10,-4		# final reduction
+	and	$10,$25
+	dsrl	$11,$25,2
+	andi	$14,$25,3
+	daddu	$10,$11
+	daddu	$12,$8,$10
+	sltu	$10,$12,$10
+	daddu	$13,$9,$10
+	sltu	$10,$13,$10
+	daddu	$14,$14,$10
+
+	bnez	$6,.Loop
+
+	sd	$12,0($4)		# store hash value
+	sd	$13,8($4)
+	sd	$14,16($4)
+
+	.set	noreorder
+	ld	$17,40($29)		# epilogue
+	ld	$16,32($29)
+	jr	$31
+	daddu	$29,6*8
+.end	poly1305_blocks_internal
+
+.align	5
+.globl	poly1305_emit_mips
+.ent	poly1305_emit_mips
+poly1305_emit_mips:
+	.frame	$29,0,$31
+	.set	reorder
+
+	ld	$10,0($4)
+	ld	$11,8($4)
+	ld	$1,16($4)
+
+	daddiu	$8,$10,5		# compare to modulus
+	sltiu	$2,$8,5
+	daddu	$9,$11,$2
+	sltu	$2,$9,$2
+	daddu	$1,$1,$2
+
+	dsrl	$1,2			# see if it carried/borrowed
+	dsubu	$1,$0,$1
+	nor	$2,$0,$1
+
+	and	$8,$1
+	and	$10,$2
+	and	$9,$1
+	and	$11,$2
+	or	$8,$10
+	or	$9,$11
+
+	lwu	$10,0($6)		# load nonce
+	lwu	$11,4($6)
+	lwu	$1,8($6)
+	lwu	$2,12($6)
+	dsll	$11,32
+	dsll	$2,32
+	or	$10,$11
+	or	$1,$2
+
+	daddu	$8,$10		# accumulate nonce
+	daddu	$9,$1
+	sltu	$10,$8,$10
+	daddu	$9,$10
+
+	dsrl	$10,$8,8		# write mac value
+	dsrl	$11,$8,16
+	dsrl	$1,$8,24
+	sb	$8,0($5)
+	dsrl	$2,$8,32
+	sb	$10,1($5)
+	dsrl	$10,$8,40
+	sb	$11,2($5)
+	dsrl	$11,$8,48
+	sb	$1,3($5)
+	dsrl	$1,$8,56
+	sb	$2,4($5)
+	dsrl	$2,$9,8
+	sb	$10,5($5)
+	dsrl	$10,$9,16
+	sb	$11,6($5)
+	dsrl	$11,$9,24
+	sb	$1,7($5)
+
+	sb	$9,8($5)
+	dsrl	$1,$9,32
+	sb	$2,9($5)
+	dsrl	$2,$9,40
+	sb	$10,10($5)
+	dsrl	$10,$9,48
+	sb	$11,11($5)
+	dsrl	$11,$9,56
+	sb	$1,12($5)
+	sb	$2,13($5)
+	sb	$10,14($5)
+	sb	$11,15($5)
+
+	jr	$31
+.end	poly1305_emit_mips
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 11/20] zinc: ChaCha20Poly1305 construction and selftest
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (9 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 10/20] zinc: Poly1305 MIPS32r2 and MIPS64 implementations Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 12/20] zinc: BLAKE2s generic C implementation " Jason A. Donenfeld
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson

This is an implementation of the ChaCha20Poly1305 AEAD, with an easy API
for encrypting either contiguous buffers or scatter gather lists (such
as those created from skb_to_sgvec).

Information: https://tools.ietf.org/html/rfc8439

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
 include/zinc/chacha20poly1305.h      |   54 +
 lib/zinc/Kconfig                     |    7 +
 lib/zinc/Makefile                    |    4 +
 lib/zinc/chacha20poly1305.c          |  333 ++
 lib/zinc/main.c                      |    4 +
 lib/zinc/selftest/chacha20poly1305.h | 7852 ++++++++++++++++++++++++++
 6 files changed, 8254 insertions(+)
 create mode 100644 include/zinc/chacha20poly1305.h
 create mode 100644 lib/zinc/chacha20poly1305.c
 create mode 100644 lib/zinc/selftest/chacha20poly1305.h

diff --git a/include/zinc/chacha20poly1305.h b/include/zinc/chacha20poly1305.h
new file mode 100644
index 000000000000..b607c76f665d
--- /dev/null
+++ b/include/zinc/chacha20poly1305.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_CHACHA20POLY1305_H
+#define _ZINC_CHACHA20POLY1305_H
+
+#include <linux/simd.h>
+#include <linux/types.h>
+
+struct scatterlist;
+
+enum chacha20poly1305_lengths {
+	XCHACHA20POLY1305_NONCELEN = 24,
+	CHACHA20POLY1305_KEYLEN = 32,
+	CHACHA20POLY1305_AUTHTAGLEN = 16
+};
+
+void chacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+			      const u8 *ad, const size_t ad_len,
+			      const u64 nonce,
+			      const u8 key[CHACHA20POLY1305_KEYLEN]);
+
+bool __must_check chacha20poly1305_encrypt_sg(
+	struct scatterlist *dst, struct scatterlist *src, const size_t src_len,
+	const u8 *ad, const size_t ad_len, const u64 nonce,
+	const u8 key[CHACHA20POLY1305_KEYLEN], simd_context_t simd_context);
+
+bool __must_check
+chacha20poly1305_decrypt(u8 *dst, const u8 *src, const size_t src_len,
+			 const u8 *ad, const size_t ad_len, const u64 nonce,
+			 const u8 key[CHACHA20POLY1305_KEYLEN]);
+
+bool __must_check chacha20poly1305_decrypt_sg(
+	struct scatterlist *dst, struct scatterlist *src, const size_t src_len,
+	const u8 *ad, const size_t ad_len, const u64 nonce,
+	const u8 key[CHACHA20POLY1305_KEYLEN], simd_context_t simd_context);
+
+void xchacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+			       const u8 *ad, const size_t ad_len,
+			       const u8 nonce[XCHACHA20POLY1305_NONCELEN],
+			       const u8 key[CHACHA20POLY1305_KEYLEN]);
+
+bool __must_check xchacha20poly1305_decrypt(
+	u8 *dst, const u8 *src, const size_t src_len, const u8 *ad,
+	const size_t ad_len, const u8 nonce[XCHACHA20POLY1305_NONCELEN],
+	const u8 key[CHACHA20POLY1305_KEYLEN]);
+
+#ifdef DEBUG
+bool chacha20poly1305_selftest(void);
+#endif
+
+#endif /* _ZINC_CHACHA20POLY1305_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index bc8c61334362..98d095ed2418 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -10,6 +10,13 @@ config ZINC_POLY1305
 	bool
 	select ZINC
 
+config ZINC_CHACHA20POLY1305
+	bool
+	select ZINC
+	select ZINC_CHACHA20
+	select ZINC_POLY1305
+	select CRYPTO_BLKCIPHER
+
 config ZINC_DEBUG
 	bool "Zinc cryptography library debugging and self-tests"
 	depends on ZINC
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index c8c49d59794b..1664871aaf71 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -47,6 +47,10 @@ CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-mips-glue.h
 endif
 endif
 
+ifeq ($(CONFIG_ZINC_CHACHA20POLY1305),y)
+zinc-y += chacha20poly1305.o
+endif
+
 zinc-y += main.o
 
 obj-$(CONFIG_ZINC) := zinc.o
diff --git a/lib/zinc/chacha20poly1305.c b/lib/zinc/chacha20poly1305.c
new file mode 100644
index 000000000000..4b00c8356b02
--- /dev/null
+++ b/lib/zinc/chacha20poly1305.c
@@ -0,0 +1,333 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is an implementation of the ChaCha20Poly1305 AEAD construction.
+ *
+ * Information: https://tools.ietf.org/html/rfc8439
+ */
+
+#include <zinc/chacha20poly1305.h>
+#include <zinc/chacha20.h>
+#include <zinc/poly1305.h>
+#include <asm/unaligned.h>
+#include <linux/kernel.h>
+#include <crypto/scatterwalk.h>
+
+static const u8 pad0[16] = { 0 };
+
+static struct crypto_alg chacha20_alg = {
+	.cra_blocksize = 1,
+	.cra_alignmask = sizeof(u32) - 1
+};
+static struct crypto_blkcipher chacha20_cipher = {
+	.base = {
+		.__crt_alg = &chacha20_alg
+	}
+};
+static struct blkcipher_desc chacha20_desc = {
+	.tfm = &chacha20_cipher
+};
+
+static inline void
+__chacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+			   const u8 *ad, const size_t ad_len, const u64 nonce,
+			   const u8 key[CHACHA20POLY1305_KEYLEN],
+			   simd_context_t simd_context)
+{
+	struct poly1305_ctx poly1305_state;
+	struct chacha20_ctx chacha20_state;
+	union {
+		u8 block0[POLY1305_KEY_SIZE];
+		__le64 lens[2];
+	} b = { { 0 } };
+
+	chacha20_init(&chacha20_state, key, nonce);
+	chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+		 simd_context);
+	poly1305_init(&poly1305_state, b.block0, simd_context);
+
+	poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+	poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+			simd_context);
+
+	chacha20(&chacha20_state, dst, src, src_len, simd_context);
+
+	poly1305_update(&poly1305_state, dst, src_len, simd_context);
+	poly1305_update(&poly1305_state, pad0, (0x10 - src_len) & 0xf,
+			simd_context);
+
+	b.lens[0] = cpu_to_le64(ad_len);
+	b.lens[1] = cpu_to_le64(src_len);
+	poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+			simd_context);
+
+	poly1305_final(&poly1305_state, dst + src_len, simd_context);
+
+	memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+	memzero_explicit(&b, sizeof(b));
+}
+
+void chacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+			      const u8 *ad, const size_t ad_len,
+			      const u64 nonce,
+			      const u8 key[CHACHA20POLY1305_KEYLEN])
+{
+	simd_context_t simd_context;
+
+	simd_context = simd_get();
+	__chacha20poly1305_encrypt(dst, src, src_len, ad, ad_len, nonce, key,
+				   simd_context);
+	simd_put(simd_context);
+}
+EXPORT_SYMBOL(chacha20poly1305_encrypt);
+
+bool chacha20poly1305_encrypt_sg(struct scatterlist *dst,
+				 struct scatterlist *src, const size_t src_len,
+				 const u8 *ad, const size_t ad_len,
+				 const u64 nonce,
+				 const u8 key[CHACHA20POLY1305_KEYLEN],
+				 simd_context_t simd_context)
+{
+	struct poly1305_ctx poly1305_state;
+	struct chacha20_ctx chacha20_state;
+	int ret = 0;
+	struct blkcipher_walk walk;
+	union {
+		u8 block0[POLY1305_KEY_SIZE];
+		u8 mac[POLY1305_MAC_SIZE];
+		__le64 lens[2];
+	} b = { { 0 } };
+
+	chacha20_init(&chacha20_state, key, nonce);
+	chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+		 simd_context);
+	poly1305_init(&poly1305_state, b.block0, simd_context);
+
+	poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+	poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+			simd_context);
+
+	if (likely(src_len)) {
+		blkcipher_walk_init(&walk, dst, src, src_len);
+		ret = blkcipher_walk_virt_block(&chacha20_desc, &walk,
+						CHACHA20_BLOCK_SIZE);
+		while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+			size_t chunk_len =
+				rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE);
+
+			chacha20(&chacha20_state, walk.dst.virt.addr,
+				 walk.src.virt.addr, chunk_len, simd_context);
+			poly1305_update(&poly1305_state, walk.dst.virt.addr,
+					chunk_len, simd_context);
+			ret = blkcipher_walk_done(&chacha20_desc, &walk,
+					walk.nbytes % CHACHA20_BLOCK_SIZE);
+		}
+		if (walk.nbytes) {
+			chacha20(&chacha20_state, walk.dst.virt.addr,
+				 walk.src.virt.addr, walk.nbytes, simd_context);
+			poly1305_update(&poly1305_state, walk.dst.virt.addr,
+					walk.nbytes, simd_context);
+			ret = blkcipher_walk_done(&chacha20_desc, &walk, 0);
+		}
+	}
+	if (unlikely(ret))
+		goto err;
+
+	poly1305_update(&poly1305_state, pad0, (0x10 - src_len) & 0xf,
+			simd_context);
+
+	b.lens[0] = cpu_to_le64(ad_len);
+	b.lens[1] = cpu_to_le64(src_len);
+	poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+			simd_context);
+
+	poly1305_final(&poly1305_state, b.mac, simd_context);
+	scatterwalk_map_and_copy(b.mac, dst, src_len, sizeof(b.mac), 1);
+err:
+	memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+	memzero_explicit(&b, sizeof(b));
+	return !ret;
+}
+EXPORT_SYMBOL(chacha20poly1305_encrypt_sg);
+
+static inline bool
+__chacha20poly1305_decrypt(u8 *dst, const u8 *src, const size_t src_len,
+			   const u8 *ad, const size_t ad_len, const u64 nonce,
+			   const u8 key[CHACHA20POLY1305_KEYLEN],
+			   simd_context_t simd_context)
+{
+	struct poly1305_ctx poly1305_state;
+	struct chacha20_ctx chacha20_state;
+	int ret;
+	size_t dst_len;
+	union {
+		u8 block0[POLY1305_KEY_SIZE];
+		u8 mac[POLY1305_MAC_SIZE];
+		__le64 lens[2];
+	} b = { { 0 } };
+
+	if (unlikely(src_len < POLY1305_MAC_SIZE))
+		return false;
+
+	chacha20_init(&chacha20_state, key, nonce);
+	chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+		 simd_context);
+	poly1305_init(&poly1305_state, b.block0, simd_context);
+
+	poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+	poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+			simd_context);
+
+	dst_len = src_len - POLY1305_MAC_SIZE;
+	poly1305_update(&poly1305_state, src, dst_len, simd_context);
+	poly1305_update(&poly1305_state, pad0, (0x10 - dst_len) & 0xf,
+			simd_context);
+
+	b.lens[0] = cpu_to_le64(ad_len);
+	b.lens[1] = cpu_to_le64(dst_len);
+	poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+			simd_context);
+
+	poly1305_final(&poly1305_state, b.mac, simd_context);
+
+	ret = crypto_memneq(b.mac, src + dst_len, POLY1305_MAC_SIZE);
+	if (likely(!ret))
+		chacha20(&chacha20_state, dst, src, dst_len, simd_context);
+
+	memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+	memzero_explicit(&b, sizeof(b));
+
+	return !ret;
+}
+
+bool chacha20poly1305_decrypt(u8 *dst, const u8 *src, const size_t src_len,
+			      const u8 *ad, const size_t ad_len,
+			      const u64 nonce,
+			      const u8 key[CHACHA20POLY1305_KEYLEN])
+{
+	simd_context_t simd_context, ret;
+
+	simd_context = simd_get();
+	ret = __chacha20poly1305_decrypt(dst, src, src_len, ad, ad_len, nonce,
+					 key, simd_context);
+	simd_put(simd_context);
+	return ret;
+}
+EXPORT_SYMBOL(chacha20poly1305_decrypt);
+
+bool chacha20poly1305_decrypt_sg(struct scatterlist *dst,
+				 struct scatterlist *src, const size_t src_len,
+				 const u8 *ad, const size_t ad_len,
+				 const u64 nonce,
+				 const u8 key[CHACHA20POLY1305_KEYLEN],
+				 simd_context_t simd_context)
+{
+	struct poly1305_ctx poly1305_state;
+	struct chacha20_ctx chacha20_state;
+	struct blkcipher_walk walk;
+	int ret = 0;
+	size_t dst_len;
+	union {
+		u8 block0[POLY1305_KEY_SIZE];
+		struct {
+			u8 read_mac[POLY1305_MAC_SIZE];
+			u8 computed_mac[POLY1305_MAC_SIZE];
+		};
+		__le64 lens[2];
+	} b = { { 0 } };
+
+	if (unlikely(src_len < POLY1305_MAC_SIZE))
+		return false;
+
+	chacha20_init(&chacha20_state, key, nonce);
+	chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+		 simd_context);
+	poly1305_init(&poly1305_state, b.block0, simd_context);
+
+	poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+	poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+			simd_context);
+
+	dst_len = src_len - POLY1305_MAC_SIZE;
+	if (likely(dst_len)) {
+		blkcipher_walk_init(&walk, dst, src, dst_len);
+		ret = blkcipher_walk_virt_block(&chacha20_desc, &walk,
+						CHACHA20_BLOCK_SIZE);
+		while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
+			size_t chunk_len =
+				rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE);
+
+			poly1305_update(&poly1305_state, walk.src.virt.addr,
+					chunk_len, simd_context);
+			chacha20(&chacha20_state, walk.dst.virt.addr,
+				 walk.src.virt.addr, chunk_len, simd_context);
+			ret = blkcipher_walk_done(&chacha20_desc, &walk,
+					walk.nbytes % CHACHA20_BLOCK_SIZE);
+		}
+		if (walk.nbytes) {
+			poly1305_update(&poly1305_state, walk.src.virt.addr,
+					walk.nbytes, simd_context);
+			chacha20(&chacha20_state, walk.dst.virt.addr,
+				 walk.src.virt.addr, walk.nbytes, simd_context);
+			ret = blkcipher_walk_done(&chacha20_desc, &walk, 0);
+		}
+	}
+	if (unlikely(ret))
+		goto err;
+
+	poly1305_update(&poly1305_state, pad0, (0x10 - dst_len) & 0xf,
+			simd_context);
+
+	b.lens[0] = cpu_to_le64(ad_len);
+	b.lens[1] = cpu_to_le64(dst_len);
+	poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+			simd_context);
+
+	poly1305_final(&poly1305_state, b.computed_mac, simd_context);
+
+	scatterwalk_map_and_copy(b.read_mac, src, dst_len, POLY1305_MAC_SIZE, 0);
+	ret = crypto_memneq(b.read_mac, b.computed_mac, POLY1305_MAC_SIZE);
+err:
+	memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+	memzero_explicit(&b, sizeof(b));
+	return !ret;
+}
+EXPORT_SYMBOL(chacha20poly1305_decrypt_sg);
+
+void xchacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+			       const u8 *ad, const size_t ad_len,
+			       const u8 nonce[XCHACHA20POLY1305_NONCELEN],
+			       const u8 key[CHACHA20POLY1305_KEYLEN])
+{
+	simd_context_t simd_context = simd_get();
+	u8 derived_key[CHACHA20POLY1305_KEYLEN] __aligned(16);
+
+	hchacha20(derived_key, nonce, key, simd_context);
+	__chacha20poly1305_encrypt(dst, src, src_len, ad, ad_len,
+				   get_unaligned_le64(nonce + 16),
+				   derived_key, simd_context);
+	memzero_explicit(derived_key, CHACHA20POLY1305_KEYLEN);
+	simd_put(simd_context);
+}
+EXPORT_SYMBOL(xchacha20poly1305_encrypt);
+
+bool xchacha20poly1305_decrypt(u8 *dst, const u8 *src, const size_t src_len,
+			       const u8 *ad, const size_t ad_len,
+			       const u8 nonce[XCHACHA20POLY1305_NONCELEN],
+			       const u8 key[CHACHA20POLY1305_KEYLEN])
+{
+	bool ret, simd_context = simd_get();
+	u8 derived_key[CHACHA20POLY1305_KEYLEN] __aligned(16);
+
+	hchacha20(derived_key, nonce, key, simd_context);
+	ret = __chacha20poly1305_decrypt(dst, src, src_len, ad, ad_len,
+					 get_unaligned_le64(nonce + 16),
+					 derived_key, simd_context);
+	memzero_explicit(derived_key, CHACHA20POLY1305_KEYLEN);
+	simd_put(simd_context);
+	return ret;
+}
+EXPORT_SYMBOL(xchacha20poly1305_decrypt);
+
+#include "selftest/chacha20poly1305.h"
diff --git a/lib/zinc/main.c b/lib/zinc/main.c
index d871dd406a5c..a15b27fd0e4c 100644
--- a/lib/zinc/main.c
+++ b/lib/zinc/main.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  */
 
+#include <zinc/chacha20poly1305.h>
 #include <zinc/chacha20.h>
 #include <zinc/poly1305.h>
 
@@ -26,6 +27,9 @@ static int __init mod_init(void)
 #ifdef CONFIG_ZINC_POLY1305
 	poly1305_fpu_init();
 	selftest(poly1305);
+#endif
+#ifdef CONFIG_ZINC_CHACHA20POLY1305
+	selftest(chacha20poly1305);
 #endif
 	return 0;
 }
diff --git a/lib/zinc/selftest/chacha20poly1305.h b/lib/zinc/selftest/chacha20poly1305.h
new file mode 100644
index 000000000000..9d170a2fc420
--- /dev/null
+++ b/lib/zinc/selftest/chacha20poly1305.h
@@ -0,0 +1,7852 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+struct chacha20poly1305_testvec {
+	u8 key[CHACHA20POLY1305_KEYLEN];
+	u8 nonce[XCHACHA20POLY1305_NONCELEN];
+	u8 assoc[64];
+	u8 input[2048];
+	u8 result[2048];
+	size_t nlen, alen, ilen;
+	bool failure;
+};
+
+/* The first of these are the ChaCha20-Poly1305 AEAD test vectors from RFC7539
+ * 2.8.2. After they are generated by the below python program. And the final
+ * marked ones are taken from wycheproof, but we only do these for the encrypt
+ * side, because mostly we're stressing the primitives rather than the actual
+ * chapoly construction. This also requires adding a 96-bit nonce construction,
+ * just for the purpose of the tests.
+ *
+ * #!/usr/bin/env python3
+ *
+ * from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
+ * import os
+ *
+ * def encode_blob(blob):
+ *     a = ""
+ *     for i in blob:
+ *         a += "\\x" + hex(i)[2:]
+ *     return a
+ *
+ * enc = [ ]
+ * dec = [ ]
+ *
+ * def make_vector(plen, adlen):
+ *     key = os.urandom(32)
+ *     nonce = os.urandom(8)
+ *     p = os.urandom(plen)
+ *     ad = os.urandom(adlen)
+ *     c = ChaCha20Poly1305(key).encrypt(nonce=bytes(4) + nonce, data=p, associated_data=ad)
+ *
+ *     out = "{\n"
+ *     out += "\t.key\t= \"" + encode_blob(key) + "\",\n"
+ *     out += "\t.nonce\t= \"" + encode_blob(nonce) + "\",\n"
+ *     out += "\t.assoc\t= \"" + encode_blob(ad) + "\",\n"
+ *     out += "\t.alen\t= " + str(len(ad)) + ",\n"
+ *     out += "\t.input\t= \"" + encode_blob(p) + "\",\n"
+ *     out += "\t.ilen\t= " + str(len(p)) + ",\n"
+ *     out += "\t.result\t= \"" + encode_blob(c) + "\"\n"
+ *     out += "}"
+ *     enc.append(out)
+ *
+ *
+ *     out = "{\n"
+ *     out += "\t.key\t= \"" + encode_blob(key) + "\",\n"
+ *     out += "\t.nonce\t= \"" + encode_blob(nonce) + "\",\n"
+ *     out += "\t.assoc\t= \"" + encode_blob(ad) + "\",\n"
+ *     out += "\t.alen\t= " + str(len(ad)) + ",\n"
+ *     out += "\t.input\t= \"" + encode_blob(c) + "\",\n"
+ *     out += "\t.ilen\t= " + str(len(c)) + ",\n"
+ *     out += "\t.result\t= \"" + encode_blob(p) + "\"\n"
+ *     out += "}"
+ *     dec.append(out)
+ *
+ *
+ * make_vector(0, 0)
+ * make_vector(0, 8)
+ * make_vector(1, 8)
+ * make_vector(1, 0)
+ * make_vector(129, 7)
+ * make_vector(256, 0)
+ * make_vector(512, 0)
+ * make_vector(513, 9)
+ * make_vector(1024, 16)
+ * make_vector(1933, 7)
+ * make_vector(2011, 63)
+ *
+ * print("======== encryption vectors ========")
+ * print(", ".join(enc))
+ *
+ * print("\n\n\n======== decryption vectors ========")
+ * print(", ".join(dec))
+ */
+
+static const struct chacha20poly1305_testvec
+chacha20poly1305_enc_vectors[] __initconst = { {
+	.key	= { 0x1c, 0x92, 0x40, 0xa5, 0xeb, 0x55, 0xd3, 0x8a,
+		    0xf3, 0x33, 0x88, 0x86, 0x04, 0xf6, 0xb5, 0xf0,
+		    0x47, 0x39, 0x17, 0xc1, 0x40, 0x2b, 0x80, 0x09,
+		    0x9d, 0xca, 0x5c, 0xbc, 0x20, 0x70, 0x75, 0xc0 },
+	.nonce	= { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 },
+	.nlen	= 8,
+	.assoc	= { 0xf3, 0x33, 0x88, 0x86, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x4e, 0x91 },
+	.alen	= 12,
+	.input	= { 0x49, 0x6e, 0x74, 0x65, 0x72, 0x6e, 0x65, 0x74,
+		    0x2d, 0x44, 0x72, 0x61, 0x66, 0x74, 0x73, 0x20,
+		    0x61, 0x72, 0x65, 0x20, 0x64, 0x72, 0x61, 0x66,
+		    0x74, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+		    0x6e, 0x74, 0x73, 0x20, 0x76, 0x61, 0x6c, 0x69,
+		    0x64, 0x20, 0x66, 0x6f, 0x72, 0x20, 0x61, 0x20,
+		    0x6d, 0x61, 0x78, 0x69, 0x6d, 0x75, 0x6d, 0x20,
+		    0x6f, 0x66, 0x20, 0x73, 0x69, 0x78, 0x20, 0x6d,
+		    0x6f, 0x6e, 0x74, 0x68, 0x73, 0x20, 0x61, 0x6e,
+		    0x64, 0x20, 0x6d, 0x61, 0x79, 0x20, 0x62, 0x65,
+		    0x20, 0x75, 0x70, 0x64, 0x61, 0x74, 0x65, 0x64,
+		    0x2c, 0x20, 0x72, 0x65, 0x70, 0x6c, 0x61, 0x63,
+		    0x65, 0x64, 0x2c, 0x20, 0x6f, 0x72, 0x20, 0x6f,
+		    0x62, 0x73, 0x6f, 0x6c, 0x65, 0x74, 0x65, 0x64,
+		    0x20, 0x62, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65,
+		    0x72, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+		    0x6e, 0x74, 0x73, 0x20, 0x61, 0x74, 0x20, 0x61,
+		    0x6e, 0x79, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x2e,
+		    0x20, 0x49, 0x74, 0x20, 0x69, 0x73, 0x20, 0x69,
+		    0x6e, 0x61, 0x70, 0x70, 0x72, 0x6f, 0x70, 0x72,
+		    0x69, 0x61, 0x74, 0x65, 0x20, 0x74, 0x6f, 0x20,
+		    0x75, 0x73, 0x65, 0x20, 0x49, 0x6e, 0x74, 0x65,
+		    0x72, 0x6e, 0x65, 0x74, 0x2d, 0x44, 0x72, 0x61,
+		    0x66, 0x74, 0x73, 0x20, 0x61, 0x73, 0x20, 0x72,
+		    0x65, 0x66, 0x65, 0x72, 0x65, 0x6e, 0x63, 0x65,
+		    0x20, 0x6d, 0x61, 0x74, 0x65, 0x72, 0x69, 0x61,
+		    0x6c, 0x20, 0x6f, 0x72, 0x20, 0x74, 0x6f, 0x20,
+		    0x63, 0x69, 0x74, 0x65, 0x20, 0x74, 0x68, 0x65,
+		    0x6d, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 0x20,
+		    0x74, 0x68, 0x61, 0x6e, 0x20, 0x61, 0x73, 0x20,
+		    0x2f, 0xe2, 0x80, 0x9c, 0x77, 0x6f, 0x72, 0x6b,
+		    0x20, 0x69, 0x6e, 0x20, 0x70, 0x72, 0x6f, 0x67,
+		    0x72, 0x65, 0x73, 0x73, 0x2e, 0x2f, 0xe2, 0x80,
+		    0x9d },
+	.ilen	= 265,
+	.result	= { 0x64, 0xa0, 0x86, 0x15, 0x75, 0x86, 0x1a, 0xf4,
+		    0x60, 0xf0, 0x62, 0xc7, 0x9b, 0xe6, 0x43, 0xbd,
+		    0x5e, 0x80, 0x5c, 0xfd, 0x34, 0x5c, 0xf3, 0x89,
+		    0xf1, 0x08, 0x67, 0x0a, 0xc7, 0x6c, 0x8c, 0xb2,
+		    0x4c, 0x6c, 0xfc, 0x18, 0x75, 0x5d, 0x43, 0xee,
+		    0xa0, 0x9e, 0xe9, 0x4e, 0x38, 0x2d, 0x26, 0xb0,
+		    0xbd, 0xb7, 0xb7, 0x3c, 0x32, 0x1b, 0x01, 0x00,
+		    0xd4, 0xf0, 0x3b, 0x7f, 0x35, 0x58, 0x94, 0xcf,
+		    0x33, 0x2f, 0x83, 0x0e, 0x71, 0x0b, 0x97, 0xce,
+		    0x98, 0xc8, 0xa8, 0x4a, 0xbd, 0x0b, 0x94, 0x81,
+		    0x14, 0xad, 0x17, 0x6e, 0x00, 0x8d, 0x33, 0xbd,
+		    0x60, 0xf9, 0x82, 0xb1, 0xff, 0x37, 0xc8, 0x55,
+		    0x97, 0x97, 0xa0, 0x6e, 0xf4, 0xf0, 0xef, 0x61,
+		    0xc1, 0x86, 0x32, 0x4e, 0x2b, 0x35, 0x06, 0x38,
+		    0x36, 0x06, 0x90, 0x7b, 0x6a, 0x7c, 0x02, 0xb0,
+		    0xf9, 0xf6, 0x15, 0x7b, 0x53, 0xc8, 0x67, 0xe4,
+		    0xb9, 0x16, 0x6c, 0x76, 0x7b, 0x80, 0x4d, 0x46,
+		    0xa5, 0x9b, 0x52, 0x16, 0xcd, 0xe7, 0xa4, 0xe9,
+		    0x90, 0x40, 0xc5, 0xa4, 0x04, 0x33, 0x22, 0x5e,
+		    0xe2, 0x82, 0xa1, 0xb0, 0xa0, 0x6c, 0x52, 0x3e,
+		    0xaf, 0x45, 0x34, 0xd7, 0xf8, 0x3f, 0xa1, 0x15,
+		    0x5b, 0x00, 0x47, 0x71, 0x8c, 0xbc, 0x54, 0x6a,
+		    0x0d, 0x07, 0x2b, 0x04, 0xb3, 0x56, 0x4e, 0xea,
+		    0x1b, 0x42, 0x22, 0x73, 0xf5, 0x48, 0x27, 0x1a,
+		    0x0b, 0xb2, 0x31, 0x60, 0x53, 0xfa, 0x76, 0x99,
+		    0x19, 0x55, 0xeb, 0xd6, 0x31, 0x59, 0x43, 0x4e,
+		    0xce, 0xbb, 0x4e, 0x46, 0x6d, 0xae, 0x5a, 0x10,
+		    0x73, 0xa6, 0x72, 0x76, 0x27, 0x09, 0x7a, 0x10,
+		    0x49, 0xe6, 0x17, 0xd9, 0x1d, 0x36, 0x10, 0x94,
+		    0xfa, 0x68, 0xf0, 0xff, 0x77, 0x98, 0x71, 0x30,
+		    0x30, 0x5b, 0xea, 0xba, 0x2e, 0xda, 0x04, 0xdf,
+		    0x99, 0x7b, 0x71, 0x4d, 0x6c, 0x6f, 0x2c, 0x29,
+		    0xa6, 0xad, 0x5c, 0xb4, 0x02, 0x2b, 0x02, 0x70,
+		    0x9b, 0xee, 0xad, 0x9d, 0x67, 0x89, 0x0c, 0xbb,
+		    0x22, 0x39, 0x23, 0x36, 0xfe, 0xa1, 0x85, 0x1f,
+		    0x38 }
+}, {
+	.key	= { 0x4c, 0xf5, 0x96, 0x83, 0x38, 0xe6, 0xae, 0x7f,
+		    0x2d, 0x29, 0x25, 0x76, 0xd5, 0x75, 0x27, 0x86,
+		    0x91, 0x9a, 0x27, 0x7a, 0xfb, 0x46, 0xc5, 0xef,
+		    0x94, 0x81, 0x79, 0x57, 0x14, 0x59, 0x40, 0x68 },
+	.nonce	= { 0xca, 0xbf, 0x33, 0x71, 0x32, 0x45, 0x77, 0x8e },
+	.nlen	= 8,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= "",
+	.ilen	= 0,
+	.result	= { 0xea, 0xe0, 0x1e, 0x9e, 0x2c, 0x91, 0xaa, 0xe1,
+		    0xdb, 0x5d, 0x99, 0x3f, 0x8a, 0xf7, 0x69, 0x92 }
+}, {
+	.key	= { 0x2d, 0xb0, 0x5d, 0x40, 0xc8, 0xed, 0x44, 0x88,
+		    0x34, 0xd1, 0x13, 0xaf, 0x57, 0xa1, 0xeb, 0x3a,
+		    0x2a, 0x80, 0x51, 0x36, 0xec, 0x5b, 0xbc, 0x08,
+		    0x93, 0x84, 0x21, 0xb5, 0x13, 0x88, 0x3c, 0x0d },
+	.nonce	= { 0x3d, 0x86, 0xb5, 0x6b, 0xc8, 0xa3, 0x1f, 0x1d },
+	.nlen	= 8,
+	.assoc	= { 0x33, 0x10, 0x41, 0x12, 0x1f, 0xf3, 0xd2, 0x6b },
+	.alen	= 8,
+	.input	= "",
+	.ilen	= 0,
+	.result	= { 0xdd, 0x6b, 0x3b, 0x82, 0xce, 0x5a, 0xbd, 0xd6,
+		    0xa9, 0x35, 0x83, 0xd8, 0x8c, 0x3d, 0x85, 0x77 }
+}, {
+	.key	= { 0x4b, 0x28, 0x4b, 0xa3, 0x7b, 0xbe, 0xe9, 0xf8,
+		    0x31, 0x80, 0x82, 0xd7, 0xd8, 0xe8, 0xb5, 0xa1,
+		    0xe2, 0x18, 0x18, 0x8a, 0x9c, 0xfa, 0xa3, 0x3d,
+		    0x25, 0x71, 0x3e, 0x40, 0xbc, 0x54, 0x7a, 0x3e },
+	.nonce	= { 0xd2, 0x32, 0x1f, 0x29, 0x28, 0xc6, 0xc4, 0xc4 },
+	.nlen	= 8,
+	.assoc	= { 0x6a, 0xe2, 0xad, 0x3f, 0x88, 0x39, 0x5a, 0x40 },
+	.alen	= 8,
+	.input	= { 0xa4 },
+	.ilen	= 1,
+	.result	= { 0xb7, 0x1b, 0xb0, 0x73, 0x59, 0xb0, 0x84, 0xb2,
+		    0x6d, 0x8e, 0xab, 0x94, 0x31, 0xa1, 0xae, 0xac,
+		    0x89 }
+}, {
+	.key	= { 0x66, 0xca, 0x9c, 0x23, 0x2a, 0x4b, 0x4b, 0x31,
+		    0x0e, 0x92, 0x89, 0x8b, 0xf4, 0x93, 0xc7, 0x87,
+		    0x98, 0xa3, 0xd8, 0x39, 0xf8, 0xf4, 0xa7, 0x01,
+		    0xc0, 0x2e, 0x0a, 0xa6, 0x7e, 0x5a, 0x78, 0x87 },
+	.nonce	= { 0x20, 0x1c, 0xaa, 0x5f, 0x9c, 0xbf, 0x92, 0x30 },
+	.nlen	= 8,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x2d },
+	.ilen	= 1,
+	.result	= { 0xbf, 0xe1, 0x5b, 0x0b, 0xdb, 0x6b, 0xf5, 0x5e,
+		    0x6c, 0x5d, 0x84, 0x44, 0x39, 0x81, 0xc1, 0x9c,
+		    0xac }
+}, {
+	.key	= { 0x68, 0x7b, 0x8d, 0x8e, 0xe3, 0xc4, 0xdd, 0xae,
+		    0xdf, 0x72, 0x7f, 0x53, 0x72, 0x25, 0x1e, 0x78,
+		    0x91, 0xcb, 0x69, 0x76, 0x1f, 0x49, 0x93, 0xf9,
+		    0x6f, 0x21, 0xcc, 0x39, 0x9c, 0xad, 0xb1, 0x01 },
+	.nonce	= { 0xdf, 0x51, 0x84, 0x82, 0x42, 0x0c, 0x75, 0x9c },
+	.nlen	= 8,
+	.assoc	= { 0x70, 0xd3, 0x33, 0xf3, 0x8b, 0x18, 0x0b },
+	.alen	= 7,
+	.input	= { 0x33, 0x2f, 0x94, 0xc1, 0xa4, 0xef, 0xcc, 0x2a,
+		    0x5b, 0xa6, 0xe5, 0x8f, 0x1d, 0x40, 0xf0, 0x92,
+		    0x3c, 0xd9, 0x24, 0x11, 0xa9, 0x71, 0xf9, 0x37,
+		    0x14, 0x99, 0xfa, 0xbe, 0xe6, 0x80, 0xde, 0x50,
+		    0xc9, 0x96, 0xd4, 0xb0, 0xec, 0x9e, 0x17, 0xec,
+		    0xd2, 0x5e, 0x72, 0x99, 0xfc, 0x0a, 0xe1, 0xcb,
+		    0x48, 0xd2, 0x85, 0xdd, 0x2f, 0x90, 0xe0, 0x66,
+		    0x3b, 0xe6, 0x20, 0x74, 0xbe, 0x23, 0x8f, 0xcb,
+		    0xb4, 0xe4, 0xda, 0x48, 0x40, 0xa6, 0xd1, 0x1b,
+		    0xc7, 0x42, 0xce, 0x2f, 0x0c, 0xa6, 0x85, 0x6e,
+		    0x87, 0x37, 0x03, 0xb1, 0x7c, 0x25, 0x96, 0xa3,
+		    0x05, 0xd8, 0xb0, 0xf4, 0xed, 0xea, 0xc2, 0xf0,
+		    0x31, 0x98, 0x6c, 0xd1, 0x14, 0x25, 0xc0, 0xcb,
+		    0x01, 0x74, 0xd0, 0x82, 0xf4, 0x36, 0xf5, 0x41,
+		    0xd5, 0xdc, 0xca, 0xc5, 0xbb, 0x98, 0xfe, 0xfc,
+		    0x69, 0x21, 0x70, 0xd8, 0xa4, 0x4b, 0xc8, 0xde,
+		    0x8f },
+	.ilen	= 129,
+	.result	= { 0x8b, 0x06, 0xd3, 0x31, 0xb0, 0x93, 0x45, 0xb1,
+		    0x75, 0x6e, 0x26, 0xf9, 0x67, 0xbc, 0x90, 0x15,
+		    0x81, 0x2c, 0xb5, 0xf0, 0xc6, 0x2b, 0xc7, 0x8c,
+		    0x56, 0xd1, 0xbf, 0x69, 0x6c, 0x07, 0xa0, 0xda,
+		    0x65, 0x27, 0xc9, 0x90, 0x3d, 0xef, 0x4b, 0x11,
+		    0x0f, 0x19, 0x07, 0xfd, 0x29, 0x92, 0xd9, 0xc8,
+		    0xf7, 0x99, 0x2e, 0x4a, 0xd0, 0xb8, 0x2c, 0xdc,
+		    0x93, 0xf5, 0x9e, 0x33, 0x78, 0xd1, 0x37, 0xc3,
+		    0x66, 0xd7, 0x5e, 0xbc, 0x44, 0xbf, 0x53, 0xa5,
+		    0xbc, 0xc4, 0xcb, 0x7b, 0x3a, 0x8e, 0x7f, 0x02,
+		    0xbd, 0xbb, 0xe7, 0xca, 0xa6, 0x6c, 0x6b, 0x93,
+		    0x21, 0x93, 0x10, 0x61, 0xe7, 0x69, 0xd0, 0x78,
+		    0xf3, 0x07, 0x5a, 0x1a, 0x8f, 0x73, 0xaa, 0xb1,
+		    0x4e, 0xd3, 0xda, 0x4f, 0xf3, 0x32, 0xe1, 0x66,
+		    0x3e, 0x6c, 0xc6, 0x13, 0xba, 0x06, 0x5b, 0xfc,
+		    0x6a, 0xe5, 0x6f, 0x60, 0xfb, 0x07, 0x40, 0xb0,
+		    0x8c, 0x9d, 0x84, 0x43, 0x6b, 0xc1, 0xf7, 0x8d,
+		    0x8d, 0x31, 0xf7, 0x7a, 0x39, 0x4d, 0x8f, 0x9a,
+		    0xeb }
+}, {
+	.key	= { 0x8d, 0xb8, 0x91, 0x48, 0xf0, 0xe7, 0x0a, 0xbd,
+		    0xf9, 0x3f, 0xcd, 0xd9, 0xa0, 0x1e, 0x42, 0x4c,
+		    0xe7, 0xde, 0x25, 0x3d, 0xa3, 0xd7, 0x05, 0x80,
+		    0x8d, 0xf2, 0x82, 0xac, 0x44, 0x16, 0x51, 0x01 },
+	.nonce	= { 0xde, 0x7b, 0xef, 0xc3, 0x65, 0x1b, 0x68, 0xb0 },
+	.nlen	= 8,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x9b, 0x18, 0xdb, 0xdd, 0x9a, 0x0f, 0x3e, 0xa5,
+		    0x15, 0x17, 0xde, 0xdf, 0x08, 0x9d, 0x65, 0x0a,
+		    0x67, 0x30, 0x12, 0xe2, 0x34, 0x77, 0x4b, 0xc1,
+		    0xd9, 0xc6, 0x1f, 0xab, 0xc6, 0x18, 0x50, 0x17,
+		    0xa7, 0x9d, 0x3c, 0xa6, 0xc5, 0x35, 0x8c, 0x1c,
+		    0xc0, 0xa1, 0x7c, 0x9f, 0x03, 0x89, 0xca, 0xe1,
+		    0xe6, 0xe9, 0xd4, 0xd3, 0x88, 0xdb, 0xb4, 0x51,
+		    0x9d, 0xec, 0xb4, 0xfc, 0x52, 0xee, 0x6d, 0xf1,
+		    0x75, 0x42, 0xc6, 0xfd, 0xbd, 0x7a, 0x8e, 0x86,
+		    0xfc, 0x44, 0xb3, 0x4f, 0xf3, 0xea, 0x67, 0x5a,
+		    0x41, 0x13, 0xba, 0xb0, 0xdc, 0xe1, 0xd3, 0x2a,
+		    0x7c, 0x22, 0xb3, 0xca, 0xac, 0x6a, 0x37, 0x98,
+		    0x3e, 0x1d, 0x40, 0x97, 0xf7, 0x9b, 0x1d, 0x36,
+		    0x6b, 0xb3, 0x28, 0xbd, 0x60, 0x82, 0x47, 0x34,
+		    0xaa, 0x2f, 0x7d, 0xe9, 0xa8, 0x70, 0x81, 0x57,
+		    0xd4, 0xb9, 0x77, 0x0a, 0x9d, 0x29, 0xa7, 0x84,
+		    0x52, 0x4f, 0xc2, 0x4a, 0x40, 0x3b, 0x3c, 0xd4,
+		    0xc9, 0x2a, 0xdb, 0x4a, 0x53, 0xc4, 0xbe, 0x80,
+		    0xe9, 0x51, 0x7f, 0x8f, 0xc7, 0xa2, 0xce, 0x82,
+		    0x5c, 0x91, 0x1e, 0x74, 0xd9, 0xd0, 0xbd, 0xd5,
+		    0xf3, 0xfd, 0xda, 0x4d, 0x25, 0xb4, 0xbb, 0x2d,
+		    0xac, 0x2f, 0x3d, 0x71, 0x85, 0x7b, 0xcf, 0x3c,
+		    0x7b, 0x3e, 0x0e, 0x22, 0x78, 0x0c, 0x29, 0xbf,
+		    0xe4, 0xf4, 0x57, 0xb3, 0xcb, 0x49, 0xa0, 0xfc,
+		    0x1e, 0x05, 0x4e, 0x16, 0xbc, 0xd5, 0xa8, 0xa3,
+		    0xee, 0x05, 0x35, 0xc6, 0x7c, 0xab, 0x60, 0x14,
+		    0x55, 0x1a, 0x8e, 0xc5, 0x88, 0x5d, 0xd5, 0x81,
+		    0xc2, 0x81, 0xa5, 0xc4, 0x60, 0xdb, 0xaf, 0x77,
+		    0x91, 0xe1, 0xce, 0xa2, 0x7e, 0x7f, 0x42, 0xe3,
+		    0xb0, 0x13, 0x1c, 0x1f, 0x25, 0x60, 0x21, 0xe2,
+		    0x40, 0x5f, 0x99, 0xb7, 0x73, 0xec, 0x9b, 0x2b,
+		    0xf0, 0x65, 0x11, 0xc8, 0xd0, 0x0a, 0x9f, 0xd3 },
+	.ilen	= 256,
+	.result	= { 0x85, 0x04, 0xc2, 0xed, 0x8d, 0xfd, 0x97, 0x5c,
+		    0xd2, 0xb7, 0xe2, 0xc1, 0x6b, 0xa3, 0xba, 0xf8,
+		    0xc9, 0x50, 0xc3, 0xc6, 0xa5, 0xe3, 0xa4, 0x7c,
+		    0xc3, 0x23, 0x49, 0x5e, 0xa9, 0xb9, 0x32, 0xeb,
+		    0x8a, 0x7c, 0xca, 0xe5, 0xec, 0xfb, 0x7c, 0xc0,
+		    0xcb, 0x7d, 0xdc, 0x2c, 0x9d, 0x92, 0x55, 0x21,
+		    0x0a, 0xc8, 0x43, 0x63, 0x59, 0x0a, 0x31, 0x70,
+		    0x82, 0x67, 0x41, 0x03, 0xf8, 0xdf, 0xf2, 0xac,
+		    0xa7, 0x02, 0xd4, 0xd5, 0x8a, 0x2d, 0xc8, 0x99,
+		    0x19, 0x66, 0xd0, 0xf6, 0x88, 0x2c, 0x77, 0xd9,
+		    0xd4, 0x0d, 0x6c, 0xbd, 0x98, 0xde, 0xe7, 0x7f,
+		    0xad, 0x7e, 0x8a, 0xfb, 0xe9, 0x4b, 0xe5, 0xf7,
+		    0xe5, 0x50, 0xa0, 0x90, 0x3f, 0xd6, 0x22, 0x53,
+		    0xe3, 0xfe, 0x1b, 0xcc, 0x79, 0x3b, 0xec, 0x12,
+		    0x47, 0x52, 0xa7, 0xd6, 0x04, 0xe3, 0x52, 0xe6,
+		    0x93, 0x90, 0x91, 0x32, 0x73, 0x79, 0xb8, 0xd0,
+		    0x31, 0xde, 0x1f, 0x9f, 0x2f, 0x05, 0x38, 0x54,
+		    0x2f, 0x35, 0x04, 0x39, 0xe0, 0xa7, 0xba, 0xc6,
+		    0x52, 0xf6, 0x37, 0x65, 0x4c, 0x07, 0xa9, 0x7e,
+		    0xb3, 0x21, 0x6f, 0x74, 0x8c, 0xc9, 0xde, 0xdb,
+		    0x65, 0x1b, 0x9b, 0xaa, 0x60, 0xb1, 0x03, 0x30,
+		    0x6b, 0xb2, 0x03, 0xc4, 0x1c, 0x04, 0xf8, 0x0f,
+		    0x64, 0xaf, 0x46, 0xe4, 0x65, 0x99, 0x49, 0xe2,
+		    0xea, 0xce, 0x78, 0x00, 0xd8, 0x8b, 0xd5, 0x2e,
+		    0xcf, 0xfc, 0x40, 0x49, 0xe8, 0x58, 0xdc, 0x34,
+		    0x9c, 0x8c, 0x61, 0xbf, 0x0a, 0x8e, 0xec, 0x39,
+		    0xa9, 0x30, 0x05, 0x5a, 0xd2, 0x56, 0x01, 0xc7,
+		    0xda, 0x8f, 0x4e, 0xbb, 0x43, 0xa3, 0x3a, 0xf9,
+		    0x15, 0x2a, 0xd0, 0xa0, 0x7a, 0x87, 0x34, 0x82,
+		    0xfe, 0x8a, 0xd1, 0x2d, 0x5e, 0xc7, 0xbf, 0x04,
+		    0x53, 0x5f, 0x3b, 0x36, 0xd4, 0x25, 0x5c, 0x34,
+		    0x7a, 0x8d, 0xd5, 0x05, 0xce, 0x72, 0xca, 0xef,
+		    0x7a, 0x4b, 0xbc, 0xb0, 0x10, 0x5c, 0x96, 0x42,
+		    0x3a, 0x00, 0x98, 0xcd, 0x15, 0xe8, 0xb7, 0x53 }
+}, {
+	.key	= { 0xf2, 0xaa, 0x4f, 0x99, 0xfd, 0x3e, 0xa8, 0x53,
+		    0xc1, 0x44, 0xe9, 0x81, 0x18, 0xdc, 0xf5, 0xf0,
+		    0x3e, 0x44, 0x15, 0x59, 0xe0, 0xc5, 0x44, 0x86,
+		    0xc3, 0x91, 0xa8, 0x75, 0xc0, 0x12, 0x46, 0xba },
+	.nonce	= { 0x0e, 0x0d, 0x57, 0xbb, 0x7b, 0x40, 0x54, 0x02 },
+	.nlen	= 8,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xc3, 0x09, 0x94, 0x62, 0xe6, 0x46, 0x2e, 0x10,
+		    0xbe, 0x00, 0xe4, 0xfc, 0xf3, 0x40, 0xa3, 0xe2,
+		    0x0f, 0xc2, 0x8b, 0x28, 0xdc, 0xba, 0xb4, 0x3c,
+		    0xe4, 0x21, 0x58, 0x61, 0xcd, 0x8b, 0xcd, 0xfb,
+		    0xac, 0x94, 0xa1, 0x45, 0xf5, 0x1c, 0xe1, 0x12,
+		    0xe0, 0x3b, 0x67, 0x21, 0x54, 0x5e, 0x8c, 0xaa,
+		    0xcf, 0xdb, 0xb4, 0x51, 0xd4, 0x13, 0xda, 0xe6,
+		    0x83, 0x89, 0xb6, 0x92, 0xe9, 0x21, 0x76, 0xa4,
+		    0x93, 0x7d, 0x0e, 0xfd, 0x96, 0x36, 0x03, 0x91,
+		    0x43, 0x5c, 0x92, 0x49, 0x62, 0x61, 0x7b, 0xeb,
+		    0x43, 0x89, 0xb8, 0x12, 0x20, 0x43, 0xd4, 0x47,
+		    0x06, 0x84, 0xee, 0x47, 0xe9, 0x8a, 0x73, 0x15,
+		    0x0f, 0x72, 0xcf, 0xed, 0xce, 0x96, 0xb2, 0x7f,
+		    0x21, 0x45, 0x76, 0xeb, 0x26, 0x28, 0x83, 0x6a,
+		    0xad, 0xaa, 0xa6, 0x81, 0xd8, 0x55, 0xb1, 0xa3,
+		    0x85, 0xb3, 0x0c, 0xdf, 0xf1, 0x69, 0x2d, 0x97,
+		    0x05, 0x2a, 0xbc, 0x7c, 0x7b, 0x25, 0xf8, 0x80,
+		    0x9d, 0x39, 0x25, 0xf3, 0x62, 0xf0, 0x66, 0x5e,
+		    0xf4, 0xa0, 0xcf, 0xd8, 0xfd, 0x4f, 0xb1, 0x1f,
+		    0x60, 0x3a, 0x08, 0x47, 0xaf, 0xe1, 0xf6, 0x10,
+		    0x77, 0x09, 0xa7, 0x27, 0x8f, 0x9a, 0x97, 0x5a,
+		    0x26, 0xfa, 0xfe, 0x41, 0x32, 0x83, 0x10, 0xe0,
+		    0x1d, 0xbf, 0x64, 0x0d, 0xf4, 0x1c, 0x32, 0x35,
+		    0xe5, 0x1b, 0x36, 0xef, 0xd4, 0x4a, 0x93, 0x4d,
+		    0x00, 0x7c, 0xec, 0x02, 0x07, 0x8b, 0x5d, 0x7d,
+		    0x1b, 0x0e, 0xd1, 0xa6, 0xa5, 0x5d, 0x7d, 0x57,
+		    0x88, 0xa8, 0xcc, 0x81, 0xb4, 0x86, 0x4e, 0xb4,
+		    0x40, 0xe9, 0x1d, 0xc3, 0xb1, 0x24, 0x3e, 0x7f,
+		    0xcc, 0x8a, 0x24, 0x9b, 0xdf, 0x6d, 0xf0, 0x39,
+		    0x69, 0x3e, 0x4c, 0xc0, 0x96, 0xe4, 0x13, 0xda,
+		    0x90, 0xda, 0xf4, 0x95, 0x66, 0x8b, 0x17, 0x17,
+		    0xfe, 0x39, 0x43, 0x25, 0xaa, 0xda, 0xa0, 0x43,
+		    0x3c, 0xb1, 0x41, 0x02, 0xa3, 0xf0, 0xa7, 0x19,
+		    0x59, 0xbc, 0x1d, 0x7d, 0x6c, 0x6d, 0x91, 0x09,
+		    0x5c, 0xb7, 0x5b, 0x01, 0xd1, 0x6f, 0x17, 0x21,
+		    0x97, 0xbf, 0x89, 0x71, 0xa5, 0xb0, 0x6e, 0x07,
+		    0x45, 0xfd, 0x9d, 0xea, 0x07, 0xf6, 0x7a, 0x9f,
+		    0x10, 0x18, 0x22, 0x30, 0x73, 0xac, 0xd4, 0x6b,
+		    0x72, 0x44, 0xed, 0xd9, 0x19, 0x9b, 0x2d, 0x4a,
+		    0x41, 0xdd, 0xd1, 0x85, 0x5e, 0x37, 0x19, 0xed,
+		    0xd2, 0x15, 0x8f, 0x5e, 0x91, 0xdb, 0x33, 0xf2,
+		    0xe4, 0xdb, 0xff, 0x98, 0xfb, 0xa3, 0xb5, 0xca,
+		    0x21, 0x69, 0x08, 0xe7, 0x8a, 0xdf, 0x90, 0xff,
+		    0x3e, 0xe9, 0x20, 0x86, 0x3c, 0xe9, 0xfc, 0x0b,
+		    0xfe, 0x5c, 0x61, 0xaa, 0x13, 0x92, 0x7f, 0x7b,
+		    0xec, 0xe0, 0x6d, 0xa8, 0x23, 0x22, 0xf6, 0x6b,
+		    0x77, 0xc4, 0xfe, 0x40, 0x07, 0x3b, 0xb6, 0xf6,
+		    0x8e, 0x5f, 0xd4, 0xb9, 0xb7, 0x0f, 0x21, 0x04,
+		    0xef, 0x83, 0x63, 0x91, 0x69, 0x40, 0xa3, 0x48,
+		    0x5c, 0xd2, 0x60, 0xf9, 0x4f, 0x6c, 0x47, 0x8b,
+		    0x3b, 0xb1, 0x9f, 0x8e, 0xee, 0x16, 0x8a, 0x13,
+		    0xfc, 0x46, 0x17, 0xc3, 0xc3, 0x32, 0x56, 0xf8,
+		    0x3c, 0x85, 0x3a, 0xb6, 0x3e, 0xaa, 0x89, 0x4f,
+		    0xb3, 0xdf, 0x38, 0xfd, 0xf1, 0xe4, 0x3a, 0xc0,
+		    0xe6, 0x58, 0xb5, 0x8f, 0xc5, 0x29, 0xa2, 0x92,
+		    0x4a, 0xb6, 0xa0, 0x34, 0x7f, 0xab, 0xb5, 0x8a,
+		    0x90, 0xa1, 0xdb, 0x4d, 0xca, 0xb6, 0x2c, 0x41,
+		    0x3c, 0xf7, 0x2b, 0x21, 0xc3, 0xfd, 0xf4, 0x17,
+		    0x5c, 0xb5, 0x33, 0x17, 0x68, 0x2b, 0x08, 0x30,
+		    0xf3, 0xf7, 0x30, 0x3c, 0x96, 0xe6, 0x6a, 0x20,
+		    0x97, 0xe7, 0x4d, 0x10, 0x5f, 0x47, 0x5f, 0x49,
+		    0x96, 0x09, 0xf0, 0x27, 0x91, 0xc8, 0xf8, 0x5a,
+		    0x2e, 0x79, 0xb5, 0xe2, 0xb8, 0xe8, 0xb9, 0x7b,
+		    0xd5, 0x10, 0xcb, 0xff, 0x5d, 0x14, 0x73, 0xf3 },
+	.ilen	= 512,
+	.result	= { 0x14, 0xf6, 0x41, 0x37, 0xa6, 0xd4, 0x27, 0xcd,
+		    0xdb, 0x06, 0x3e, 0x9a, 0x4e, 0xab, 0xd5, 0xb1,
+		    0x1e, 0x6b, 0xd2, 0xbc, 0x11, 0xf4, 0x28, 0x93,
+		    0x63, 0x54, 0xef, 0xbb, 0x5e, 0x1d, 0x3a, 0x1d,
+		    0x37, 0x3c, 0x0a, 0x6c, 0x1e, 0xc2, 0xd1, 0x2c,
+		    0xb5, 0xa3, 0xb5, 0x7b, 0xb8, 0x8f, 0x25, 0xa6,
+		    0x1b, 0x61, 0x1c, 0xec, 0x28, 0x58, 0x26, 0xa4,
+		    0xa8, 0x33, 0x28, 0x25, 0x5c, 0x45, 0x05, 0xe5,
+		    0x6c, 0x99, 0xe5, 0x45, 0xc4, 0xa2, 0x03, 0x84,
+		    0x03, 0x73, 0x1e, 0x8c, 0x49, 0xac, 0x20, 0xdd,
+		    0x8d, 0xb3, 0xc4, 0xf5, 0xe7, 0x4f, 0xf1, 0xed,
+		    0xa1, 0x98, 0xde, 0xa4, 0x96, 0xdd, 0x2f, 0xab,
+		    0xab, 0x97, 0xcf, 0x3e, 0xd2, 0x9e, 0xb8, 0x13,
+		    0x07, 0x28, 0x29, 0x19, 0xaf, 0xfd, 0xf2, 0x49,
+		    0x43, 0xea, 0x49, 0x26, 0x91, 0xc1, 0x07, 0xd6,
+		    0xbb, 0x81, 0x75, 0x35, 0x0d, 0x24, 0x7f, 0xc8,
+		    0xda, 0xd4, 0xb7, 0xeb, 0xe8, 0x5c, 0x09, 0xa2,
+		    0x2f, 0xdc, 0x28, 0x7d, 0x3a, 0x03, 0xfa, 0x94,
+		    0xb5, 0x1d, 0x17, 0x99, 0x36, 0xc3, 0x1c, 0x18,
+		    0x34, 0xe3, 0x9f, 0xf5, 0x55, 0x7c, 0xb0, 0x60,
+		    0x9d, 0xff, 0xac, 0xd4, 0x61, 0xf2, 0xad, 0xf8,
+		    0xce, 0xc7, 0xbe, 0x5c, 0xd2, 0x95, 0xa8, 0x4b,
+		    0x77, 0x13, 0x19, 0x59, 0x26, 0xc9, 0xb7, 0x8f,
+		    0x6a, 0xcb, 0x2d, 0x37, 0x91, 0xea, 0x92, 0x9c,
+		    0x94, 0x5b, 0xda, 0x0b, 0xce, 0xfe, 0x30, 0x20,
+		    0xf8, 0x51, 0xad, 0xf2, 0xbe, 0xe7, 0xc7, 0xff,
+		    0xb3, 0x33, 0x91, 0x6a, 0xc9, 0x1a, 0x41, 0xc9,
+		    0x0f, 0xf3, 0x10, 0x0e, 0xfd, 0x53, 0xff, 0x6c,
+		    0x16, 0x52, 0xd9, 0xf3, 0xf7, 0x98, 0x2e, 0xc9,
+		    0x07, 0x31, 0x2c, 0x0c, 0x72, 0xd7, 0xc5, 0xc6,
+		    0x08, 0x2a, 0x7b, 0xda, 0xbd, 0x7e, 0x02, 0xea,
+		    0x1a, 0xbb, 0xf2, 0x04, 0x27, 0x61, 0x28, 0x8e,
+		    0xf5, 0x04, 0x03, 0x1f, 0x4c, 0x07, 0x55, 0x82,
+		    0xec, 0x1e, 0xd7, 0x8b, 0x2f, 0x65, 0x56, 0xd1,
+		    0xd9, 0x1e, 0x3c, 0xe9, 0x1f, 0x5e, 0x98, 0x70,
+		    0x38, 0x4a, 0x8c, 0x49, 0xc5, 0x43, 0xa0, 0xa1,
+		    0x8b, 0x74, 0x9d, 0x4c, 0x62, 0x0d, 0x10, 0x0c,
+		    0xf4, 0x6c, 0x8f, 0xe0, 0xaa, 0x9a, 0x8d, 0xb7,
+		    0xe0, 0xbe, 0x4c, 0x87, 0xf1, 0x98, 0x2f, 0xcc,
+		    0xed, 0xc0, 0x52, 0x29, 0xdc, 0x83, 0xf8, 0xfc,
+		    0x2c, 0x0e, 0xa8, 0x51, 0x4d, 0x80, 0x0d, 0xa3,
+		    0xfe, 0xd8, 0x37, 0xe7, 0x41, 0x24, 0xfc, 0xfb,
+		    0x75, 0xe3, 0x71, 0x7b, 0x57, 0x45, 0xf5, 0x97,
+		    0x73, 0x65, 0x63, 0x14, 0x74, 0xb8, 0x82, 0x9f,
+		    0xf8, 0x60, 0x2f, 0x8a, 0xf2, 0x4e, 0xf1, 0x39,
+		    0xda, 0x33, 0x91, 0xf8, 0x36, 0xe0, 0x8d, 0x3f,
+		    0x1f, 0x3b, 0x56, 0xdc, 0xa0, 0x8f, 0x3c, 0x9d,
+		    0x71, 0x52, 0xa7, 0xb8, 0xc0, 0xa5, 0xc6, 0xa2,
+		    0x73, 0xda, 0xf4, 0x4b, 0x74, 0x5b, 0x00, 0x3d,
+		    0x99, 0xd7, 0x96, 0xba, 0xe6, 0xe1, 0xa6, 0x96,
+		    0x38, 0xad, 0xb3, 0xc0, 0xd2, 0xba, 0x91, 0x6b,
+		    0xf9, 0x19, 0xdd, 0x3b, 0xbe, 0xbe, 0x9c, 0x20,
+		    0x50, 0xba, 0xa1, 0xd0, 0xce, 0x11, 0xbd, 0x95,
+		    0xd8, 0xd1, 0xdd, 0x33, 0x85, 0x74, 0xdc, 0xdb,
+		    0x66, 0x76, 0x44, 0xdc, 0x03, 0x74, 0x48, 0x35,
+		    0x98, 0xb1, 0x18, 0x47, 0x94, 0x7d, 0xff, 0x62,
+		    0xe4, 0x58, 0x78, 0xab, 0xed, 0x95, 0x36, 0xd9,
+		    0x84, 0x91, 0x82, 0x64, 0x41, 0xbb, 0x58, 0xe6,
+		    0x1c, 0x20, 0x6d, 0x15, 0x6b, 0x13, 0x96, 0xe8,
+		    0x35, 0x7f, 0xdc, 0x40, 0x2c, 0xe9, 0xbc, 0x8a,
+		    0x4f, 0x92, 0xec, 0x06, 0x2d, 0x50, 0xdf, 0x93,
+		    0x5d, 0x65, 0x5a, 0xa8, 0xfc, 0x20, 0x50, 0x14,
+		    0xa9, 0x8a, 0x7e, 0x1d, 0x08, 0x1f, 0xe2, 0x99,
+		    0xd0, 0xbe, 0xfb, 0x3a, 0x21, 0x9d, 0xad, 0x86,
+		    0x54, 0xfd, 0x0d, 0x98, 0x1c, 0x5a, 0x6f, 0x1f,
+		    0x9a, 0x40, 0xcd, 0xa2, 0xff, 0x6a, 0xf1, 0x54 }
+}, {
+	.key	= { 0xea, 0xbc, 0x56, 0x99, 0xe3, 0x50, 0xff, 0xc5,
+		    0xcc, 0x1a, 0xd7, 0xc1, 0x57, 0x72, 0xea, 0x86,
+		    0x5b, 0x89, 0x88, 0x61, 0x3d, 0x2f, 0x9b, 0xb2,
+		    0xe7, 0x9c, 0xec, 0x74, 0x6e, 0x3e, 0xf4, 0x3b },
+	.nonce	= { 0xef, 0x2d, 0x63, 0xee, 0x6b, 0x80, 0x8b, 0x78 },
+	.nlen	= 8,
+	.assoc	= { 0x5a, 0x27, 0xff, 0xeb, 0xdf, 0x84, 0xb2, 0x9e,
+		    0xef },
+	.alen	= 9,
+	.input	= { 0xe6, 0xc3, 0xdb, 0x63, 0x55, 0x15, 0xe3, 0x5b,
+		    0xb7, 0x4b, 0x27, 0x8b, 0x5a, 0xdd, 0xc2, 0xe8,
+		    0x3a, 0x6b, 0xd7, 0x81, 0x96, 0x35, 0x97, 0xca,
+		    0xd7, 0x68, 0xe8, 0xef, 0xce, 0xab, 0xda, 0x09,
+		    0x6e, 0xd6, 0x8e, 0xcb, 0x55, 0xb5, 0xe1, 0xe5,
+		    0x57, 0xfd, 0xc4, 0xe3, 0xe0, 0x18, 0x4f, 0x85,
+		    0xf5, 0x3f, 0x7e, 0x4b, 0x88, 0xc9, 0x52, 0x44,
+		    0x0f, 0xea, 0xaf, 0x1f, 0x71, 0x48, 0x9f, 0x97,
+		    0x6d, 0xb9, 0x6f, 0x00, 0xa6, 0xde, 0x2b, 0x77,
+		    0x8b, 0x15, 0xad, 0x10, 0xa0, 0x2b, 0x7b, 0x41,
+		    0x90, 0x03, 0x2d, 0x69, 0xae, 0xcc, 0x77, 0x7c,
+		    0xa5, 0x9d, 0x29, 0x22, 0xc2, 0xea, 0xb4, 0x00,
+		    0x1a, 0xd2, 0x7a, 0x98, 0x8a, 0xf9, 0xf7, 0x82,
+		    0xb0, 0xab, 0xd8, 0xa6, 0x94, 0x8d, 0x58, 0x2f,
+		    0x01, 0x9e, 0x00, 0x20, 0xfc, 0x49, 0xdc, 0x0e,
+		    0x03, 0xe8, 0x45, 0x10, 0xd6, 0xa8, 0xda, 0x55,
+		    0x10, 0x9a, 0xdf, 0x67, 0x22, 0x8b, 0x43, 0xab,
+		    0x00, 0xbb, 0x02, 0xc8, 0xdd, 0x7b, 0x97, 0x17,
+		    0xd7, 0x1d, 0x9e, 0x02, 0x5e, 0x48, 0xde, 0x8e,
+		    0xcf, 0x99, 0x07, 0x95, 0x92, 0x3c, 0x5f, 0x9f,
+		    0xc5, 0x8a, 0xc0, 0x23, 0xaa, 0xd5, 0x8c, 0x82,
+		    0x6e, 0x16, 0x92, 0xb1, 0x12, 0x17, 0x07, 0xc3,
+		    0xfb, 0x36, 0xf5, 0x6c, 0x35, 0xd6, 0x06, 0x1f,
+		    0x9f, 0xa7, 0x94, 0xa2, 0x38, 0x63, 0x9c, 0xb0,
+		    0x71, 0xb3, 0xa5, 0xd2, 0xd8, 0xba, 0x9f, 0x08,
+		    0x01, 0xb3, 0xff, 0x04, 0x97, 0x73, 0x45, 0x1b,
+		    0xd5, 0xa9, 0x9c, 0x80, 0xaf, 0x04, 0x9a, 0x85,
+		    0xdb, 0x32, 0x5b, 0x5d, 0x1a, 0xc1, 0x36, 0x28,
+		    0x10, 0x79, 0xf1, 0x3c, 0xbf, 0x1a, 0x41, 0x5c,
+		    0x4e, 0xdf, 0xb2, 0x7c, 0x79, 0x3b, 0x7a, 0x62,
+		    0x3d, 0x4b, 0xc9, 0x9b, 0x2a, 0x2e, 0x7c, 0xa2,
+		    0xb1, 0x11, 0x98, 0xa7, 0x34, 0x1a, 0x00, 0xf3,
+		    0xd1, 0xbc, 0x18, 0x22, 0xba, 0x02, 0x56, 0x62,
+		    0x31, 0x10, 0x11, 0x6d, 0xe0, 0x54, 0x9d, 0x40,
+		    0x1f, 0x26, 0x80, 0x41, 0xca, 0x3f, 0x68, 0x0f,
+		    0x32, 0x1d, 0x0a, 0x8e, 0x79, 0xd8, 0xa4, 0x1b,
+		    0x29, 0x1c, 0x90, 0x8e, 0xc5, 0xe3, 0xb4, 0x91,
+		    0x37, 0x9a, 0x97, 0x86, 0x99, 0xd5, 0x09, 0xc5,
+		    0xbb, 0xa3, 0x3f, 0x21, 0x29, 0x82, 0x14, 0x5c,
+		    0xab, 0x25, 0xfb, 0xf2, 0x4f, 0x58, 0x26, 0xd4,
+		    0x83, 0xaa, 0x66, 0x89, 0x67, 0x7e, 0xc0, 0x49,
+		    0xe1, 0x11, 0x10, 0x7f, 0x7a, 0xda, 0x29, 0x04,
+		    0xff, 0xf0, 0xcb, 0x09, 0x7c, 0x9d, 0xfa, 0x03,
+		    0x6f, 0x81, 0x09, 0x31, 0x60, 0xfb, 0x08, 0xfa,
+		    0x74, 0xd3, 0x64, 0x44, 0x7c, 0x55, 0x85, 0xec,
+		    0x9c, 0x6e, 0x25, 0xb7, 0x6c, 0xc5, 0x37, 0xb6,
+		    0x83, 0x87, 0x72, 0x95, 0x8b, 0x9d, 0xe1, 0x69,
+		    0x5c, 0x31, 0x95, 0x42, 0xa6, 0x2c, 0xd1, 0x36,
+		    0x47, 0x1f, 0xec, 0x54, 0xab, 0xa2, 0x1c, 0xd8,
+		    0x00, 0xcc, 0xbc, 0x0d, 0x65, 0xe2, 0x67, 0xbf,
+		    0xbc, 0xea, 0xee, 0x9e, 0xe4, 0x36, 0x95, 0xbe,
+		    0x73, 0xd9, 0xa6, 0xd9, 0x0f, 0xa0, 0xcc, 0x82,
+		    0x76, 0x26, 0xad, 0x5b, 0x58, 0x6c, 0x4e, 0xab,
+		    0x29, 0x64, 0xd3, 0xd9, 0xa9, 0x08, 0x8c, 0x1d,
+		    0xa1, 0x4f, 0x80, 0xd8, 0x3f, 0x94, 0xfb, 0xd3,
+		    0x7b, 0xfc, 0xd1, 0x2b, 0xc3, 0x21, 0xeb, 0xe5,
+		    0x1c, 0x84, 0x23, 0x7f, 0x4b, 0xfa, 0xdb, 0x34,
+		    0x18, 0xa2, 0xc2, 0xe5, 0x13, 0xfe, 0x6c, 0x49,
+		    0x81, 0xd2, 0x73, 0xe7, 0xe2, 0xd7, 0xe4, 0x4f,
+		    0x4b, 0x08, 0x6e, 0xb1, 0x12, 0x22, 0x10, 0x9d,
+		    0xac, 0x51, 0x1e, 0x17, 0xd9, 0x8a, 0x0b, 0x42,
+		    0x88, 0x16, 0x81, 0x37, 0x7c, 0x6a, 0xf7, 0xef,
+		    0x2d, 0xe3, 0xd9, 0xf8, 0x5f, 0xe0, 0x53, 0x27,
+		    0x74, 0xb9, 0xe2, 0xd6, 0x1c, 0x80, 0x2c, 0x52,
+		    0x65 },
+	.ilen	= 513,
+	.result	= { 0xfd, 0x81, 0x8d, 0xd0, 0x3d, 0xb4, 0xd5, 0xdf,
+		    0xd3, 0x42, 0x47, 0x5a, 0x6d, 0x19, 0x27, 0x66,
+		    0x4b, 0x2e, 0x0c, 0x27, 0x9c, 0x96, 0x4c, 0x72,
+		    0x02, 0xa3, 0x65, 0xc3, 0xb3, 0x6f, 0x2e, 0xbd,
+		    0x63, 0x8a, 0x4a, 0x5d, 0x29, 0xa2, 0xd0, 0x28,
+		    0x48, 0xc5, 0x3d, 0x98, 0xa3, 0xbc, 0xe0, 0xbe,
+		    0x3b, 0x3f, 0xe6, 0x8a, 0xa4, 0x7f, 0x53, 0x06,
+		    0xfa, 0x7f, 0x27, 0x76, 0x72, 0x31, 0xa1, 0xf5,
+		    0xd6, 0x0c, 0x52, 0x47, 0xba, 0xcd, 0x4f, 0xd7,
+		    0xeb, 0x05, 0x48, 0x0d, 0x7c, 0x35, 0x4a, 0x09,
+		    0xc9, 0x76, 0x71, 0x02, 0xa3, 0xfb, 0xb7, 0x1a,
+		    0x65, 0xb7, 0xed, 0x98, 0xc6, 0x30, 0x8a, 0x00,
+		    0xae, 0xa1, 0x31, 0xe5, 0xb5, 0x9e, 0x6d, 0x62,
+		    0xda, 0xda, 0x07, 0x0f, 0x38, 0x38, 0xd3, 0xcb,
+		    0xc1, 0xb0, 0xad, 0xec, 0x72, 0xec, 0xb1, 0xa2,
+		    0x7b, 0x59, 0xf3, 0x3d, 0x2b, 0xef, 0xcd, 0x28,
+		    0x5b, 0x83, 0xcc, 0x18, 0x91, 0x88, 0xb0, 0x2e,
+		    0xf9, 0x29, 0x31, 0x18, 0xf9, 0x4e, 0xe9, 0x0a,
+		    0x91, 0x92, 0x9f, 0xae, 0x2d, 0xad, 0xf4, 0xe6,
+		    0x1a, 0xe2, 0xa4, 0xee, 0x47, 0x15, 0xbf, 0x83,
+		    0x6e, 0xd7, 0x72, 0x12, 0x3b, 0x2d, 0x24, 0xe9,
+		    0xb2, 0x55, 0xcb, 0x3c, 0x10, 0xf0, 0x24, 0x8a,
+		    0x4a, 0x02, 0xea, 0x90, 0x25, 0xf0, 0xb4, 0x79,
+		    0x3a, 0xef, 0x6e, 0xf5, 0x52, 0xdf, 0xb0, 0x0a,
+		    0xcd, 0x24, 0x1c, 0xd3, 0x2e, 0x22, 0x74, 0xea,
+		    0x21, 0x6f, 0xe9, 0xbd, 0xc8, 0x3e, 0x36, 0x5b,
+		    0x19, 0xf1, 0xca, 0x99, 0x0a, 0xb4, 0xa7, 0x52,
+		    0x1a, 0x4e, 0xf2, 0xad, 0x8d, 0x56, 0x85, 0xbb,
+		    0x64, 0x89, 0xba, 0x26, 0xf9, 0xc7, 0xe1, 0x89,
+		    0x19, 0x22, 0x77, 0xc3, 0xa8, 0xfc, 0xff, 0xad,
+		    0xfe, 0xb9, 0x48, 0xae, 0x12, 0x30, 0x9f, 0x19,
+		    0xfb, 0x1b, 0xef, 0x14, 0x87, 0x8a, 0x78, 0x71,
+		    0xf3, 0xf4, 0xb7, 0x00, 0x9c, 0x1d, 0xb5, 0x3d,
+		    0x49, 0x00, 0x0c, 0x06, 0xd4, 0x50, 0xf9, 0x54,
+		    0x45, 0xb2, 0x5b, 0x43, 0xdb, 0x6d, 0xcf, 0x1a,
+		    0xe9, 0x7a, 0x7a, 0xcf, 0xfc, 0x8a, 0x4e, 0x4d,
+		    0x0b, 0x07, 0x63, 0x28, 0xd8, 0xe7, 0x08, 0x95,
+		    0xdf, 0xa6, 0x72, 0x93, 0x2e, 0xbb, 0xa0, 0x42,
+		    0x89, 0x16, 0xf1, 0xd9, 0x0c, 0xf9, 0xa1, 0x16,
+		    0xfd, 0xd9, 0x03, 0xb4, 0x3b, 0x8a, 0xf5, 0xf6,
+		    0xe7, 0x6b, 0x2e, 0x8e, 0x4c, 0x3d, 0xe2, 0xaf,
+		    0x08, 0x45, 0x03, 0xff, 0x09, 0xb6, 0xeb, 0x2d,
+		    0xc6, 0x1b, 0x88, 0x94, 0xac, 0x3e, 0xf1, 0x9f,
+		    0x0e, 0x0e, 0x2b, 0xd5, 0x00, 0x4d, 0x3f, 0x3b,
+		    0x53, 0xae, 0xaf, 0x1c, 0x33, 0x5f, 0x55, 0x6e,
+		    0x8d, 0xaf, 0x05, 0x7a, 0x10, 0x34, 0xc9, 0xf4,
+		    0x66, 0xcb, 0x62, 0x12, 0xa6, 0xee, 0xe8, 0x1c,
+		    0x5d, 0x12, 0x86, 0xdb, 0x6f, 0x1c, 0x33, 0xc4,
+		    0x1c, 0xda, 0x82, 0x2d, 0x3b, 0x59, 0xfe, 0xb1,
+		    0xa4, 0x59, 0x41, 0x86, 0xd0, 0xef, 0xae, 0xfb,
+		    0xda, 0x6d, 0x11, 0xb8, 0xca, 0xe9, 0x6e, 0xff,
+		    0xf7, 0xa9, 0xd9, 0x70, 0x30, 0xfc, 0x53, 0xe2,
+		    0xd7, 0xa2, 0x4e, 0xc7, 0x91, 0xd9, 0x07, 0x06,
+		    0xaa, 0xdd, 0xb0, 0x59, 0x28, 0x1d, 0x00, 0x66,
+		    0xc5, 0x54, 0xc2, 0xfc, 0x06, 0xda, 0x05, 0x90,
+		    0x52, 0x1d, 0x37, 0x66, 0xee, 0xf0, 0xb2, 0x55,
+		    0x8a, 0x5d, 0xd2, 0x38, 0x86, 0x94, 0x9b, 0xfc,
+		    0x10, 0x4c, 0xa1, 0xb9, 0x64, 0x3e, 0x44, 0xb8,
+		    0x5f, 0xb0, 0x0c, 0xec, 0xe0, 0xc9, 0xe5, 0x62,
+		    0x75, 0x3f, 0x09, 0xd5, 0xf5, 0xd9, 0x26, 0xba,
+		    0x9e, 0xd2, 0xf4, 0xb9, 0x48, 0x0a, 0xbc, 0xa2,
+		    0xd6, 0x7c, 0x36, 0x11, 0x7d, 0x26, 0x81, 0x89,
+		    0xcf, 0xa4, 0xad, 0x73, 0x0e, 0xee, 0xcc, 0x06,
+		    0xa9, 0xdb, 0xb1, 0xfd, 0xfb, 0x09, 0x7f, 0x90,
+		    0x42, 0x37, 0x2f, 0xe1, 0x9c, 0x0f, 0x6f, 0xcf,
+		    0x43, 0xb5, 0xd9, 0x90, 0xe1, 0x85, 0xf5, 0xa8,
+		    0xae }
+}, {
+	.key	= { 0x47, 0x11, 0xeb, 0x86, 0x2b, 0x2c, 0xab, 0x44,
+		    0x34, 0xda, 0x7f, 0x57, 0x03, 0x39, 0x0c, 0xaf,
+		    0x2c, 0x14, 0xfd, 0x65, 0x23, 0xe9, 0x8e, 0x74,
+		    0xd5, 0x08, 0x68, 0x08, 0xe7, 0xb4, 0x72, 0xd7 },
+	.nonce	= { 0xdb, 0x92, 0x0f, 0x7f, 0x17, 0x54, 0x0c, 0x30 },
+	.nlen	= 8,
+	.assoc	= { 0xd2, 0xa1, 0x70, 0xdb, 0x7a, 0xf8, 0xfa, 0x27,
+		    0xba, 0x73, 0x0f, 0xbf, 0x3d, 0x1e, 0x82, 0xb2 },
+	.alen	= 16,
+	.input	= { 0x42, 0x93, 0xe4, 0xeb, 0x97, 0xb0, 0x57, 0xbf,
+		    0x1a, 0x8b, 0x1f, 0xe4, 0x5f, 0x36, 0x20, 0x3c,
+		    0xef, 0x0a, 0xa9, 0x48, 0x5f, 0x5f, 0x37, 0x22,
+		    0x3a, 0xde, 0xe3, 0xae, 0xbe, 0xad, 0x07, 0xcc,
+		    0xb1, 0xf6, 0xf5, 0xf9, 0x56, 0xdd, 0xe7, 0x16,
+		    0x1e, 0x7f, 0xdf, 0x7a, 0x9e, 0x75, 0xb7, 0xc7,
+		    0xbe, 0xbe, 0x8a, 0x36, 0x04, 0xc0, 0x10, 0xf4,
+		    0x95, 0x20, 0x03, 0xec, 0xdc, 0x05, 0xa1, 0x7d,
+		    0xc4, 0xa9, 0x2c, 0x82, 0xd0, 0xbc, 0x8b, 0xc5,
+		    0xc7, 0x45, 0x50, 0xf6, 0xa2, 0x1a, 0xb5, 0x46,
+		    0x3b, 0x73, 0x02, 0xa6, 0x83, 0x4b, 0x73, 0x82,
+		    0x58, 0x5e, 0x3b, 0x65, 0x2f, 0x0e, 0xfd, 0x2b,
+		    0x59, 0x16, 0xce, 0xa1, 0x60, 0x9c, 0xe8, 0x3a,
+		    0x99, 0xed, 0x8d, 0x5a, 0xcf, 0xf6, 0x83, 0xaf,
+		    0xba, 0xd7, 0x73, 0x73, 0x40, 0x97, 0x3d, 0xca,
+		    0xef, 0x07, 0x57, 0xe6, 0xd9, 0x70, 0x0e, 0x95,
+		    0xae, 0xa6, 0x8d, 0x04, 0xcc, 0xee, 0xf7, 0x09,
+		    0x31, 0x77, 0x12, 0xa3, 0x23, 0x97, 0x62, 0xb3,
+		    0x7b, 0x32, 0xfb, 0x80, 0x14, 0x48, 0x81, 0xc3,
+		    0xe5, 0xea, 0x91, 0x39, 0x52, 0x81, 0xa2, 0x4f,
+		    0xe4, 0xb3, 0x09, 0xff, 0xde, 0x5e, 0xe9, 0x58,
+		    0x84, 0x6e, 0xf9, 0x3d, 0xdf, 0x25, 0xea, 0xad,
+		    0xae, 0xe6, 0x9a, 0xd1, 0x89, 0x55, 0xd3, 0xde,
+		    0x6c, 0x52, 0xdb, 0x70, 0xfe, 0x37, 0xce, 0x44,
+		    0x0a, 0xa8, 0x25, 0x5f, 0x92, 0xc1, 0x33, 0x4a,
+		    0x4f, 0x9b, 0x62, 0x35, 0xff, 0xce, 0xc0, 0xa9,
+		    0x60, 0xce, 0x52, 0x00, 0x97, 0x51, 0x35, 0x26,
+		    0x2e, 0xb9, 0x36, 0xa9, 0x87, 0x6e, 0x1e, 0xcc,
+		    0x91, 0x78, 0x53, 0x98, 0x86, 0x5b, 0x9c, 0x74,
+		    0x7d, 0x88, 0x33, 0xe1, 0xdf, 0x37, 0x69, 0x2b,
+		    0xbb, 0xf1, 0x4d, 0xf4, 0xd1, 0xf1, 0x39, 0x93,
+		    0x17, 0x51, 0x19, 0xe3, 0x19, 0x1e, 0x76, 0x37,
+		    0x25, 0xfb, 0x09, 0x27, 0x6a, 0xab, 0x67, 0x6f,
+		    0x14, 0x12, 0x64, 0xe7, 0xc4, 0x07, 0xdf, 0x4d,
+		    0x17, 0xbb, 0x6d, 0xe0, 0xe9, 0xb9, 0xab, 0xca,
+		    0x10, 0x68, 0xaf, 0x7e, 0xb7, 0x33, 0x54, 0x73,
+		    0x07, 0x6e, 0xf7, 0x81, 0x97, 0x9c, 0x05, 0x6f,
+		    0x84, 0x5f, 0xd2, 0x42, 0xfb, 0x38, 0xcf, 0xd1,
+		    0x2f, 0x14, 0x30, 0x88, 0x98, 0x4d, 0x5a, 0xa9,
+		    0x76, 0xd5, 0x4f, 0x3e, 0x70, 0x6c, 0x85, 0x76,
+		    0xd7, 0x01, 0xa0, 0x1a, 0xc8, 0x4e, 0xaa, 0xac,
+		    0x78, 0xfe, 0x46, 0xde, 0x6a, 0x05, 0x46, 0xa7,
+		    0x43, 0x0c, 0xb9, 0xde, 0xb9, 0x68, 0xfb, 0xce,
+		    0x42, 0x99, 0x07, 0x4d, 0x0b, 0x3b, 0x5a, 0x30,
+		    0x35, 0xa8, 0xf9, 0x3a, 0x73, 0xef, 0x0f, 0xdb,
+		    0x1e, 0x16, 0x42, 0xc4, 0xba, 0xae, 0x58, 0xaa,
+		    0xf8, 0xe5, 0x75, 0x2f, 0x1b, 0x15, 0x5c, 0xfd,
+		    0x0a, 0x97, 0xd0, 0xe4, 0x37, 0x83, 0x61, 0x5f,
+		    0x43, 0xa6, 0xc7, 0x3f, 0x38, 0x59, 0xe6, 0xeb,
+		    0xa3, 0x90, 0xc3, 0xaa, 0xaa, 0x5a, 0xd3, 0x34,
+		    0xd4, 0x17, 0xc8, 0x65, 0x3e, 0x57, 0xbc, 0x5e,
+		    0xdd, 0x9e, 0xb7, 0xf0, 0x2e, 0x5b, 0xb2, 0x1f,
+		    0x8a, 0x08, 0x0d, 0x45, 0x91, 0x0b, 0x29, 0x53,
+		    0x4f, 0x4c, 0x5a, 0x73, 0x56, 0xfe, 0xaf, 0x41,
+		    0x01, 0x39, 0x0a, 0x24, 0x3c, 0x7e, 0xbe, 0x4e,
+		    0x53, 0xf3, 0xeb, 0x06, 0x66, 0x51, 0x28, 0x1d,
+		    0xbd, 0x41, 0x0a, 0x01, 0xab, 0x16, 0x47, 0x27,
+		    0x47, 0x47, 0xf7, 0xcb, 0x46, 0x0a, 0x70, 0x9e,
+		    0x01, 0x9c, 0x09, 0xe1, 0x2a, 0x00, 0x1a, 0xd8,
+		    0xd4, 0x79, 0x9d, 0x80, 0x15, 0x8e, 0x53, 0x2a,
+		    0x65, 0x83, 0x78, 0x3e, 0x03, 0x00, 0x07, 0x12,
+		    0x1f, 0x33, 0x3e, 0x7b, 0x13, 0x37, 0xf1, 0xc3,
+		    0xef, 0xb7, 0xc1, 0x20, 0x3c, 0x3e, 0x67, 0x66,
+		    0x5d, 0x88, 0xa7, 0x7d, 0x33, 0x50, 0x77, 0xb0,
+		    0x28, 0x8e, 0xe7, 0x2c, 0x2e, 0x7a, 0xf4, 0x3c,
+		    0x8d, 0x74, 0x83, 0xaf, 0x8e, 0x87, 0x0f, 0xe4,
+		    0x50, 0xff, 0x84, 0x5c, 0x47, 0x0c, 0x6a, 0x49,
+		    0xbf, 0x42, 0x86, 0x77, 0x15, 0x48, 0xa5, 0x90,
+		    0x5d, 0x93, 0xd6, 0x2a, 0x11, 0xd5, 0xd5, 0x11,
+		    0xaa, 0xce, 0xe7, 0x6f, 0xa5, 0xb0, 0x09, 0x2c,
+		    0x8d, 0xd3, 0x92, 0xf0, 0x5a, 0x2a, 0xda, 0x5b,
+		    0x1e, 0xd5, 0x9a, 0xc4, 0xc4, 0xf3, 0x49, 0x74,
+		    0x41, 0xca, 0xe8, 0xc1, 0xf8, 0x44, 0xd6, 0x3c,
+		    0xae, 0x6c, 0x1d, 0x9a, 0x30, 0x04, 0x4d, 0x27,
+		    0x0e, 0xb1, 0x5f, 0x59, 0xa2, 0x24, 0xe8, 0xe1,
+		    0x98, 0xc5, 0x6a, 0x4c, 0xfe, 0x41, 0xd2, 0x27,
+		    0x42, 0x52, 0xe1, 0xe9, 0x7d, 0x62, 0xe4, 0x88,
+		    0x0f, 0xad, 0xb2, 0x70, 0xcb, 0x9d, 0x4c, 0x27,
+		    0x2e, 0x76, 0x1e, 0x1a, 0x63, 0x65, 0xf5, 0x3b,
+		    0xf8, 0x57, 0x69, 0xeb, 0x5b, 0x38, 0x26, 0x39,
+		    0x33, 0x25, 0x45, 0x3e, 0x91, 0xb8, 0xd8, 0xc7,
+		    0xd5, 0x42, 0xc0, 0x22, 0x31, 0x74, 0xf4, 0xbc,
+		    0x0c, 0x23, 0xf1, 0xca, 0xc1, 0x8d, 0xd7, 0xbe,
+		    0xc9, 0x62, 0xe4, 0x08, 0x1a, 0xcf, 0x36, 0xd5,
+		    0xfe, 0x55, 0x21, 0x59, 0x91, 0x87, 0x87, 0xdf,
+		    0x06, 0xdb, 0xdf, 0x96, 0x45, 0x58, 0xda, 0x05,
+		    0xcd, 0x50, 0x4d, 0xd2, 0x7d, 0x05, 0x18, 0x73,
+		    0x6a, 0x8d, 0x11, 0x85, 0xa6, 0x88, 0xe8, 0xda,
+		    0xe6, 0x30, 0x33, 0xa4, 0x89, 0x31, 0x75, 0xbe,
+		    0x69, 0x43, 0x84, 0x43, 0x50, 0x87, 0xdd, 0x71,
+		    0x36, 0x83, 0xc3, 0x78, 0x74, 0x24, 0x0a, 0xed,
+		    0x7b, 0xdb, 0xa4, 0x24, 0x0b, 0xb9, 0x7e, 0x5d,
+		    0xff, 0xde, 0xb1, 0xef, 0x61, 0x5a, 0x45, 0x33,
+		    0xf6, 0x17, 0x07, 0x08, 0x98, 0x83, 0x92, 0x0f,
+		    0x23, 0x6d, 0xe6, 0xaa, 0x17, 0x54, 0xad, 0x6a,
+		    0xc8, 0xdb, 0x26, 0xbe, 0xb8, 0xb6, 0x08, 0xfa,
+		    0x68, 0xf1, 0xd7, 0x79, 0x6f, 0x18, 0xb4, 0x9e,
+		    0x2d, 0x3f, 0x1b, 0x64, 0xaf, 0x8d, 0x06, 0x0e,
+		    0x49, 0x28, 0xe0, 0x5d, 0x45, 0x68, 0x13, 0x87,
+		    0xfa, 0xde, 0x40, 0x7b, 0xd2, 0xc3, 0x94, 0xd5,
+		    0xe1, 0xd9, 0xc2, 0xaf, 0x55, 0x89, 0xeb, 0xb4,
+		    0x12, 0x59, 0xa8, 0xd4, 0xc5, 0x29, 0x66, 0x38,
+		    0xe6, 0xac, 0x22, 0x22, 0xd9, 0x64, 0x9b, 0x34,
+		    0x0a, 0x32, 0x9f, 0xc2, 0xbf, 0x17, 0x6c, 0x3f,
+		    0x71, 0x7a, 0x38, 0x6b, 0x98, 0xfb, 0x49, 0x36,
+		    0x89, 0xc9, 0xe2, 0xd6, 0xc7, 0x5d, 0xd0, 0x69,
+		    0x5f, 0x23, 0x35, 0xc9, 0x30, 0xe2, 0xfd, 0x44,
+		    0x58, 0x39, 0xd7, 0x97, 0xfb, 0x5c, 0x00, 0xd5,
+		    0x4f, 0x7a, 0x1a, 0x95, 0x8b, 0x62, 0x4b, 0xce,
+		    0xe5, 0x91, 0x21, 0x7b, 0x30, 0x00, 0xd6, 0xdd,
+		    0x6d, 0x02, 0x86, 0x49, 0x0f, 0x3c, 0x1a, 0x27,
+		    0x3c, 0xd3, 0x0e, 0x71, 0xf2, 0xff, 0xf5, 0x2f,
+		    0x87, 0xac, 0x67, 0x59, 0x81, 0xa3, 0xf7, 0xf8,
+		    0xd6, 0x11, 0x0c, 0x84, 0xa9, 0x03, 0xee, 0x2a,
+		    0xc4, 0xf3, 0x22, 0xab, 0x7c, 0xe2, 0x25, 0xf5,
+		    0x67, 0xa3, 0xe4, 0x11, 0xe0, 0x59, 0xb3, 0xca,
+		    0x87, 0xa0, 0xae, 0xc9, 0xa6, 0x62, 0x1b, 0x6e,
+		    0x4d, 0x02, 0x6b, 0x07, 0x9d, 0xfd, 0xd0, 0x92,
+		    0x06, 0xe1, 0xb2, 0x9a, 0x4a, 0x1f, 0x1f, 0x13,
+		    0x49, 0x99, 0x97, 0x08, 0xde, 0x7f, 0x98, 0xaf,
+		    0x51, 0x98, 0xee, 0x2c, 0xcb, 0xf0, 0x0b, 0xc6,
+		    0xb6, 0xb7, 0x2d, 0x9a, 0xb1, 0xac, 0xa6, 0xe3,
+		    0x15, 0x77, 0x9d, 0x6b, 0x1a, 0xe4, 0xfc, 0x8b,
+		    0xf2, 0x17, 0x59, 0x08, 0x04, 0x58, 0x81, 0x9d,
+		    0x1b, 0x1b, 0x69, 0x55, 0xc2, 0xb4, 0x3c, 0x1f,
+		    0x50, 0xf1, 0x7f, 0x77, 0x90, 0x4c, 0x66, 0x40,
+		    0x5a, 0xc0, 0x33, 0x1f, 0xcb, 0x05, 0x6d, 0x5c,
+		    0x06, 0x87, 0x52, 0xa2, 0x8f, 0x26, 0xd5, 0x4f },
+	.ilen	= 1024,
+	.result	= { 0xe5, 0x26, 0xa4, 0x3d, 0xbd, 0x33, 0xd0, 0x4b,
+		    0x6f, 0x05, 0xa7, 0x6e, 0x12, 0x7a, 0xd2, 0x74,
+		    0xa6, 0xdd, 0xbd, 0x95, 0xeb, 0xf9, 0xa4, 0xf1,
+		    0x59, 0x93, 0x91, 0x70, 0xd9, 0xfe, 0x9a, 0xcd,
+		    0x53, 0x1f, 0x3a, 0xab, 0xa6, 0x7c, 0x9f, 0xa6,
+		    0x9e, 0xbd, 0x99, 0xd9, 0xb5, 0x97, 0x44, 0xd5,
+		    0x14, 0x48, 0x4d, 0x9d, 0xc0, 0xd0, 0x05, 0x96,
+		    0xeb, 0x4c, 0x78, 0x55, 0x09, 0x08, 0x01, 0x02,
+		    0x30, 0x90, 0x7b, 0x96, 0x7a, 0x7b, 0x5f, 0x30,
+		    0x41, 0x24, 0xce, 0x68, 0x61, 0x49, 0x86, 0x57,
+		    0x82, 0xdd, 0x53, 0x1c, 0x51, 0x28, 0x2b, 0x53,
+		    0x6e, 0x2d, 0xc2, 0x20, 0x4c, 0xdd, 0x8f, 0x65,
+		    0x10, 0x20, 0x50, 0xdd, 0x9d, 0x50, 0xe5, 0x71,
+		    0x40, 0x53, 0x69, 0xfc, 0x77, 0x48, 0x11, 0xb9,
+		    0xde, 0xa4, 0x8d, 0x58, 0xe4, 0xa6, 0x1a, 0x18,
+		    0x47, 0x81, 0x7e, 0xfc, 0xdd, 0xf6, 0xef, 0xce,
+		    0x2f, 0x43, 0x68, 0xd6, 0x06, 0xe2, 0x74, 0x6a,
+		    0xad, 0x90, 0xf5, 0x37, 0xf3, 0x3d, 0x82, 0x69,
+		    0x40, 0xe9, 0x6b, 0xa7, 0x3d, 0xa8, 0x1e, 0xd2,
+		    0x02, 0x7c, 0xb7, 0x9b, 0xe4, 0xda, 0x8f, 0x95,
+		    0x06, 0xc5, 0xdf, 0x73, 0xa3, 0x20, 0x9a, 0x49,
+		    0xde, 0x9c, 0xbc, 0xee, 0x14, 0x3f, 0x81, 0x5e,
+		    0xf8, 0x3b, 0x59, 0x3c, 0xe1, 0x68, 0x12, 0x5a,
+		    0x3a, 0x76, 0x3a, 0x3f, 0xf7, 0x87, 0x33, 0x0a,
+		    0x01, 0xb8, 0xd4, 0xed, 0xb6, 0xbe, 0x94, 0x5e,
+		    0x70, 0x40, 0x56, 0x67, 0x1f, 0x50, 0x44, 0x19,
+		    0xce, 0x82, 0x70, 0x10, 0x87, 0x13, 0x20, 0x0b,
+		    0x4c, 0x5a, 0xb6, 0xf6, 0xa7, 0xae, 0x81, 0x75,
+		    0x01, 0x81, 0xe6, 0x4b, 0x57, 0x7c, 0xdd, 0x6d,
+		    0xf8, 0x1c, 0x29, 0x32, 0xf7, 0xda, 0x3c, 0x2d,
+		    0xf8, 0x9b, 0x25, 0x6e, 0x00, 0xb4, 0xf7, 0x2f,
+		    0xf7, 0x04, 0xf7, 0xa1, 0x56, 0xac, 0x4f, 0x1a,
+		    0x64, 0xb8, 0x47, 0x55, 0x18, 0x7b, 0x07, 0x4d,
+		    0xbd, 0x47, 0x24, 0x80, 0x5d, 0xa2, 0x70, 0xc5,
+		    0xdd, 0x8e, 0x82, 0xd4, 0xeb, 0xec, 0xb2, 0x0c,
+		    0x39, 0xd2, 0x97, 0xc1, 0xcb, 0xeb, 0xf4, 0x77,
+		    0x59, 0xb4, 0x87, 0xef, 0xcb, 0x43, 0x2d, 0x46,
+		    0x54, 0xd1, 0xa7, 0xd7, 0x15, 0x99, 0x0a, 0x43,
+		    0xa1, 0xe0, 0x99, 0x33, 0x71, 0xc1, 0xed, 0xfe,
+		    0x72, 0x46, 0x33, 0x8e, 0x91, 0x08, 0x9f, 0xc8,
+		    0x2e, 0xca, 0xfa, 0xdc, 0x59, 0xd5, 0xc3, 0x76,
+		    0x84, 0x9f, 0xa3, 0x37, 0x68, 0xc3, 0xf0, 0x47,
+		    0x2c, 0x68, 0xdb, 0x5e, 0xc3, 0x49, 0x4c, 0xe8,
+		    0x92, 0x85, 0xe2, 0x23, 0xd3, 0x3f, 0xad, 0x32,
+		    0xe5, 0x2b, 0x82, 0xd7, 0x8f, 0x99, 0x0a, 0x59,
+		    0x5c, 0x45, 0xd9, 0xb4, 0x51, 0x52, 0xc2, 0xae,
+		    0xbf, 0x80, 0xcf, 0xc9, 0xc9, 0x51, 0x24, 0x2a,
+		    0x3b, 0x3a, 0x4d, 0xae, 0xeb, 0xbd, 0x22, 0xc3,
+		    0x0e, 0x0f, 0x59, 0x25, 0x92, 0x17, 0xe9, 0x74,
+		    0xc7, 0x8b, 0x70, 0x70, 0x36, 0x55, 0x95, 0x75,
+		    0x4b, 0xad, 0x61, 0x2b, 0x09, 0xbc, 0x82, 0xf2,
+		    0x6e, 0x94, 0x43, 0xae, 0xc3, 0xd5, 0xcd, 0x8e,
+		    0xfe, 0x5b, 0x9a, 0x88, 0x43, 0x01, 0x75, 0xb2,
+		    0x23, 0x09, 0xf7, 0x89, 0x83, 0xe7, 0xfa, 0xf9,
+		    0xb4, 0x9b, 0xf8, 0xef, 0xbd, 0x1c, 0x92, 0xc1,
+		    0xda, 0x7e, 0xfe, 0x05, 0xba, 0x5a, 0xcd, 0x07,
+		    0x6a, 0x78, 0x9e, 0x5d, 0xfb, 0x11, 0x2f, 0x79,
+		    0x38, 0xb6, 0xc2, 0x5b, 0x6b, 0x51, 0xb4, 0x71,
+		    0xdd, 0xf7, 0x2a, 0xe4, 0xf4, 0x72, 0x76, 0xad,
+		    0xc2, 0xdd, 0x64, 0x5d, 0x79, 0xb6, 0xf5, 0x7a,
+		    0x77, 0x20, 0x05, 0x3d, 0x30, 0x06, 0xd4, 0x4c,
+		    0x0a, 0x2c, 0x98, 0x5a, 0xb9, 0xd4, 0x98, 0xa9,
+		    0x3f, 0xc6, 0x12, 0xea, 0x3b, 0x4b, 0xc5, 0x79,
+		    0x64, 0x63, 0x6b, 0x09, 0x54, 0x3b, 0x14, 0x27,
+		    0xba, 0x99, 0x80, 0xc8, 0x72, 0xa8, 0x12, 0x90,
+		    0x29, 0xba, 0x40, 0x54, 0x97, 0x2b, 0x7b, 0xfe,
+		    0xeb, 0xcd, 0x01, 0x05, 0x44, 0x72, 0xdb, 0x99,
+		    0xe4, 0x61, 0xc9, 0x69, 0xd6, 0xb9, 0x28, 0xd1,
+		    0x05, 0x3e, 0xf9, 0x0b, 0x49, 0x0a, 0x49, 0xe9,
+		    0x8d, 0x0e, 0xa7, 0x4a, 0x0f, 0xaf, 0x32, 0xd0,
+		    0xe0, 0xb2, 0x3a, 0x55, 0x58, 0xfe, 0x5c, 0x28,
+		    0x70, 0x51, 0x23, 0xb0, 0x7b, 0x6a, 0x5f, 0x1e,
+		    0xb8, 0x17, 0xd7, 0x94, 0x15, 0x8f, 0xee, 0x20,
+		    0xc7, 0x42, 0x25, 0x3e, 0x9a, 0x14, 0xd7, 0x60,
+		    0x72, 0x39, 0x47, 0x48, 0xa9, 0xfe, 0xdd, 0x47,
+		    0x0a, 0xb1, 0xe6, 0x60, 0x28, 0x8c, 0x11, 0x68,
+		    0xe1, 0xff, 0xd7, 0xce, 0xc8, 0xbe, 0xb3, 0xfe,
+		    0x27, 0x30, 0x09, 0x70, 0xd7, 0xfa, 0x02, 0x33,
+		    0x3a, 0x61, 0x2e, 0xc7, 0xff, 0xa4, 0x2a, 0xa8,
+		    0x6e, 0xb4, 0x79, 0x35, 0x6d, 0x4c, 0x1e, 0x38,
+		    0xf8, 0xee, 0xd4, 0x84, 0x4e, 0x6e, 0x28, 0xa7,
+		    0xce, 0xc8, 0xc1, 0xcf, 0x80, 0x05, 0xf3, 0x04,
+		    0xef, 0xc8, 0x18, 0x28, 0x2e, 0x8d, 0x5e, 0x0c,
+		    0xdf, 0xb8, 0x5f, 0x96, 0xe8, 0xc6, 0x9c, 0x2f,
+		    0xe5, 0xa6, 0x44, 0xd7, 0xe7, 0x99, 0x44, 0x0c,
+		    0xec, 0xd7, 0x05, 0x60, 0x97, 0xbb, 0x74, 0x77,
+		    0x58, 0xd5, 0xbb, 0x48, 0xde, 0x5a, 0xb2, 0x54,
+		    0x7f, 0x0e, 0x46, 0x70, 0x6a, 0x6f, 0x78, 0xa5,
+		    0x08, 0x89, 0x05, 0x4e, 0x7e, 0xa0, 0x69, 0xb4,
+		    0x40, 0x60, 0x55, 0x77, 0x75, 0x9b, 0x19, 0xf2,
+		    0xd5, 0x13, 0x80, 0x77, 0xf9, 0x4b, 0x3f, 0x1e,
+		    0xee, 0xe6, 0x76, 0x84, 0x7b, 0x8c, 0xe5, 0x27,
+		    0xa8, 0x0a, 0x91, 0x01, 0x68, 0x71, 0x8a, 0x3f,
+		    0x06, 0xab, 0xf6, 0xa9, 0xa5, 0xe6, 0x72, 0x92,
+		    0xe4, 0x67, 0xe2, 0xa2, 0x46, 0x35, 0x84, 0x55,
+		    0x7d, 0xca, 0xa8, 0x85, 0xd0, 0xf1, 0x3f, 0xbe,
+		    0xd7, 0x34, 0x64, 0xfc, 0xae, 0xe3, 0xe4, 0x04,
+		    0x9f, 0x66, 0x02, 0xb9, 0x88, 0x10, 0xd9, 0xc4,
+		    0x4c, 0x31, 0x43, 0x7a, 0x93, 0xe2, 0x9b, 0x56,
+		    0x43, 0x84, 0xdc, 0xdc, 0xde, 0x1d, 0xa4, 0x02,
+		    0x0e, 0xc2, 0xef, 0xc3, 0xf8, 0x78, 0xd1, 0xb2,
+		    0x6b, 0x63, 0x18, 0xc9, 0xa9, 0xe5, 0x72, 0xd8,
+		    0xf3, 0xb9, 0xd1, 0x8a, 0xc7, 0x1a, 0x02, 0x27,
+		    0x20, 0x77, 0x10, 0xe5, 0xc8, 0xd4, 0x4a, 0x47,
+		    0xe5, 0xdf, 0x5f, 0x01, 0xaa, 0xb0, 0xd4, 0x10,
+		    0xbb, 0x69, 0xe3, 0x36, 0xc8, 0xe1, 0x3d, 0x43,
+		    0xfb, 0x86, 0xcd, 0xcc, 0xbf, 0xf4, 0x88, 0xe0,
+		    0x20, 0xca, 0xb7, 0x1b, 0xf1, 0x2f, 0x5c, 0xee,
+		    0xd4, 0xd3, 0xa3, 0xcc, 0xa4, 0x1e, 0x1c, 0x47,
+		    0xfb, 0xbf, 0xfc, 0xa2, 0x41, 0x55, 0x9d, 0xf6,
+		    0x5a, 0x5e, 0x65, 0x32, 0x34, 0x7b, 0x52, 0x8d,
+		    0xd5, 0xd0, 0x20, 0x60, 0x03, 0xab, 0x3f, 0x8c,
+		    0xd4, 0x21, 0xea, 0x2a, 0xd9, 0xc4, 0xd0, 0xd3,
+		    0x65, 0xd8, 0x7a, 0x13, 0x28, 0x62, 0x32, 0x4b,
+		    0x2c, 0x87, 0x93, 0xa8, 0xb4, 0x52, 0x45, 0x09,
+		    0x44, 0xec, 0xec, 0xc3, 0x17, 0xdb, 0x9a, 0x4d,
+		    0x5c, 0xa9, 0x11, 0xd4, 0x7d, 0xaf, 0x9e, 0xf1,
+		    0x2d, 0xb2, 0x66, 0xc5, 0x1d, 0xed, 0xb7, 0xcd,
+		    0x0b, 0x25, 0x5e, 0x30, 0x47, 0x3f, 0x40, 0xf4,
+		    0xa1, 0xa0, 0x00, 0x94, 0x10, 0xc5, 0x6a, 0x63,
+		    0x1a, 0xd5, 0x88, 0x92, 0x8e, 0x82, 0x39, 0x87,
+		    0x3c, 0x78, 0x65, 0x58, 0x42, 0x75, 0x5b, 0xdd,
+		    0x77, 0x3e, 0x09, 0x4e, 0x76, 0x5b, 0xe6, 0x0e,
+		    0x4d, 0x38, 0xb2, 0xc0, 0xb8, 0x95, 0x01, 0x7a,
+		    0x10, 0xe0, 0xfb, 0x07, 0xf2, 0xab, 0x2d, 0x8c,
+		    0x32, 0xed, 0x2b, 0xc0, 0x46, 0xc2, 0xf5, 0x38,
+		    0x83, 0xf0, 0x17, 0xec, 0xc1, 0x20, 0x6a, 0x9a,
+		    0x0b, 0x00, 0xa0, 0x98, 0x22, 0x50, 0x23, 0xd5,
+		    0x80, 0x6b, 0xf6, 0x1f, 0xc3, 0xcc, 0x97, 0xc9,
+		    0x24, 0x9f, 0xf3, 0xaf, 0x43, 0x14, 0xd5, 0xa0 }
+}, {
+	.key	= { 0x35, 0x4e, 0xb5, 0x70, 0x50, 0x42, 0x8a, 0x85,
+		    0xf2, 0xfb, 0xed, 0x7b, 0xd0, 0x9e, 0x97, 0xca,
+		    0xfa, 0x98, 0x66, 0x63, 0xee, 0x37, 0xcc, 0x52,
+		    0xfe, 0xd1, 0xdf, 0x95, 0x15, 0x34, 0x29, 0x38 },
+	.nonce	= { 0xfd, 0x87, 0xd4, 0xd8, 0x62, 0xfd, 0xec, 0xaa },
+	.nlen	= 8,
+	.assoc	= { 0xd6, 0x31, 0xda, 0x5d, 0x42, 0x5e, 0xd7 },
+	.alen	= 7,
+	.input	= { 0x7a, 0x57, 0xf2, 0xc7, 0x06, 0x3f, 0x50, 0x7b,
+		    0x36, 0x1a, 0x66, 0x5c, 0xb9, 0x0e, 0x5e, 0x3b,
+		    0x45, 0x60, 0xbe, 0x9a, 0x31, 0x9f, 0xff, 0x5d,
+		    0x66, 0x34, 0xb4, 0xdc, 0xfb, 0x9d, 0x8e, 0xee,
+		    0x6a, 0x33, 0xa4, 0x07, 0x3c, 0xf9, 0x4c, 0x30,
+		    0xa1, 0x24, 0x52, 0xf9, 0x50, 0x46, 0x88, 0x20,
+		    0x02, 0x32, 0x3a, 0x0e, 0x99, 0x63, 0xaf, 0x1f,
+		    0x15, 0x28, 0x2a, 0x05, 0xff, 0x57, 0x59, 0x5e,
+		    0x18, 0xa1, 0x1f, 0xd0, 0x92, 0x5c, 0x88, 0x66,
+		    0x1b, 0x00, 0x64, 0xa5, 0x93, 0x8d, 0x06, 0x46,
+		    0xb0, 0x64, 0x8b, 0x8b, 0xef, 0x99, 0x05, 0x35,
+		    0x85, 0xb3, 0xf3, 0x33, 0xbb, 0xec, 0x66, 0xb6,
+		    0x3d, 0x57, 0x42, 0xe3, 0xb4, 0xc6, 0xaa, 0xb0,
+		    0x41, 0x2a, 0xb9, 0x59, 0xa9, 0xf6, 0x3e, 0x15,
+		    0x26, 0x12, 0x03, 0x21, 0x4c, 0x74, 0x43, 0x13,
+		    0x2a, 0x03, 0x27, 0x09, 0xb4, 0xfb, 0xe7, 0xb7,
+		    0x40, 0xff, 0x5e, 0xce, 0x48, 0x9a, 0x60, 0xe3,
+		    0x8b, 0x80, 0x8c, 0x38, 0x2d, 0xcb, 0x93, 0x37,
+		    0x74, 0x05, 0x52, 0x6f, 0x73, 0x3e, 0xc3, 0xbc,
+		    0xca, 0x72, 0x0a, 0xeb, 0xf1, 0x3b, 0xa0, 0x95,
+		    0xdc, 0x8a, 0xc4, 0xa9, 0xdc, 0xca, 0x44, 0xd8,
+		    0x08, 0x63, 0x6a, 0x36, 0xd3, 0x3c, 0xb8, 0xac,
+		    0x46, 0x7d, 0xfd, 0xaa, 0xeb, 0x3e, 0x0f, 0x45,
+		    0x8f, 0x49, 0xda, 0x2b, 0xf2, 0x12, 0xbd, 0xaf,
+		    0x67, 0x8a, 0x63, 0x48, 0x4b, 0x55, 0x5f, 0x6d,
+		    0x8c, 0xb9, 0x76, 0x34, 0x84, 0xae, 0xc2, 0xfc,
+		    0x52, 0x64, 0x82, 0xf7, 0xb0, 0x06, 0xf0, 0x45,
+		    0x73, 0x12, 0x50, 0x30, 0x72, 0xea, 0x78, 0x9a,
+		    0xa8, 0xaf, 0xb5, 0xe3, 0xbb, 0x77, 0x52, 0xec,
+		    0x59, 0x84, 0xbf, 0x6b, 0x8f, 0xce, 0x86, 0x5e,
+		    0x1f, 0x23, 0xe9, 0xfb, 0x08, 0x86, 0xf7, 0x10,
+		    0xb9, 0xf2, 0x44, 0x96, 0x44, 0x63, 0xa9, 0xa8,
+		    0x78, 0x00, 0x23, 0xd6, 0xc7, 0xe7, 0x6e, 0x66,
+		    0x4f, 0xcc, 0xee, 0x15, 0xb3, 0xbd, 0x1d, 0xa0,
+		    0xe5, 0x9c, 0x1b, 0x24, 0x2c, 0x4d, 0x3c, 0x62,
+		    0x35, 0x9c, 0x88, 0x59, 0x09, 0xdd, 0x82, 0x1b,
+		    0xcf, 0x0a, 0x83, 0x6b, 0x3f, 0xae, 0x03, 0xc4,
+		    0xb4, 0xdd, 0x7e, 0x5b, 0x28, 0x76, 0x25, 0x96,
+		    0xd9, 0xc9, 0x9d, 0x5f, 0x86, 0xfa, 0xf6, 0xd7,
+		    0xd2, 0xe6, 0x76, 0x1d, 0x0f, 0xa1, 0xdc, 0x74,
+		    0x05, 0x1b, 0x1d, 0xe0, 0xcd, 0x16, 0xb0, 0xa8,
+		    0x8a, 0x34, 0x7b, 0x15, 0x11, 0x77, 0xe5, 0x7b,
+		    0x7e, 0x20, 0xf7, 0xda, 0x38, 0xda, 0xce, 0x70,
+		    0xe9, 0xf5, 0x6c, 0xd9, 0xbe, 0x0c, 0x4c, 0x95,
+		    0x4c, 0xc2, 0x9b, 0x34, 0x55, 0x55, 0xe1, 0xf3,
+		    0x46, 0x8e, 0x48, 0x74, 0x14, 0x4f, 0x9d, 0xc9,
+		    0xf5, 0xe8, 0x1a, 0xf0, 0x11, 0x4a, 0xc1, 0x8d,
+		    0xe0, 0x93, 0xa0, 0xbe, 0x09, 0x1c, 0x2b, 0x4e,
+		    0x0f, 0xb2, 0x87, 0x8b, 0x84, 0xfe, 0x92, 0x32,
+		    0x14, 0xd7, 0x93, 0xdf, 0xe7, 0x44, 0xbc, 0xc5,
+		    0xae, 0x53, 0x69, 0xd8, 0xb3, 0x79, 0x37, 0x80,
+		    0xe3, 0x17, 0x5c, 0xec, 0x53, 0x00, 0x9a, 0xe3,
+		    0x8e, 0xdc, 0x38, 0xb8, 0x66, 0xf0, 0xd3, 0xad,
+		    0x1d, 0x02, 0x96, 0x86, 0x3e, 0x9d, 0x3b, 0x5d,
+		    0xa5, 0x7f, 0x21, 0x10, 0xf1, 0x1f, 0x13, 0x20,
+		    0xf9, 0x57, 0x87, 0x20, 0xf5, 0x5f, 0xf1, 0x17,
+		    0x48, 0x0a, 0x51, 0x5a, 0xcd, 0x19, 0x03, 0xa6,
+		    0x5a, 0xd1, 0x12, 0x97, 0xe9, 0x48, 0xe2, 0x1d,
+		    0x83, 0x75, 0x50, 0xd9, 0x75, 0x7d, 0x6a, 0x82,
+		    0xa1, 0xf9, 0x4e, 0x54, 0x87, 0x89, 0xc9, 0x0c,
+		    0xb7, 0x5b, 0x6a, 0x91, 0xc1, 0x9c, 0xb2, 0xa9,
+		    0xdc, 0x9a, 0xa4, 0x49, 0x0a, 0x6d, 0x0d, 0xbb,
+		    0xde, 0x86, 0x44, 0xdd, 0x5d, 0x89, 0x2b, 0x96,
+		    0x0f, 0x23, 0x95, 0xad, 0xcc, 0xa2, 0xb3, 0xb9,
+		    0x7e, 0x74, 0x38, 0xba, 0x9f, 0x73, 0xae, 0x5f,
+		    0xf8, 0x68, 0xa2, 0xe0, 0xa9, 0xce, 0xbd, 0x40,
+		    0xd4, 0x4c, 0x6b, 0xd2, 0x56, 0x62, 0xb0, 0xcc,
+		    0x63, 0x7e, 0x5b, 0xd3, 0xae, 0xd1, 0x75, 0xce,
+		    0xbb, 0xb4, 0x5b, 0xa8, 0xf8, 0xb4, 0xac, 0x71,
+		    0x75, 0xaa, 0xc9, 0x9f, 0xbb, 0x6c, 0xad, 0x0f,
+		    0x55, 0x5d, 0xe8, 0x85, 0x7d, 0xf9, 0x21, 0x35,
+		    0xea, 0x92, 0x85, 0x2b, 0x00, 0xec, 0x84, 0x90,
+		    0x0a, 0x63, 0x96, 0xe4, 0x6b, 0xa9, 0x77, 0xb8,
+		    0x91, 0xf8, 0x46, 0x15, 0x72, 0x63, 0x70, 0x01,
+		    0x40, 0xa3, 0xa5, 0x76, 0x62, 0x2b, 0xbf, 0xf1,
+		    0xe5, 0x8d, 0x9f, 0xa3, 0xfa, 0x9b, 0x03, 0xbe,
+		    0xfe, 0x65, 0x6f, 0xa2, 0x29, 0x0d, 0x54, 0xb4,
+		    0x71, 0xce, 0xa9, 0xd6, 0x3d, 0x88, 0xf9, 0xaf,
+		    0x6b, 0xa8, 0x9e, 0xf4, 0x16, 0x96, 0x36, 0xb9,
+		    0x00, 0xdc, 0x10, 0xab, 0xb5, 0x08, 0x31, 0x1f,
+		    0x00, 0xb1, 0x3c, 0xd9, 0x38, 0x3e, 0xc6, 0x04,
+		    0xa7, 0x4e, 0xe8, 0xae, 0xed, 0x98, 0xc2, 0xf7,
+		    0xb9, 0x00, 0x5f, 0x8c, 0x60, 0xd1, 0xe5, 0x15,
+		    0xf7, 0xae, 0x1e, 0x84, 0x88, 0xd1, 0xf6, 0xbc,
+		    0x3a, 0x89, 0x35, 0x22, 0x83, 0x7c, 0xca, 0xf0,
+		    0x33, 0x82, 0x4c, 0x79, 0x3c, 0xfd, 0xb1, 0xae,
+		    0x52, 0x62, 0x55, 0xd2, 0x41, 0x60, 0xc6, 0xbb,
+		    0xfa, 0x0e, 0x59, 0xd6, 0xa8, 0xfe, 0x5d, 0xed,
+		    0x47, 0x3d, 0xe0, 0xea, 0x1f, 0x6e, 0x43, 0x51,
+		    0xec, 0x10, 0x52, 0x56, 0x77, 0x42, 0x6b, 0x52,
+		    0x87, 0xd8, 0xec, 0xe0, 0xaa, 0x76, 0xa5, 0x84,
+		    0x2a, 0x22, 0x24, 0xfd, 0x92, 0x40, 0x88, 0xd5,
+		    0x85, 0x1c, 0x1f, 0x6b, 0x47, 0xa0, 0xc4, 0xe4,
+		    0xef, 0xf4, 0xea, 0xd7, 0x59, 0xac, 0x2a, 0x9e,
+		    0x8c, 0xfa, 0x1f, 0x42, 0x08, 0xfe, 0x4f, 0x74,
+		    0xa0, 0x26, 0xf5, 0xb3, 0x84, 0xf6, 0x58, 0x5f,
+		    0x26, 0x66, 0x3e, 0xd7, 0xe4, 0x22, 0x91, 0x13,
+		    0xc8, 0xac, 0x25, 0x96, 0x23, 0xd8, 0x09, 0xea,
+		    0x45, 0x75, 0x23, 0xb8, 0x5f, 0xc2, 0x90, 0x8b,
+		    0x09, 0xc4, 0xfc, 0x47, 0x6c, 0x6d, 0x0a, 0xef,
+		    0x69, 0xa4, 0x38, 0x19, 0xcf, 0x7d, 0xf9, 0x09,
+		    0x73, 0x9b, 0x60, 0x5a, 0xf7, 0x37, 0xb5, 0xfe,
+		    0x9f, 0xe3, 0x2b, 0x4c, 0x0d, 0x6e, 0x19, 0xf1,
+		    0xd6, 0xc0, 0x70, 0xf3, 0x9d, 0x22, 0x3c, 0xf9,
+		    0x49, 0xce, 0x30, 0x8e, 0x44, 0xb5, 0x76, 0x15,
+		    0x8f, 0x52, 0xfd, 0xa5, 0x04, 0xb8, 0x55, 0x6a,
+		    0x36, 0x59, 0x7c, 0xc4, 0x48, 0xb8, 0xd7, 0xab,
+		    0x05, 0x66, 0xe9, 0x5e, 0x21, 0x6f, 0x6b, 0x36,
+		    0x29, 0xbb, 0xe9, 0xe3, 0xa2, 0x9a, 0xa8, 0xcd,
+		    0x55, 0x25, 0x11, 0xba, 0x5a, 0x58, 0xa0, 0xde,
+		    0xae, 0x19, 0x2a, 0x48, 0x5a, 0xff, 0x36, 0xcd,
+		    0x6d, 0x16, 0x7a, 0x73, 0x38, 0x46, 0xe5, 0x47,
+		    0x59, 0xc8, 0xa2, 0xf6, 0xe2, 0x6c, 0x83, 0xc5,
+		    0x36, 0x2c, 0x83, 0x7d, 0xb4, 0x01, 0x05, 0x69,
+		    0xe7, 0xaf, 0x5c, 0xc4, 0x64, 0x82, 0x12, 0x21,
+		    0xef, 0xf7, 0xd1, 0x7d, 0xb8, 0x8d, 0x8c, 0x98,
+		    0x7c, 0x5f, 0x7d, 0x92, 0x88, 0xb9, 0x94, 0x07,
+		    0x9c, 0xd8, 0xe9, 0x9c, 0x17, 0x38, 0xe3, 0x57,
+		    0x6c, 0xe0, 0xdc, 0xa5, 0x92, 0x42, 0xb3, 0xbd,
+		    0x50, 0xa2, 0x7e, 0xb5, 0xb1, 0x52, 0x72, 0x03,
+		    0x97, 0xd8, 0xaa, 0x9a, 0x1e, 0x75, 0x41, 0x11,
+		    0xa3, 0x4f, 0xcc, 0xd4, 0xe3, 0x73, 0xad, 0x96,
+		    0xdc, 0x47, 0x41, 0x9f, 0xb0, 0xbe, 0x79, 0x91,
+		    0xf5, 0xb6, 0x18, 0xfe, 0xc2, 0x83, 0x18, 0x7d,
+		    0x73, 0xd9, 0x4f, 0x83, 0x84, 0x03, 0xb3, 0xf0,
+		    0x77, 0x66, 0x3d, 0x83, 0x63, 0x2e, 0x2c, 0xf9,
+		    0xdd, 0xa6, 0x1f, 0x89, 0x82, 0xb8, 0x23, 0x42,
+		    0xeb, 0xe2, 0xca, 0x70, 0x82, 0x61, 0x41, 0x0a,
+		    0x6d, 0x5f, 0x75, 0xc5, 0xe2, 0xc4, 0x91, 0x18,
+		    0x44, 0x22, 0xfa, 0x34, 0x10, 0xf5, 0x20, 0xdc,
+		    0xb7, 0xdd, 0x2a, 0x20, 0x77, 0xf5, 0xf9, 0xce,
+		    0xdb, 0xa0, 0x0a, 0x52, 0x2a, 0x4e, 0xdd, 0xcc,
+		    0x97, 0xdf, 0x05, 0xe4, 0x5e, 0xb7, 0xaa, 0xf0,
+		    0xe2, 0x80, 0xff, 0xba, 0x1a, 0x0f, 0xac, 0xdf,
+		    0x02, 0x32, 0xe6, 0xf7, 0xc7, 0x17, 0x13, 0xb7,
+		    0xfc, 0x98, 0x48, 0x8c, 0x0d, 0x82, 0xc9, 0x80,
+		    0x7a, 0xe2, 0x0a, 0xc5, 0xb4, 0xde, 0x7c, 0x3c,
+		    0x79, 0x81, 0x0e, 0x28, 0x65, 0x79, 0x67, 0x82,
+		    0x69, 0x44, 0x66, 0x09, 0xf7, 0x16, 0x1a, 0xf9,
+		    0x7d, 0x80, 0xa1, 0x79, 0x14, 0xa9, 0xc8, 0x20,
+		    0xfb, 0xa2, 0x46, 0xbe, 0x08, 0x35, 0x17, 0x58,
+		    0xc1, 0x1a, 0xda, 0x2a, 0x6b, 0x2e, 0x1e, 0xe6,
+		    0x27, 0x55, 0x7b, 0x19, 0xe2, 0xfb, 0x64, 0xfc,
+		    0x5e, 0x15, 0x54, 0x3c, 0xe7, 0xc2, 0x11, 0x50,
+		    0x30, 0xb8, 0x72, 0x03, 0x0b, 0x1a, 0x9f, 0x86,
+		    0x27, 0x11, 0x5c, 0x06, 0x2b, 0xbd, 0x75, 0x1a,
+		    0x0a, 0xda, 0x01, 0xfa, 0x5c, 0x4a, 0xc1, 0x80,
+		    0x3a, 0x6e, 0x30, 0xc8, 0x2c, 0xeb, 0x56, 0xec,
+		    0x89, 0xfa, 0x35, 0x7b, 0xb2, 0xf0, 0x97, 0x08,
+		    0x86, 0x53, 0xbe, 0xbd, 0x40, 0x41, 0x38, 0x1c,
+		    0xb4, 0x8b, 0x79, 0x2e, 0x18, 0x96, 0x94, 0xde,
+		    0xe8, 0xca, 0xe5, 0x9f, 0x92, 0x9f, 0x15, 0x5d,
+		    0x56, 0x60, 0x5c, 0x09, 0xf9, 0x16, 0xf4, 0x17,
+		    0x0f, 0xf6, 0x4c, 0xda, 0xe6, 0x67, 0x89, 0x9f,
+		    0xca, 0x6c, 0xe7, 0x9b, 0x04, 0x62, 0x0e, 0x26,
+		    0xa6, 0x52, 0xbd, 0x29, 0xff, 0xc7, 0xa4, 0x96,
+		    0xe6, 0x6a, 0x02, 0xa5, 0x2e, 0x7b, 0xfe, 0x97,
+		    0x68, 0x3e, 0x2e, 0x5f, 0x3b, 0x0f, 0x36, 0xd6,
+		    0x98, 0x19, 0x59, 0x48, 0xd2, 0xc6, 0xe1, 0x55,
+		    0x1a, 0x6e, 0xd6, 0xed, 0x2c, 0xba, 0xc3, 0x9e,
+		    0x64, 0xc9, 0x95, 0x86, 0x35, 0x5e, 0x3e, 0x88,
+		    0x69, 0x99, 0x4b, 0xee, 0xbe, 0x9a, 0x99, 0xb5,
+		    0x6e, 0x58, 0xae, 0xdd, 0x22, 0xdb, 0xdd, 0x6b,
+		    0xfc, 0xaf, 0x90, 0xa3, 0x3d, 0xa4, 0xc1, 0x15,
+		    0x92, 0x18, 0x8d, 0xd2, 0x4b, 0x7b, 0x06, 0xd1,
+		    0x37, 0xb5, 0xe2, 0x7c, 0x2c, 0xf0, 0x25, 0xe4,
+		    0x94, 0x2a, 0xbd, 0xe3, 0x82, 0x70, 0x78, 0xa3,
+		    0x82, 0x10, 0x5a, 0x90, 0xd7, 0xa4, 0xfa, 0xaf,
+		    0x1a, 0x88, 0x59, 0xdc, 0x74, 0x12, 0xb4, 0x8e,
+		    0xd7, 0x19, 0x46, 0xf4, 0x84, 0x69, 0x9f, 0xbb,
+		    0x70, 0xa8, 0x4c, 0x52, 0x81, 0xa9, 0xff, 0x76,
+		    0x1c, 0xae, 0xd8, 0x11, 0x3d, 0x7f, 0x7d, 0xc5,
+		    0x12, 0x59, 0x28, 0x18, 0xc2, 0xa2, 0xb7, 0x1c,
+		    0x88, 0xf8, 0xd6, 0x1b, 0xa6, 0x7d, 0x9e, 0xde,
+		    0x29, 0xf8, 0xed, 0xff, 0xeb, 0x92, 0x24, 0x4f,
+		    0x05, 0xaa, 0xd9, 0x49, 0xba, 0x87, 0x59, 0x51,
+		    0xc9, 0x20, 0x5c, 0x9b, 0x74, 0xcf, 0x03, 0xd9,
+		    0x2d, 0x34, 0xc7, 0x5b, 0xa5, 0x40, 0xb2, 0x99,
+		    0xf5, 0xcb, 0xb4, 0xf6, 0xb7, 0x72, 0x4a, 0xd6,
+		    0xbd, 0xb0, 0xf3, 0x93, 0xe0, 0x1b, 0xa8, 0x04,
+		    0x1e, 0x35, 0xd4, 0x80, 0x20, 0xf4, 0x9c, 0x31,
+		    0x6b, 0x45, 0xb9, 0x15, 0xb0, 0x5e, 0xdd, 0x0a,
+		    0x33, 0x9c, 0x83, 0xcd, 0x58, 0x89, 0x50, 0x56,
+		    0xbb, 0x81, 0x00, 0x91, 0x32, 0xf3, 0x1b, 0x3e,
+		    0xcf, 0x45, 0xe1, 0xf9, 0xe1, 0x2c, 0x26, 0x78,
+		    0x93, 0x9a, 0x60, 0x46, 0xc9, 0xb5, 0x5e, 0x6a,
+		    0x28, 0x92, 0x87, 0x3f, 0x63, 0x7b, 0xdb, 0xf7,
+		    0xd0, 0x13, 0x9d, 0x32, 0x40, 0x5e, 0xcf, 0xfb,
+		    0x79, 0x68, 0x47, 0x4c, 0xfd, 0x01, 0x17, 0xe6,
+		    0x97, 0x93, 0x78, 0xbb, 0xa6, 0x27, 0xa3, 0xe8,
+		    0x1a, 0xe8, 0x94, 0x55, 0x7d, 0x08, 0xe5, 0xdc,
+		    0x66, 0xa3, 0x69, 0xc8, 0xca, 0xc5, 0xa1, 0x84,
+		    0x55, 0xde, 0x08, 0x91, 0x16, 0x3a, 0x0c, 0x86,
+		    0xab, 0x27, 0x2b, 0x64, 0x34, 0x02, 0x6c, 0x76,
+		    0x8b, 0xc6, 0xaf, 0xcc, 0xe1, 0xd6, 0x8c, 0x2a,
+		    0x18, 0x3d, 0xa6, 0x1b, 0x37, 0x75, 0x45, 0x73,
+		    0xc2, 0x75, 0xd7, 0x53, 0x78, 0x3a, 0xd6, 0xe8,
+		    0x29, 0xd2, 0x4a, 0xa8, 0x1e, 0x82, 0xf6, 0xb6,
+		    0x81, 0xde, 0x21, 0xed, 0x2b, 0x56, 0xbb, 0xf2,
+		    0xd0, 0x57, 0xc1, 0x7c, 0xd2, 0x6a, 0xd2, 0x56,
+		    0xf5, 0x13, 0x5f, 0x1c, 0x6a, 0x0b, 0x74, 0xfb,
+		    0xe9, 0xfe, 0x9e, 0xea, 0x95, 0xb2, 0x46, 0xab,
+		    0x0a, 0xfc, 0xfd, 0xf3, 0xbb, 0x04, 0x2b, 0x76,
+		    0x1b, 0xa4, 0x74, 0xb0, 0xc1, 0x78, 0xc3, 0x69,
+		    0xe2, 0xb0, 0x01, 0xe1, 0xde, 0x32, 0x4c, 0x8d,
+		    0x1a, 0xb3, 0x38, 0x08, 0xd5, 0xfc, 0x1f, 0xdc,
+		    0x0e, 0x2c, 0x9c, 0xb1, 0xa1, 0x63, 0x17, 0x22,
+		    0xf5, 0x6c, 0x93, 0x70, 0x74, 0x00, 0xf8, 0x39,
+		    0x01, 0x94, 0xd1, 0x32, 0x23, 0x56, 0x5d, 0xa6,
+		    0x02, 0x76, 0x76, 0x93, 0xce, 0x2f, 0x19, 0xe9,
+		    0x17, 0x52, 0xae, 0x6e, 0x2c, 0x6d, 0x61, 0x7f,
+		    0x3b, 0xaa, 0xe0, 0x52, 0x85, 0xc5, 0x65, 0xc1,
+		    0xbb, 0x8e, 0x5b, 0x21, 0xd5, 0xc9, 0x78, 0x83,
+		    0x07, 0x97, 0x4c, 0x62, 0x61, 0x41, 0xd4, 0xfc,
+		    0xc9, 0x39, 0xe3, 0x9b, 0xd0, 0xcc, 0x75, 0xc4,
+		    0x97, 0xe6, 0xdd, 0x2a, 0x5f, 0xa6, 0xe8, 0x59,
+		    0x6c, 0x98, 0xb9, 0x02, 0xe2, 0xa2, 0xd6, 0x68,
+		    0xee, 0x3b, 0x1d, 0xe3, 0x4d, 0x5b, 0x30, 0xef,
+		    0x03, 0xf2, 0xeb, 0x18, 0x57, 0x36, 0xe8, 0xa1,
+		    0xf4, 0x47, 0xfb, 0xcb, 0x8f, 0xcb, 0xc8, 0xf3,
+		    0x4f, 0x74, 0x9d, 0x9d, 0xb1, 0x8d, 0x14, 0x44,
+		    0xd9, 0x19, 0xb4, 0x54, 0x4f, 0x75, 0x19, 0x09,
+		    0xa0, 0x75, 0xbc, 0x3b, 0x82, 0xc6, 0x3f, 0xb8,
+		    0x83, 0x19, 0x6e, 0xd6, 0x37, 0xfe, 0x6e, 0x8a,
+		    0x4e, 0xe0, 0x4a, 0xab, 0x7b, 0xc8, 0xb4, 0x1d,
+		    0xf4, 0xed, 0x27, 0x03, 0x65, 0xa2, 0xa1, 0xae,
+		    0x11, 0xe7, 0x98, 0x78, 0x48, 0x91, 0xd2, 0xd2,
+		    0xd4, 0x23, 0x78, 0x50, 0xb1, 0x5b, 0x85, 0x10,
+		    0x8d, 0xca, 0x5f, 0x0f, 0x71, 0xae, 0x72, 0x9a,
+		    0xf6, 0x25, 0x19, 0x60, 0x06, 0xf7, 0x10, 0x34,
+		    0x18, 0x0d, 0xc9, 0x9f, 0x7b, 0x0c, 0x9b, 0x8f,
+		    0x91, 0x1b, 0x9f, 0xcd, 0x10, 0xee, 0x75, 0xf9,
+		    0x97, 0x66, 0xfc, 0x4d, 0x33, 0x6e, 0x28, 0x2b,
+		    0x92, 0x85, 0x4f, 0xab, 0x43, 0x8d, 0x8f, 0x7d,
+		    0x86, 0xa7, 0xc7, 0xd8, 0xd3, 0x0b, 0x8b, 0x57,
+		    0xb6, 0x1d, 0x95, 0x0d, 0xe9, 0xbc, 0xd9, 0x03,
+		    0xd9, 0x10, 0x19, 0xc3, 0x46, 0x63, 0x55, 0x87,
+		    0x61, 0x79, 0x6c, 0x95, 0x0e, 0x9c, 0xdd, 0xca,
+		    0xc3, 0xf3, 0x64, 0xf0, 0x7d, 0x76, 0xb7, 0x53,
+		    0x67, 0x2b, 0x1e, 0x44, 0x56, 0x81, 0xea, 0x8f,
+		    0x5c, 0x42, 0x16, 0xb8, 0x28, 0xeb, 0x1b, 0x61,
+		    0x10, 0x1e, 0xbf, 0xec, 0xa8 },
+	.ilen	= 1933,
+	.result	= { 0x6a, 0xfc, 0x4b, 0x25, 0xdf, 0xc0, 0xe4, 0xe8,
+		    0x17, 0x4d, 0x4c, 0xc9, 0x7e, 0xde, 0x3a, 0xcc,
+		    0x3c, 0xba, 0x6a, 0x77, 0x47, 0xdb, 0xe3, 0x74,
+		    0x7a, 0x4d, 0x5f, 0x8d, 0x37, 0x55, 0x80, 0x73,
+		    0x90, 0x66, 0x5d, 0x3a, 0x7d, 0x5d, 0x86, 0x5e,
+		    0x8d, 0xfd, 0x83, 0xff, 0x4e, 0x74, 0x6f, 0xf9,
+		    0xe6, 0x70, 0x17, 0x70, 0x3e, 0x96, 0xa7, 0x7e,
+		    0xcb, 0xab, 0x8f, 0x58, 0x24, 0x9b, 0x01, 0xfd,
+		    0xcb, 0xe6, 0x4d, 0x9b, 0xf0, 0x88, 0x94, 0x57,
+		    0x66, 0xef, 0x72, 0x4c, 0x42, 0x6e, 0x16, 0x19,
+		    0x15, 0xea, 0x70, 0x5b, 0xac, 0x13, 0xdb, 0x9f,
+		    0x18, 0xe2, 0x3c, 0x26, 0x97, 0xbc, 0xdc, 0x45,
+		    0x8c, 0x6c, 0x24, 0x69, 0x9c, 0xf7, 0x65, 0x1e,
+		    0x18, 0x59, 0x31, 0x7c, 0xe4, 0x73, 0xbc, 0x39,
+		    0x62, 0xc6, 0x5c, 0x9f, 0xbf, 0xfa, 0x90, 0x03,
+		    0xc9, 0x72, 0x26, 0xb6, 0x1b, 0xc2, 0xb7, 0x3f,
+		    0xf2, 0x13, 0x77, 0xf2, 0x8d, 0xb9, 0x47, 0xd0,
+		    0x53, 0xdd, 0xc8, 0x91, 0x83, 0x8b, 0xb1, 0xce,
+		    0xa3, 0xfe, 0xcd, 0xd9, 0xdd, 0x92, 0x7b, 0xdb,
+		    0xb8, 0xfb, 0xc9, 0x2d, 0x01, 0x59, 0x39, 0x52,
+		    0xad, 0x1b, 0xec, 0xcf, 0xd7, 0x70, 0x13, 0x21,
+		    0xf5, 0x47, 0xaa, 0x18, 0x21, 0x5c, 0xc9, 0x9a,
+		    0xd2, 0x6b, 0x05, 0x9c, 0x01, 0xa1, 0xda, 0x35,
+		    0x5d, 0xb3, 0x70, 0xe6, 0xa9, 0x80, 0x8b, 0x91,
+		    0xb7, 0xb3, 0x5f, 0x24, 0x9a, 0xb7, 0xd1, 0x6b,
+		    0xa1, 0x1c, 0x50, 0xba, 0x49, 0xe0, 0xee, 0x2e,
+		    0x75, 0xac, 0x69, 0xc0, 0xeb, 0x03, 0xdd, 0x19,
+		    0xe5, 0xf6, 0x06, 0xdd, 0xc3, 0xd7, 0x2b, 0x07,
+		    0x07, 0x30, 0xa7, 0x19, 0x0c, 0xbf, 0xe6, 0x18,
+		    0xcc, 0xb1, 0x01, 0x11, 0x85, 0x77, 0x1d, 0x96,
+		    0xa7, 0xa3, 0x00, 0x84, 0x02, 0xa2, 0x83, 0x68,
+		    0xda, 0x17, 0x27, 0xc8, 0x7f, 0x23, 0xb7, 0xf4,
+		    0x13, 0x85, 0xcf, 0xdd, 0x7a, 0x7d, 0x24, 0x57,
+		    0xfe, 0x05, 0x93, 0xf5, 0x74, 0xce, 0xed, 0x0c,
+		    0x20, 0x98, 0x8d, 0x92, 0x30, 0xa1, 0x29, 0x23,
+		    0x1a, 0xa0, 0x4f, 0x69, 0x56, 0x4c, 0xe1, 0xc8,
+		    0xce, 0xf6, 0x9a, 0x0c, 0xa4, 0xfa, 0x04, 0xf6,
+		    0x62, 0x95, 0xf2, 0xfa, 0xc7, 0x40, 0x68, 0x40,
+		    0x8f, 0x41, 0xda, 0xb4, 0x26, 0x6f, 0x70, 0xab,
+		    0x40, 0x61, 0xa4, 0x0e, 0x75, 0xfb, 0x86, 0xeb,
+		    0x9d, 0x9a, 0x1f, 0xec, 0x76, 0x99, 0xe7, 0xea,
+		    0xaa, 0x1e, 0x2d, 0xb5, 0xd4, 0xa6, 0x1a, 0xb8,
+		    0x61, 0x0a, 0x1d, 0x16, 0x5b, 0x98, 0xc2, 0x31,
+		    0x40, 0xe7, 0x23, 0x1d, 0x66, 0x99, 0xc8, 0xc0,
+		    0xd7, 0xce, 0xf3, 0x57, 0x40, 0x04, 0x3f, 0xfc,
+		    0xea, 0xb3, 0xfc, 0xd2, 0xd3, 0x99, 0xa4, 0x94,
+		    0x69, 0xa0, 0xef, 0xd1, 0x85, 0xb3, 0xa6, 0xb1,
+		    0x28, 0xbf, 0x94, 0x67, 0x22, 0xc3, 0x36, 0x46,
+		    0xf8, 0xd2, 0x0f, 0x5f, 0xf4, 0x59, 0x80, 0xe6,
+		    0x2d, 0x43, 0x08, 0x7d, 0x19, 0x09, 0x97, 0xa7,
+		    0x4c, 0x3d, 0x8d, 0xba, 0x65, 0x62, 0xa3, 0x71,
+		    0x33, 0x29, 0x62, 0xdb, 0xc1, 0x33, 0x34, 0x1a,
+		    0x63, 0x33, 0x16, 0xb6, 0x64, 0x7e, 0xab, 0x33,
+		    0xf0, 0xe6, 0x26, 0x68, 0xba, 0x1d, 0x2e, 0x38,
+		    0x08, 0xe6, 0x02, 0xd3, 0x25, 0x2c, 0x47, 0x23,
+		    0x58, 0x34, 0x0f, 0x9d, 0x63, 0x4f, 0x63, 0xbb,
+		    0x7f, 0x3b, 0x34, 0x38, 0xa7, 0xb5, 0x8d, 0x65,
+		    0xd9, 0x9f, 0x79, 0x55, 0x3e, 0x4d, 0xe7, 0x73,
+		    0xd8, 0xf6, 0x98, 0x97, 0x84, 0x60, 0x9c, 0xc8,
+		    0xa9, 0x3c, 0xf6, 0xdc, 0x12, 0x5c, 0xe1, 0xbb,
+		    0x0b, 0x8b, 0x98, 0x9c, 0x9d, 0x26, 0x7c, 0x4a,
+		    0xe6, 0x46, 0x36, 0x58, 0x21, 0x4a, 0xee, 0xca,
+		    0xd7, 0x3b, 0xc2, 0x6c, 0x49, 0x2f, 0xe5, 0xd5,
+		    0x03, 0x59, 0x84, 0x53, 0xcb, 0xfe, 0x92, 0x71,
+		    0x2e, 0x7c, 0x21, 0xcc, 0x99, 0x85, 0x7f, 0xb8,
+		    0x74, 0x90, 0x13, 0x42, 0x3f, 0xe0, 0x6b, 0x1d,
+		    0xf2, 0x4d, 0x54, 0xd4, 0xfc, 0x3a, 0x05, 0xe6,
+		    0x74, 0xaf, 0xa6, 0xa0, 0x2a, 0x20, 0x23, 0x5d,
+		    0x34, 0x5c, 0xd9, 0x3e, 0x4e, 0xfa, 0x93, 0xe7,
+		    0xaa, 0xe9, 0x6f, 0x08, 0x43, 0x67, 0x41, 0xc5,
+		    0xad, 0xfb, 0x31, 0x95, 0x82, 0x73, 0x32, 0xd8,
+		    0xa6, 0xa3, 0xed, 0x0e, 0x2d, 0xf6, 0x5f, 0xfd,
+		    0x80, 0xa6, 0x7a, 0xe0, 0xdf, 0x78, 0x15, 0x29,
+		    0x74, 0x33, 0xd0, 0x9e, 0x83, 0x86, 0x72, 0x22,
+		    0x57, 0x29, 0xb9, 0x9e, 0x5d, 0xd3, 0x1a, 0xb5,
+		    0x96, 0x72, 0x41, 0x3d, 0xf1, 0x64, 0x43, 0x67,
+		    0xee, 0xaa, 0x5c, 0xd3, 0x9a, 0x96, 0x13, 0x11,
+		    0x5d, 0xf3, 0x0c, 0x87, 0x82, 0x1e, 0x41, 0x9e,
+		    0xd0, 0x27, 0xd7, 0x54, 0x3b, 0x67, 0x73, 0x09,
+		    0x91, 0xe9, 0xd5, 0x36, 0xa7, 0xb5, 0x55, 0xe4,
+		    0xf3, 0x21, 0x51, 0x49, 0x22, 0x07, 0x55, 0x4f,
+		    0x44, 0x4b, 0xd2, 0x15, 0x93, 0x17, 0x2a, 0xfa,
+		    0x4d, 0x4a, 0x57, 0xdb, 0x4c, 0xa6, 0xeb, 0xec,
+		    0x53, 0x25, 0x6c, 0x21, 0xed, 0x00, 0x4c, 0x3b,
+		    0xca, 0x14, 0x57, 0xa9, 0xd6, 0x6a, 0xcd, 0x8d,
+		    0x5e, 0x74, 0xac, 0x72, 0xc1, 0x97, 0xe5, 0x1b,
+		    0x45, 0x4e, 0xda, 0xfc, 0xcc, 0x40, 0xe8, 0x48,
+		    0x88, 0x0b, 0xa3, 0xe3, 0x8d, 0x83, 0x42, 0xc3,
+		    0x23, 0xfd, 0x68, 0xb5, 0x8e, 0xf1, 0x9d, 0x63,
+		    0x77, 0xe9, 0xa3, 0x8e, 0x8c, 0x26, 0x6b, 0xbd,
+		    0x72, 0x73, 0x35, 0x0c, 0x03, 0xf8, 0x43, 0x78,
+		    0x52, 0x71, 0x15, 0x1f, 0x71, 0x5d, 0x6e, 0xed,
+		    0xb9, 0xcc, 0x86, 0x30, 0xdb, 0x2b, 0xd3, 0x82,
+		    0x88, 0x23, 0x71, 0x90, 0x53, 0x5c, 0xa9, 0x2f,
+		    0x76, 0x01, 0xb7, 0x9a, 0xfe, 0x43, 0x55, 0xa3,
+		    0x04, 0x9b, 0x0e, 0xe4, 0x59, 0xdf, 0xc9, 0xe9,
+		    0xb1, 0xea, 0x29, 0x28, 0x3c, 0x5c, 0xae, 0x72,
+		    0x84, 0xb6, 0xc6, 0xeb, 0x0c, 0x27, 0x07, 0x74,
+		    0x90, 0x0d, 0x31, 0xb0, 0x00, 0x77, 0xe9, 0x40,
+		    0x70, 0x6f, 0x68, 0xa7, 0xfd, 0x06, 0xec, 0x4b,
+		    0xc0, 0xb7, 0xac, 0xbc, 0x33, 0xb7, 0x6d, 0x0a,
+		    0xbd, 0x12, 0x1b, 0x59, 0xcb, 0xdd, 0x32, 0xf5,
+		    0x1d, 0x94, 0x57, 0x76, 0x9e, 0x0c, 0x18, 0x98,
+		    0x71, 0xd7, 0x2a, 0xdb, 0x0b, 0x7b, 0xa7, 0x71,
+		    0xb7, 0x67, 0x81, 0x23, 0x96, 0xae, 0xb9, 0x7e,
+		    0x32, 0x43, 0x92, 0x8a, 0x19, 0xa0, 0xc4, 0xd4,
+		    0x3b, 0x57, 0xf9, 0x4a, 0x2c, 0xfb, 0x51, 0x46,
+		    0xbb, 0xcb, 0x5d, 0xb3, 0xef, 0x13, 0x93, 0x6e,
+		    0x68, 0x42, 0x54, 0x57, 0xd3, 0x6a, 0x3a, 0x8f,
+		    0x9d, 0x66, 0xbf, 0xbd, 0x36, 0x23, 0xf5, 0x93,
+		    0x83, 0x7b, 0x9c, 0xc0, 0xdd, 0xc5, 0x49, 0xc0,
+		    0x64, 0xed, 0x07, 0x12, 0xb3, 0xe6, 0xe4, 0xe5,
+		    0x38, 0x95, 0x23, 0xb1, 0xa0, 0x3b, 0x1a, 0x61,
+		    0xda, 0x17, 0xac, 0xc3, 0x58, 0xdd, 0x74, 0x64,
+		    0x22, 0x11, 0xe8, 0x32, 0x1d, 0x16, 0x93, 0x85,
+		    0x99, 0xa5, 0x9c, 0x34, 0x55, 0xb1, 0xe9, 0x20,
+		    0x72, 0xc9, 0x28, 0x7b, 0x79, 0x00, 0xa1, 0xa6,
+		    0xa3, 0x27, 0x40, 0x18, 0x8a, 0x54, 0xe0, 0xcc,
+		    0xe8, 0x4e, 0x8e, 0x43, 0x96, 0xe7, 0x3f, 0xc8,
+		    0xe9, 0xb2, 0xf9, 0xc9, 0xda, 0x04, 0x71, 0x50,
+		    0x47, 0xe4, 0xaa, 0xce, 0xa2, 0x30, 0xc8, 0xe4,
+		    0xac, 0xc7, 0x0d, 0x06, 0x2e, 0xe6, 0xe8, 0x80,
+		    0x36, 0x29, 0x9e, 0x01, 0xb8, 0xc3, 0xf0, 0xa0,
+		    0x5d, 0x7a, 0xca, 0x4d, 0xa0, 0x57, 0xbd, 0x2a,
+		    0x45, 0xa7, 0x7f, 0x9c, 0x93, 0x07, 0x8f, 0x35,
+		    0x67, 0x92, 0xe3, 0xe9, 0x7f, 0xa8, 0x61, 0x43,
+		    0x9e, 0x25, 0x4f, 0x33, 0x76, 0x13, 0x6e, 0x12,
+		    0xb9, 0xdd, 0xa4, 0x7c, 0x08, 0x9f, 0x7c, 0xe7,
+		    0x0a, 0x8d, 0x84, 0x06, 0xa4, 0x33, 0x17, 0x34,
+		    0x5e, 0x10, 0x7c, 0xc0, 0xa8, 0x3d, 0x1f, 0x42,
+		    0x20, 0x51, 0x65, 0x5d, 0x09, 0xc3, 0xaa, 0xc0,
+		    0xc8, 0x0d, 0xf0, 0x79, 0xbc, 0x20, 0x1b, 0x95,
+		    0xe7, 0x06, 0x7d, 0x47, 0x20, 0x03, 0x1a, 0x74,
+		    0xdd, 0xe2, 0xd4, 0xae, 0x38, 0x71, 0x9b, 0xf5,
+		    0x80, 0xec, 0x08, 0x4e, 0x56, 0xba, 0x76, 0x12,
+		    0x1a, 0xdf, 0x48, 0xf3, 0xae, 0xb3, 0xe6, 0xe6,
+		    0xbe, 0xc0, 0x91, 0x2e, 0x01, 0xb3, 0x01, 0x86,
+		    0xa2, 0xb9, 0x52, 0xd1, 0x21, 0xae, 0xd4, 0x97,
+		    0x1d, 0xef, 0x41, 0x12, 0x95, 0x3d, 0x48, 0x45,
+		    0x1c, 0x56, 0x32, 0x8f, 0xb8, 0x43, 0xbb, 0x19,
+		    0xf3, 0xca, 0xe9, 0xeb, 0x6d, 0x84, 0xbe, 0x86,
+		    0x06, 0xe2, 0x36, 0xb2, 0x62, 0x9d, 0xd3, 0x4c,
+		    0x48, 0x18, 0x54, 0x13, 0x4e, 0xcf, 0xfd, 0xba,
+		    0x84, 0xb9, 0x30, 0x53, 0xcf, 0xfb, 0xb9, 0x29,
+		    0x8f, 0xdc, 0x9f, 0xef, 0x60, 0x0b, 0x64, 0xf6,
+		    0x8b, 0xee, 0xa6, 0x91, 0xc2, 0x41, 0x6c, 0xf6,
+		    0xfa, 0x79, 0x67, 0x4b, 0xc1, 0x3f, 0xaf, 0x09,
+		    0x81, 0xd4, 0x5d, 0xcb, 0x09, 0xdf, 0x36, 0x31,
+		    0xc0, 0x14, 0x3c, 0x7c, 0x0e, 0x65, 0x95, 0x99,
+		    0x6d, 0xa3, 0xf4, 0xd7, 0x38, 0xee, 0x1a, 0x2b,
+		    0x37, 0xe2, 0xa4, 0x3b, 0x4b, 0xd0, 0x65, 0xca,
+		    0xf8, 0xc3, 0xe8, 0x15, 0x20, 0xef, 0xf2, 0x00,
+		    0xfd, 0x01, 0x09, 0xc5, 0xc8, 0x17, 0x04, 0x93,
+		    0xd0, 0x93, 0x03, 0x55, 0xc5, 0xfe, 0x32, 0xa3,
+		    0x3e, 0x28, 0x2d, 0x3b, 0x93, 0x8a, 0xcc, 0x07,
+		    0x72, 0x80, 0x8b, 0x74, 0x16, 0x24, 0xbb, 0xda,
+		    0x94, 0x39, 0x30, 0x8f, 0xb1, 0xcd, 0x4a, 0x90,
+		    0x92, 0x7c, 0x14, 0x8f, 0x95, 0x4e, 0xac, 0x9b,
+		    0xd8, 0x8f, 0x1a, 0x87, 0xa4, 0x32, 0x27, 0x8a,
+		    0xba, 0xf7, 0x41, 0xcf, 0x84, 0x37, 0x19, 0xe6,
+		    0x06, 0xf5, 0x0e, 0xcf, 0x36, 0xf5, 0x9e, 0x6c,
+		    0xde, 0xbc, 0xff, 0x64, 0x7e, 0x4e, 0x59, 0x57,
+		    0x48, 0xfe, 0x14, 0xf7, 0x9c, 0x93, 0x5d, 0x15,
+		    0xad, 0xcc, 0x11, 0xb1, 0x17, 0x18, 0xb2, 0x7e,
+		    0xcc, 0xab, 0xe9, 0xce, 0x7d, 0x77, 0x5b, 0x51,
+		    0x1b, 0x1e, 0x20, 0xa8, 0x32, 0x06, 0x0e, 0x75,
+		    0x93, 0xac, 0xdb, 0x35, 0x37, 0x1f, 0xe9, 0x19,
+		    0x1d, 0xb4, 0x71, 0x97, 0xd6, 0x4e, 0x2c, 0x08,
+		    0xa5, 0x13, 0xf9, 0x0e, 0x7e, 0x78, 0x6e, 0x14,
+		    0xe0, 0xa9, 0xb9, 0x96, 0x4c, 0x80, 0x82, 0xba,
+		    0x17, 0xb3, 0x9d, 0x69, 0xb0, 0x84, 0x46, 0xff,
+		    0xf9, 0x52, 0x79, 0x94, 0x58, 0x3a, 0x62, 0x90,
+		    0x15, 0x35, 0x71, 0x10, 0x37, 0xed, 0xa1, 0x8e,
+		    0x53, 0x6e, 0xf4, 0x26, 0x57, 0x93, 0x15, 0x93,
+		    0xf6, 0x81, 0x2c, 0x5a, 0x10, 0xda, 0x92, 0xad,
+		    0x2f, 0xdb, 0x28, 0x31, 0x2d, 0x55, 0x04, 0xd2,
+		    0x06, 0x28, 0x8c, 0x1e, 0xdc, 0xea, 0x54, 0xac,
+		    0xff, 0xb7, 0x6c, 0x30, 0x15, 0xd4, 0xb4, 0x0d,
+		    0x00, 0x93, 0x57, 0xdd, 0xd2, 0x07, 0x07, 0x06,
+		    0xd9, 0x43, 0x9b, 0xcd, 0x3a, 0xf4, 0x7d, 0x4c,
+		    0x36, 0x5d, 0x23, 0xa2, 0xcc, 0x57, 0x40, 0x91,
+		    0xe9, 0x2c, 0x2f, 0x2c, 0xd5, 0x30, 0x9b, 0x17,
+		    0xb0, 0xc9, 0xf7, 0xa7, 0x2f, 0xd1, 0x93, 0x20,
+		    0x6b, 0xc6, 0xc1, 0xe4, 0x6f, 0xcb, 0xd1, 0xe7,
+		    0x09, 0x0f, 0x9e, 0xdc, 0xaa, 0x9f, 0x2f, 0xdf,
+		    0x56, 0x9f, 0xd4, 0x33, 0x04, 0xaf, 0xd3, 0x6c,
+		    0x58, 0x61, 0xf0, 0x30, 0xec, 0xf2, 0x7f, 0xf2,
+		    0x9c, 0xdf, 0x39, 0xbb, 0x6f, 0xa2, 0x8c, 0x7e,
+		    0xc4, 0x22, 0x51, 0x71, 0xc0, 0x4d, 0x14, 0x1a,
+		    0xc4, 0xcd, 0x04, 0xd9, 0x87, 0x08, 0x50, 0x05,
+		    0xcc, 0xaf, 0xf6, 0xf0, 0x8f, 0x92, 0x54, 0x58,
+		    0xc2, 0xc7, 0x09, 0x7a, 0x59, 0x02, 0x05, 0xe8,
+		    0xb0, 0x86, 0xd9, 0xbf, 0x7b, 0x35, 0x51, 0x4d,
+		    0xaf, 0x08, 0x97, 0x2c, 0x65, 0xda, 0x2a, 0x71,
+		    0x3a, 0xa8, 0x51, 0xcc, 0xf2, 0x73, 0x27, 0xc3,
+		    0xfd, 0x62, 0xcf, 0xe3, 0xb2, 0xca, 0xcb, 0xbe,
+		    0x1a, 0x0a, 0xa1, 0x34, 0x7b, 0x77, 0xc4, 0x62,
+		    0x68, 0x78, 0x5f, 0x94, 0x07, 0x04, 0x65, 0x16,
+		    0x4b, 0x61, 0xcb, 0xff, 0x75, 0x26, 0x50, 0x66,
+		    0x1f, 0x6e, 0x93, 0xf8, 0xc5, 0x51, 0xeb, 0xa4,
+		    0x4a, 0x48, 0x68, 0x6b, 0xe2, 0x5e, 0x44, 0xb2,
+		    0x50, 0x2c, 0x6c, 0xae, 0x79, 0x4e, 0x66, 0x35,
+		    0x81, 0x50, 0xac, 0xbc, 0x3f, 0xb1, 0x0c, 0xf3,
+		    0x05, 0x3c, 0x4a, 0xa3, 0x6c, 0x2a, 0x79, 0xb4,
+		    0xb7, 0xab, 0xca, 0xc7, 0x9b, 0x8e, 0xcd, 0x5f,
+		    0x11, 0x03, 0xcb, 0x30, 0xa3, 0xab, 0xda, 0xfe,
+		    0x64, 0xb9, 0xbb, 0xd8, 0x5e, 0x3a, 0x1a, 0x56,
+		    0xe5, 0x05, 0x48, 0x90, 0x1e, 0x61, 0x69, 0x1b,
+		    0x22, 0xe6, 0x1a, 0x3c, 0x75, 0xad, 0x1f, 0x37,
+		    0x28, 0xdc, 0xe4, 0x6d, 0xbd, 0x42, 0xdc, 0xd3,
+		    0xc8, 0xb6, 0x1c, 0x48, 0xfe, 0x94, 0x77, 0x7f,
+		    0xbd, 0x62, 0xac, 0xa3, 0x47, 0x27, 0xcf, 0x5f,
+		    0xd9, 0xdb, 0xaf, 0xec, 0xf7, 0x5e, 0xc1, 0xb0,
+		    0x9d, 0x01, 0x26, 0x99, 0x7e, 0x8f, 0x03, 0x70,
+		    0xb5, 0x42, 0xbe, 0x67, 0x28, 0x1b, 0x7c, 0xbd,
+		    0x61, 0x21, 0x97, 0xcc, 0x5c, 0xe1, 0x97, 0x8f,
+		    0x8d, 0xde, 0x2b, 0xaa, 0xa7, 0x71, 0x1d, 0x1e,
+		    0x02, 0x73, 0x70, 0x58, 0x32, 0x5b, 0x1d, 0x67,
+		    0x3d, 0xe0, 0x74, 0x4f, 0x03, 0xf2, 0x70, 0x51,
+		    0x79, 0xf1, 0x61, 0x70, 0x15, 0x74, 0x9d, 0x23,
+		    0x89, 0xde, 0xac, 0xfd, 0xde, 0xd0, 0x1f, 0xc3,
+		    0x87, 0x44, 0x35, 0x4b, 0xe5, 0xb0, 0x60, 0xc5,
+		    0x22, 0xe4, 0x9e, 0xca, 0xeb, 0xd5, 0x3a, 0x09,
+		    0x45, 0xa4, 0xdb, 0xfa, 0x3f, 0xeb, 0x1b, 0xc7,
+		    0xc8, 0x14, 0x99, 0x51, 0x92, 0x10, 0xed, 0xed,
+		    0x28, 0xe0, 0xa1, 0xf8, 0x26, 0xcf, 0xcd, 0xcb,
+		    0x63, 0xa1, 0x3b, 0xe3, 0xdf, 0x7e, 0xfe, 0xa6,
+		    0xf0, 0x81, 0x9a, 0xbf, 0x55, 0xde, 0x54, 0xd5,
+		    0x56, 0x60, 0x98, 0x10, 0x68, 0xf4, 0x38, 0x96,
+		    0x8e, 0x6f, 0x1d, 0x44, 0x7f, 0xd6, 0x2f, 0xfe,
+		    0x55, 0xfb, 0x0c, 0x7e, 0x67, 0xe2, 0x61, 0x44,
+		    0xed, 0xf2, 0x35, 0x30, 0x5d, 0xe9, 0xc7, 0xd6,
+		    0x6d, 0xe0, 0xa0, 0xed, 0xf3, 0xfc, 0xd8, 0x3e,
+		    0x0a, 0x7b, 0xcd, 0xaf, 0x65, 0x68, 0x18, 0xc0,
+		    0xec, 0x04, 0x1c, 0x74, 0x6d, 0xe2, 0x6e, 0x79,
+		    0xd4, 0x11, 0x2b, 0x62, 0xd5, 0x27, 0xad, 0x4f,
+		    0x01, 0x59, 0x73, 0xcc, 0x6a, 0x53, 0xfb, 0x2d,
+		    0xd5, 0x4e, 0x99, 0x21, 0x65, 0x4d, 0xf5, 0x82,
+		    0xf7, 0xd8, 0x42, 0xce, 0x6f, 0x3d, 0x36, 0x47,
+		    0xf1, 0x05, 0x16, 0xe8, 0x1b, 0x6a, 0x8f, 0x93,
+		    0xf2, 0x8f, 0x37, 0x40, 0x12, 0x28, 0xa3, 0xe6,
+		    0xb9, 0x17, 0x4a, 0x1f, 0xb1, 0xd1, 0x66, 0x69,
+		    0x86, 0xc4, 0xfc, 0x97, 0xae, 0x3f, 0x8f, 0x1e,
+		    0x2b, 0xdf, 0xcd, 0xf9, 0x3c }
+}, {
+	.key	= { 0xb3, 0x35, 0x50, 0x03, 0x54, 0x2e, 0x40, 0x5e,
+		    0x8f, 0x59, 0x8e, 0xc5, 0x90, 0xd5, 0x27, 0x2d,
+		    0xba, 0x29, 0x2e, 0xcb, 0x1b, 0x70, 0x44, 0x1e,
+		    0x65, 0x91, 0x6e, 0x2a, 0x79, 0x22, 0xda, 0x64 },
+	.nonce	= { 0x05, 0xa3, 0x93, 0xed, 0x30, 0xc5, 0xa2, 0x06 },
+	.nlen	= 8,
+	.assoc	= { 0xb1, 0x69, 0x83, 0x87, 0x30, 0xaa, 0x5d, 0xb8,
+		    0x77, 0xe8, 0x21, 0xff, 0x06, 0x59, 0x35, 0xce,
+		    0x75, 0xfe, 0x38, 0xef, 0xb8, 0x91, 0x43, 0x8c,
+		    0xcf, 0x70, 0xdd, 0x0a, 0x68, 0xbf, 0xd4, 0xbc,
+		    0x16, 0x76, 0x99, 0x36, 0x1e, 0x58, 0x79, 0x5e,
+		    0xd4, 0x29, 0xf7, 0x33, 0x93, 0x48, 0xdb, 0x5f,
+		    0x01, 0xae, 0x9c, 0xb6, 0xe4, 0x88, 0x6d, 0x2b,
+		    0x76, 0x75, 0xe0, 0xf3, 0x74, 0xe2, 0xc9 },
+	.alen	= 63,
+	.input	= { 0x74, 0xa6, 0x3e, 0xe4, 0xb1, 0xcb, 0xaf, 0xb0,
+		    0x40, 0xe5, 0x0f, 0x9e, 0xf1, 0xf2, 0x89, 0xb5,
+		    0x42, 0x34, 0x8a, 0xa1, 0x03, 0xb7, 0xe9, 0x57,
+		    0x46, 0xbe, 0x20, 0xe4, 0x6e, 0xb0, 0xeb, 0xff,
+		    0xea, 0x07, 0x7e, 0xef, 0xe2, 0x55, 0x9f, 0xe5,
+		    0x78, 0x3a, 0xb7, 0x83, 0xc2, 0x18, 0x40, 0x7b,
+		    0xeb, 0xcd, 0x81, 0xfb, 0x90, 0x12, 0x9e, 0x46,
+		    0xa9, 0xd6, 0x4a, 0xba, 0xb0, 0x62, 0xdb, 0x6b,
+		    0x99, 0xc4, 0xdb, 0x54, 0x4b, 0xb8, 0xa5, 0x71,
+		    0xcb, 0xcd, 0x63, 0x32, 0x55, 0xfb, 0x31, 0xf0,
+		    0x38, 0xf5, 0xbe, 0x78, 0xe4, 0x45, 0xce, 0x1b,
+		    0x6a, 0x5b, 0x0e, 0xf4, 0x16, 0xe4, 0xb1, 0x3d,
+		    0xf6, 0x63, 0x7b, 0xa7, 0x0c, 0xde, 0x6f, 0x8f,
+		    0x74, 0xdf, 0xe0, 0x1e, 0x9d, 0xce, 0x8f, 0x24,
+		    0xef, 0x23, 0x35, 0x33, 0x7b, 0x83, 0x34, 0x23,
+		    0x58, 0x74, 0x14, 0x77, 0x1f, 0xc2, 0x4f, 0x4e,
+		    0xc6, 0x89, 0xf9, 0x52, 0x09, 0x37, 0x64, 0x14,
+		    0xc4, 0x01, 0x6b, 0x9d, 0x77, 0xe8, 0x90, 0x5d,
+		    0xa8, 0x4a, 0x2a, 0xef, 0x5c, 0x7f, 0xeb, 0xbb,
+		    0xb2, 0xc6, 0x93, 0x99, 0x66, 0xdc, 0x7f, 0xd4,
+		    0x9e, 0x2a, 0xca, 0x8d, 0xdb, 0xe7, 0x20, 0xcf,
+		    0xe4, 0x73, 0xae, 0x49, 0x7d, 0x64, 0x0f, 0x0e,
+		    0x28, 0x46, 0xa9, 0xa8, 0x32, 0xe4, 0x0e, 0xf6,
+		    0x51, 0x53, 0xb8, 0x3c, 0xb1, 0xff, 0xa3, 0x33,
+		    0x41, 0x75, 0xff, 0xf1, 0x6f, 0xf1, 0xfb, 0xbb,
+		    0x83, 0x7f, 0x06, 0x9b, 0xe7, 0x1b, 0x0a, 0xe0,
+		    0x5c, 0x33, 0x60, 0x5b, 0xdb, 0x5b, 0xed, 0xfe,
+		    0xa5, 0x16, 0x19, 0x72, 0xa3, 0x64, 0x23, 0x00,
+		    0x02, 0xc7, 0xf3, 0x6a, 0x81, 0x3e, 0x44, 0x1d,
+		    0x79, 0x15, 0x5f, 0x9a, 0xde, 0xe2, 0xfd, 0x1b,
+		    0x73, 0xc1, 0xbc, 0x23, 0xba, 0x31, 0xd2, 0x50,
+		    0xd5, 0xad, 0x7f, 0x74, 0xa7, 0xc9, 0xf8, 0x3e,
+		    0x2b, 0x26, 0x10, 0xf6, 0x03, 0x36, 0x74, 0xe4,
+		    0x0e, 0x6a, 0x72, 0xb7, 0x73, 0x0a, 0x42, 0x28,
+		    0xc2, 0xad, 0x5e, 0x03, 0xbe, 0xb8, 0x0b, 0xa8,
+		    0x5b, 0xd4, 0xb8, 0xba, 0x52, 0x89, 0xb1, 0x9b,
+		    0xc1, 0xc3, 0x65, 0x87, 0xed, 0xa5, 0xf4, 0x86,
+		    0xfd, 0x41, 0x80, 0x91, 0x27, 0x59, 0x53, 0x67,
+		    0x15, 0x78, 0x54, 0x8b, 0x2d, 0x3d, 0xc7, 0xff,
+		    0x02, 0x92, 0x07, 0x5f, 0x7a, 0x4b, 0x60, 0x59,
+		    0x3c, 0x6f, 0x5c, 0xd8, 0xec, 0x95, 0xd2, 0xfe,
+		    0xa0, 0x3b, 0xd8, 0x3f, 0xd1, 0x69, 0xa6, 0xd6,
+		    0x41, 0xb2, 0xf4, 0x4d, 0x12, 0xf4, 0x58, 0x3e,
+		    0x66, 0x64, 0x80, 0x31, 0x9b, 0xa8, 0x4c, 0x8b,
+		    0x07, 0xb2, 0xec, 0x66, 0x94, 0x66, 0x47, 0x50,
+		    0x50, 0x5f, 0x18, 0x0b, 0x0e, 0xd6, 0xc0, 0x39,
+		    0x21, 0x13, 0x9e, 0x33, 0xbc, 0x79, 0x36, 0x02,
+		    0x96, 0x70, 0xf0, 0x48, 0x67, 0x2f, 0x26, 0xe9,
+		    0x6d, 0x10, 0xbb, 0xd6, 0x3f, 0xd1, 0x64, 0x7a,
+		    0x2e, 0xbe, 0x0c, 0x61, 0xf0, 0x75, 0x42, 0x38,
+		    0x23, 0xb1, 0x9e, 0x9f, 0x7c, 0x67, 0x66, 0xd9,
+		    0x58, 0x9a, 0xf1, 0xbb, 0x41, 0x2a, 0x8d, 0x65,
+		    0x84, 0x94, 0xfc, 0xdc, 0x6a, 0x50, 0x64, 0xdb,
+		    0x56, 0x33, 0x76, 0x00, 0x10, 0xed, 0xbe, 0xd2,
+		    0x12, 0xf6, 0xf6, 0x1b, 0xa2, 0x16, 0xde, 0xae,
+		    0x31, 0x95, 0xdd, 0xb1, 0x08, 0x7e, 0x4e, 0xee,
+		    0xe7, 0xf9, 0xa5, 0xfb, 0x5b, 0x61, 0x43, 0x00,
+		    0x40, 0xf6, 0x7e, 0x02, 0x04, 0x32, 0x4e, 0x0c,
+		    0xe2, 0x66, 0x0d, 0xd7, 0x07, 0x98, 0x0e, 0xf8,
+		    0x72, 0x34, 0x6d, 0x95, 0x86, 0xd7, 0xcb, 0x31,
+		    0x54, 0x47, 0xd0, 0x38, 0x29, 0x9c, 0x5a, 0x68,
+		    0xd4, 0x87, 0x76, 0xc9, 0xe7, 0x7e, 0xe3, 0xf4,
+		    0x81, 0x6d, 0x18, 0xcb, 0xc9, 0x05, 0xaf, 0xa0,
+		    0xfb, 0x66, 0xf7, 0xf1, 0x1c, 0xc6, 0x14, 0x11,
+		    0x4f, 0x2b, 0x79, 0x42, 0x8b, 0xbc, 0xac, 0xe7,
+		    0x6c, 0xfe, 0x0f, 0x58, 0xe7, 0x7c, 0x78, 0x39,
+		    0x30, 0xb0, 0x66, 0x2c, 0x9b, 0x6d, 0x3a, 0xe1,
+		    0xcf, 0xc9, 0xa4, 0x0e, 0x6d, 0x6d, 0x8a, 0xa1,
+		    0x3a, 0xe7, 0x28, 0xd4, 0x78, 0x4c, 0xa6, 0xa2,
+		    0x2a, 0xa6, 0x03, 0x30, 0xd7, 0xa8, 0x25, 0x66,
+		    0x87, 0x2f, 0x69, 0x5c, 0x4e, 0xdd, 0xa5, 0x49,
+		    0x5d, 0x37, 0x4a, 0x59, 0xc4, 0xaf, 0x1f, 0xa2,
+		    0xe4, 0xf8, 0xa6, 0x12, 0x97, 0xd5, 0x79, 0xf5,
+		    0xe2, 0x4a, 0x2b, 0x5f, 0x61, 0xe4, 0x9e, 0xe3,
+		    0xee, 0xb8, 0xa7, 0x5b, 0x2f, 0xf4, 0x9e, 0x6c,
+		    0xfb, 0xd1, 0xc6, 0x56, 0x77, 0xba, 0x75, 0xaa,
+		    0x3d, 0x1a, 0xa8, 0x0b, 0xb3, 0x68, 0x24, 0x00,
+		    0x10, 0x7f, 0xfd, 0xd7, 0xa1, 0x8d, 0x83, 0x54,
+		    0x4f, 0x1f, 0xd8, 0x2a, 0xbe, 0x8a, 0x0c, 0x87,
+		    0xab, 0xa2, 0xde, 0xc3, 0x39, 0xbf, 0x09, 0x03,
+		    0xa5, 0xf3, 0x05, 0x28, 0xe1, 0xe1, 0xee, 0x39,
+		    0x70, 0x9c, 0xd8, 0x81, 0x12, 0x1e, 0x02, 0x40,
+		    0xd2, 0x6e, 0xf0, 0xeb, 0x1b, 0x3d, 0x22, 0xc6,
+		    0xe5, 0xe3, 0xb4, 0x5a, 0x98, 0xbb, 0xf0, 0x22,
+		    0x28, 0x8d, 0xe5, 0xd3, 0x16, 0x48, 0x24, 0xa5,
+		    0xe6, 0x66, 0x0c, 0xf9, 0x08, 0xf9, 0x7e, 0x1e,
+		    0xe1, 0x28, 0x26, 0x22, 0xc7, 0xc7, 0x0a, 0x32,
+		    0x47, 0xfa, 0xa3, 0xbe, 0x3c, 0xc4, 0xc5, 0x53,
+		    0x0a, 0xd5, 0x94, 0x4a, 0xd7, 0x93, 0xd8, 0x42,
+		    0x99, 0xb9, 0x0a, 0xdb, 0x56, 0xf7, 0xb9, 0x1c,
+		    0x53, 0x4f, 0xfa, 0xd3, 0x74, 0xad, 0xd9, 0x68,
+		    0xf1, 0x1b, 0xdf, 0x61, 0xc6, 0x5e, 0xa8, 0x48,
+		    0xfc, 0xd4, 0x4a, 0x4c, 0x3c, 0x32, 0xf7, 0x1c,
+		    0x96, 0x21, 0x9b, 0xf9, 0xa3, 0xcc, 0x5a, 0xce,
+		    0xd5, 0xd7, 0x08, 0x24, 0xf6, 0x1c, 0xfd, 0xdd,
+		    0x38, 0xc2, 0x32, 0xe9, 0xb8, 0xe7, 0xb6, 0xfa,
+		    0x9d, 0x45, 0x13, 0x2c, 0x83, 0xfd, 0x4a, 0x69,
+		    0x82, 0xcd, 0xdc, 0xb3, 0x76, 0x0c, 0x9e, 0xd8,
+		    0xf4, 0x1b, 0x45, 0x15, 0xb4, 0x97, 0xe7, 0x58,
+		    0x34, 0xe2, 0x03, 0x29, 0x5a, 0xbf, 0xb6, 0xe0,
+		    0x5d, 0x13, 0xd9, 0x2b, 0xb4, 0x80, 0xb2, 0x45,
+		    0x81, 0x6a, 0x2e, 0x6c, 0x89, 0x7d, 0xee, 0xbb,
+		    0x52, 0xdd, 0x1f, 0x18, 0xe7, 0x13, 0x6b, 0x33,
+		    0x0e, 0xea, 0x36, 0x92, 0x77, 0x7b, 0x6d, 0x9c,
+		    0x5a, 0x5f, 0x45, 0x7b, 0x7b, 0x35, 0x62, 0x23,
+		    0xd1, 0xbf, 0x0f, 0xd0, 0x08, 0x1b, 0x2b, 0x80,
+		    0x6b, 0x7e, 0xf1, 0x21, 0x47, 0xb0, 0x57, 0xd1,
+		    0x98, 0x72, 0x90, 0x34, 0x1c, 0x20, 0x04, 0xff,
+		    0x3d, 0x5c, 0xee, 0x0e, 0x57, 0x5f, 0x6f, 0x24,
+		    0x4e, 0x3c, 0xea, 0xfc, 0xa5, 0xa9, 0x83, 0xc9,
+		    0x61, 0xb4, 0x51, 0x24, 0xf8, 0x27, 0x5e, 0x46,
+		    0x8c, 0xb1, 0x53, 0x02, 0x96, 0x35, 0xba, 0xb8,
+		    0x4c, 0x71, 0xd3, 0x15, 0x59, 0x35, 0x22, 0x20,
+		    0xad, 0x03, 0x9f, 0x66, 0x44, 0x3b, 0x9c, 0x35,
+		    0x37, 0x1f, 0x9b, 0xbb, 0xf3, 0xdb, 0x35, 0x63,
+		    0x30, 0x64, 0xaa, 0xa2, 0x06, 0xa8, 0x5d, 0xbb,
+		    0xe1, 0x9f, 0x70, 0xec, 0x82, 0x11, 0x06, 0x36,
+		    0xec, 0x8b, 0x69, 0x66, 0x24, 0x44, 0xc9, 0x4a,
+		    0x57, 0xbb, 0x9b, 0x78, 0x13, 0xce, 0x9c, 0x0c,
+		    0xba, 0x92, 0x93, 0x63, 0xb8, 0xe2, 0x95, 0x0f,
+		    0x0f, 0x16, 0x39, 0x52, 0xfd, 0x3a, 0x6d, 0x02,
+		    0x4b, 0xdf, 0x13, 0xd3, 0x2a, 0x22, 0xb4, 0x03,
+		    0x7c, 0x54, 0x49, 0x96, 0x68, 0x54, 0x10, 0xfa,
+		    0xef, 0xaa, 0x6c, 0xe8, 0x22, 0xdc, 0x71, 0x16,
+		    0x13, 0x1a, 0xf6, 0x28, 0xe5, 0x6d, 0x77, 0x3d,
+		    0xcd, 0x30, 0x63, 0xb1, 0x70, 0x52, 0xa1, 0xc5,
+		    0x94, 0x5f, 0xcf, 0xe8, 0xb8, 0x26, 0x98, 0xf7,
+		    0x06, 0xa0, 0x0a, 0x70, 0xfa, 0x03, 0x80, 0xac,
+		    0xc1, 0xec, 0xd6, 0x4c, 0x54, 0xd7, 0xfe, 0x47,
+		    0xb6, 0x88, 0x4a, 0xf7, 0x71, 0x24, 0xee, 0xf3,
+		    0xd2, 0xc2, 0x4a, 0x7f, 0xfe, 0x61, 0xc7, 0x35,
+		    0xc9, 0x37, 0x67, 0xcb, 0x24, 0x35, 0xda, 0x7e,
+		    0xca, 0x5f, 0xf3, 0x8d, 0xd4, 0x13, 0x8e, 0xd6,
+		    0xcb, 0x4d, 0x53, 0x8f, 0x53, 0x1f, 0xc0, 0x74,
+		    0xf7, 0x53, 0xb9, 0x5e, 0x23, 0x37, 0xba, 0x6e,
+		    0xe3, 0x9d, 0x07, 0x55, 0x25, 0x7b, 0xe6, 0x2a,
+		    0x64, 0xd1, 0x32, 0xdd, 0x54, 0x1b, 0x4b, 0xc0,
+		    0xe1, 0xd7, 0x69, 0x58, 0xf8, 0x93, 0x29, 0xc4,
+		    0xdd, 0x23, 0x2f, 0xa5, 0xfc, 0x9d, 0x7e, 0xf8,
+		    0xd4, 0x90, 0xcd, 0x82, 0x55, 0xdc, 0x16, 0x16,
+		    0x9f, 0x07, 0x52, 0x9b, 0x9d, 0x25, 0xed, 0x32,
+		    0xc5, 0x7b, 0xdf, 0xf6, 0x83, 0x46, 0x3d, 0x65,
+		    0xb7, 0xef, 0x87, 0x7a, 0x12, 0x69, 0x8f, 0x06,
+		    0x7c, 0x51, 0x15, 0x4a, 0x08, 0xe8, 0xac, 0x9a,
+		    0x0c, 0x24, 0xa7, 0x27, 0xd8, 0x46, 0x2f, 0xe7,
+		    0x01, 0x0e, 0x1c, 0xc6, 0x91, 0xb0, 0x6e, 0x85,
+		    0x65, 0xf0, 0x29, 0x0d, 0x2e, 0x6b, 0x3b, 0xfb,
+		    0x4b, 0xdf, 0xe4, 0x80, 0x93, 0x03, 0x66, 0x46,
+		    0x3e, 0x8a, 0x6e, 0xf3, 0x5e, 0x4d, 0x62, 0x0e,
+		    0x49, 0x05, 0xaf, 0xd4, 0xf8, 0x21, 0x20, 0x61,
+		    0x1d, 0x39, 0x17, 0xf4, 0x61, 0x47, 0x95, 0xfb,
+		    0x15, 0x2e, 0xb3, 0x4f, 0xd0, 0x5d, 0xf5, 0x7d,
+		    0x40, 0xda, 0x90, 0x3c, 0x6b, 0xcb, 0x17, 0x00,
+		    0x13, 0x3b, 0x64, 0x34, 0x1b, 0xf0, 0xf2, 0xe5,
+		    0x3b, 0xb2, 0xc7, 0xd3, 0x5f, 0x3a, 0x44, 0xa6,
+		    0x9b, 0xb7, 0x78, 0x0e, 0x42, 0x5d, 0x4c, 0xc1,
+		    0xe9, 0xd2, 0xcb, 0xb7, 0x78, 0xd1, 0xfe, 0x9a,
+		    0xb5, 0x07, 0xe9, 0xe0, 0xbe, 0xe2, 0x8a, 0xa7,
+		    0x01, 0x83, 0x00, 0x8c, 0x5c, 0x08, 0xe6, 0x63,
+		    0x12, 0x92, 0xb7, 0xb7, 0xa6, 0x19, 0x7d, 0x38,
+		    0x13, 0x38, 0x92, 0x87, 0x24, 0xf9, 0x48, 0xb3,
+		    0x5e, 0x87, 0x6a, 0x40, 0x39, 0x5c, 0x3f, 0xed,
+		    0x8f, 0xee, 0xdb, 0x15, 0x82, 0x06, 0xda, 0x49,
+		    0x21, 0x2b, 0xb5, 0xbf, 0x32, 0x7c, 0x9f, 0x42,
+		    0x28, 0x63, 0xcf, 0xaf, 0x1e, 0xf8, 0xc6, 0xa0,
+		    0xd1, 0x02, 0x43, 0x57, 0x62, 0xec, 0x9b, 0x0f,
+		    0x01, 0x9e, 0x71, 0xd8, 0x87, 0x9d, 0x01, 0xc1,
+		    0x58, 0x77, 0xd9, 0xaf, 0xb1, 0x10, 0x7e, 0xdd,
+		    0xa6, 0x50, 0x96, 0xe5, 0xf0, 0x72, 0x00, 0x6d,
+		    0x4b, 0xf8, 0x2a, 0x8f, 0x19, 0xf3, 0x22, 0x88,
+		    0x11, 0x4a, 0x8b, 0x7c, 0xfd, 0xb7, 0xed, 0xe1,
+		    0xf6, 0x40, 0x39, 0xe0, 0xe9, 0xf6, 0x3d, 0x25,
+		    0xe6, 0x74, 0x3c, 0x58, 0x57, 0x7f, 0xe1, 0x22,
+		    0x96, 0x47, 0x31, 0x91, 0xba, 0x70, 0x85, 0x28,
+		    0x6b, 0x9f, 0x6e, 0x25, 0xac, 0x23, 0x66, 0x2f,
+		    0x29, 0x88, 0x28, 0xce, 0x8c, 0x5c, 0x88, 0x53,
+		    0xd1, 0x3b, 0xcc, 0x6a, 0x51, 0xb2, 0xe1, 0x28,
+		    0x3f, 0x91, 0xb4, 0x0d, 0x00, 0x3a, 0xe3, 0xf8,
+		    0xc3, 0x8f, 0xd7, 0x96, 0x62, 0x0e, 0x2e, 0xfc,
+		    0xc8, 0x6c, 0x77, 0xa6, 0x1d, 0x22, 0xc1, 0xb8,
+		    0xe6, 0x61, 0xd7, 0x67, 0x36, 0x13, 0x7b, 0xbb,
+		    0x9b, 0x59, 0x09, 0xa6, 0xdf, 0xf7, 0x6b, 0xa3,
+		    0x40, 0x1a, 0xf5, 0x4f, 0xb4, 0xda, 0xd3, 0xf3,
+		    0x81, 0x93, 0xc6, 0x18, 0xd9, 0x26, 0xee, 0xac,
+		    0xf0, 0xaa, 0xdf, 0xc5, 0x9c, 0xca, 0xc2, 0xa2,
+		    0xcc, 0x7b, 0x5c, 0x24, 0xb0, 0xbc, 0xd0, 0x6a,
+		    0x4d, 0x89, 0x09, 0xb8, 0x07, 0xfe, 0x87, 0xad,
+		    0x0a, 0xea, 0xb8, 0x42, 0xf9, 0x5e, 0xb3, 0x3e,
+		    0x36, 0x4c, 0xaf, 0x75, 0x9e, 0x1c, 0xeb, 0xbd,
+		    0xbc, 0xbb, 0x80, 0x40, 0xa7, 0x3a, 0x30, 0xbf,
+		    0xa8, 0x44, 0xf4, 0xeb, 0x38, 0xad, 0x29, 0xba,
+		    0x23, 0xed, 0x41, 0x0c, 0xea, 0xd2, 0xbb, 0x41,
+		    0x18, 0xd6, 0xb9, 0xba, 0x65, 0x2b, 0xa3, 0x91,
+		    0x6d, 0x1f, 0xa9, 0xf4, 0xd1, 0x25, 0x8d, 0x4d,
+		    0x38, 0xff, 0x64, 0xa0, 0xec, 0xde, 0xa6, 0xb6,
+		    0x79, 0xab, 0x8e, 0x33, 0x6c, 0x47, 0xde, 0xaf,
+		    0x94, 0xa4, 0xa5, 0x86, 0x77, 0x55, 0x09, 0x92,
+		    0x81, 0x31, 0x76, 0xc7, 0x34, 0x22, 0x89, 0x8e,
+		    0x3d, 0x26, 0x26, 0xd7, 0xfc, 0x1e, 0x16, 0x72,
+		    0x13, 0x33, 0x63, 0xd5, 0x22, 0xbe, 0xb8, 0x04,
+		    0x34, 0x84, 0x41, 0xbb, 0x80, 0xd0, 0x9f, 0x46,
+		    0x48, 0x07, 0xa7, 0xfc, 0x2b, 0x3a, 0x75, 0x55,
+		    0x8c, 0xc7, 0x6a, 0xbd, 0x7e, 0x46, 0x08, 0x84,
+		    0x0f, 0xd5, 0x74, 0xc0, 0x82, 0x8e, 0xaa, 0x61,
+		    0x05, 0x01, 0xb2, 0x47, 0x6e, 0x20, 0x6a, 0x2d,
+		    0x58, 0x70, 0x48, 0x32, 0xa7, 0x37, 0xd2, 0xb8,
+		    0x82, 0x1a, 0x51, 0xb9, 0x61, 0xdd, 0xfd, 0x9d,
+		    0x6b, 0x0e, 0x18, 0x97, 0xf8, 0x45, 0x5f, 0x87,
+		    0x10, 0xcf, 0x34, 0x72, 0x45, 0x26, 0x49, 0x70,
+		    0xe7, 0xa3, 0x78, 0xe0, 0x52, 0x89, 0x84, 0x94,
+		    0x83, 0x82, 0xc2, 0x69, 0x8f, 0xe3, 0xe1, 0x3f,
+		    0x60, 0x74, 0x88, 0xc4, 0xf7, 0x75, 0x2c, 0xfb,
+		    0xbd, 0xb6, 0xc4, 0x7e, 0x10, 0x0a, 0x6c, 0x90,
+		    0x04, 0x9e, 0xc3, 0x3f, 0x59, 0x7c, 0xce, 0x31,
+		    0x18, 0x60, 0x57, 0x73, 0x46, 0x94, 0x7d, 0x06,
+		    0xa0, 0x6d, 0x44, 0xec, 0xa2, 0x0a, 0x9e, 0x05,
+		    0x15, 0xef, 0xca, 0x5c, 0xbf, 0x00, 0xeb, 0xf7,
+		    0x3d, 0x32, 0xd4, 0xa5, 0xef, 0x49, 0x89, 0x5e,
+		    0x46, 0xb0, 0xa6, 0x63, 0x5b, 0x8a, 0x73, 0xae,
+		    0x6f, 0xd5, 0x9d, 0xf8, 0x4f, 0x40, 0xb5, 0xb2,
+		    0x6e, 0xd3, 0xb6, 0x01, 0xa9, 0x26, 0xa2, 0x21,
+		    0xcf, 0x33, 0x7a, 0x3a, 0xa4, 0x23, 0x13, 0xb0,
+		    0x69, 0x6a, 0xee, 0xce, 0xd8, 0x9d, 0x01, 0x1d,
+		    0x50, 0xc1, 0x30, 0x6c, 0xb1, 0xcd, 0xa0, 0xf0,
+		    0xf0, 0xa2, 0x64, 0x6f, 0xbb, 0xbf, 0x5e, 0xe6,
+		    0xab, 0x87, 0xb4, 0x0f, 0x4f, 0x15, 0xaf, 0xb5,
+		    0x25, 0xa1, 0xb2, 0xd0, 0x80, 0x2c, 0xfb, 0xf9,
+		    0xfe, 0xd2, 0x33, 0xbb, 0x76, 0xfe, 0x7c, 0xa8,
+		    0x66, 0xf7, 0xe7, 0x85, 0x9f, 0x1f, 0x85, 0x57,
+		    0x88, 0xe1, 0xe9, 0x63, 0xe4, 0xd8, 0x1c, 0xa1,
+		    0xfb, 0xda, 0x44, 0x05, 0x2e, 0x1d, 0x3a, 0x1c,
+		    0xff, 0xc8, 0x3b, 0xc0, 0xfe, 0xda, 0x22, 0x0b,
+		    0x43, 0xd6, 0x88, 0x39, 0x4c, 0x4a, 0xa6, 0x69,
+		    0x18, 0x93, 0x42, 0x4e, 0xb5, 0xcc, 0x66, 0x0d,
+		    0x09, 0xf8, 0x1e, 0x7c, 0xd3, 0x3c, 0x99, 0x0d,
+		    0x50, 0x1d, 0x62, 0xe9, 0x57, 0x06, 0xbf, 0x19,
+		    0x88, 0xdd, 0xad, 0x7b, 0x4f, 0xf9, 0xc7, 0x82,
+		    0x6d, 0x8d, 0xc8, 0xc4, 0xc5, 0x78, 0x17, 0x20,
+		    0x15, 0xc5, 0x52, 0x41, 0xcf, 0x5b, 0xd6, 0x7f,
+		    0x94, 0x02, 0x41, 0xe0, 0x40, 0x22, 0x03, 0x5e,
+		    0xd1, 0x53, 0xd4, 0x86, 0xd3, 0x2c, 0x9f, 0x0f,
+		    0x96, 0xe3, 0x6b, 0x9a, 0x76, 0x32, 0x06, 0x47,
+		    0x4b, 0x11, 0xb3, 0xdd, 0x03, 0x65, 0xbd, 0x9b,
+		    0x01, 0xda, 0x9c, 0xb9, 0x7e, 0x3f, 0x6a, 0xc4,
+		    0x7b, 0xea, 0xd4, 0x3c, 0xb9, 0xfb, 0x5c, 0x6b,
+		    0x64, 0x33, 0x52, 0xba, 0x64, 0x78, 0x8f, 0xa4,
+		    0xaf, 0x7a, 0x61, 0x8d, 0xbc, 0xc5, 0x73, 0xe9,
+		    0x6b, 0x58, 0x97, 0x4b, 0xbf, 0x63, 0x22, 0xd3,
+		    0x37, 0x02, 0x54, 0xc5, 0xb9, 0x16, 0x4a, 0xf0,
+		    0x19, 0xd8, 0x94, 0x57, 0xb8, 0x8a, 0xb3, 0x16,
+		    0x3b, 0xd0, 0x84, 0x8e, 0x67, 0xa6, 0xa3, 0x7d,
+		    0x78, 0xec, 0x00 },
+	.ilen	= 2011,
+	.result	= { 0x52, 0x34, 0xb3, 0x65, 0x3b, 0xb7, 0xe5, 0xd3,
+		    0xab, 0x49, 0x17, 0x60, 0xd2, 0x52, 0x56, 0xdf,
+		    0xdf, 0x34, 0x56, 0x82, 0xe2, 0xbe, 0xe5, 0xe1,
+		    0x28, 0xd1, 0x4e, 0x5f, 0x4f, 0x01, 0x7d, 0x3f,
+		    0x99, 0x6b, 0x30, 0x6e, 0x1a, 0x7c, 0x4c, 0x8e,
+		    0x62, 0x81, 0xae, 0x86, 0x3f, 0x6b, 0xd0, 0xb5,
+		    0xa9, 0xcf, 0x50, 0xf1, 0x02, 0x12, 0xa0, 0x0b,
+		    0x24, 0xe9, 0xe6, 0x72, 0x89, 0x2c, 0x52, 0x1b,
+		    0x34, 0x38, 0xf8, 0x75, 0x5f, 0xa0, 0x74, 0xe2,
+		    0x99, 0xdd, 0xa6, 0x4b, 0x14, 0x50, 0x4e, 0xf1,
+		    0xbe, 0xd6, 0x9e, 0xdb, 0xb2, 0x24, 0x27, 0x74,
+		    0x12, 0x4a, 0x78, 0x78, 0x17, 0xa5, 0x58, 0x8e,
+		    0x2f, 0xf9, 0xf4, 0x8d, 0xee, 0x03, 0x88, 0xae,
+		    0xb8, 0x29, 0xa1, 0x2f, 0x4b, 0xee, 0x92, 0xbd,
+		    0x87, 0xb3, 0xce, 0x34, 0x21, 0x57, 0x46, 0x04,
+		    0x49, 0x0c, 0x80, 0xf2, 0x01, 0x13, 0xa1, 0x55,
+		    0xb3, 0xff, 0x44, 0x30, 0x3c, 0x1c, 0xd0, 0xef,
+		    0xbc, 0x18, 0x74, 0x26, 0xad, 0x41, 0x5b, 0x5b,
+		    0x3e, 0x9a, 0x7a, 0x46, 0x4f, 0x16, 0xd6, 0x74,
+		    0x5a, 0xb7, 0x3a, 0x28, 0x31, 0xd8, 0xae, 0x26,
+		    0xac, 0x50, 0x53, 0x86, 0xf2, 0x56, 0xd7, 0x3f,
+		    0x29, 0xbc, 0x45, 0x68, 0x8e, 0xcb, 0x98, 0x64,
+		    0xdd, 0xc9, 0xba, 0xb8, 0x4b, 0x7b, 0x82, 0xdd,
+		    0x14, 0xa7, 0xcb, 0x71, 0x72, 0x00, 0x5c, 0xad,
+		    0x7b, 0x6a, 0x89, 0xa4, 0x3d, 0xbf, 0xb5, 0x4b,
+		    0x3e, 0x7c, 0x5a, 0xcf, 0xb8, 0xa1, 0xc5, 0x6e,
+		    0xc8, 0xb6, 0x31, 0x57, 0x7b, 0xdf, 0xa5, 0x7e,
+		    0xb1, 0xd6, 0x42, 0x2a, 0x31, 0x36, 0xd1, 0xd0,
+		    0x3f, 0x7a, 0xe5, 0x94, 0xd6, 0x36, 0xa0, 0x6f,
+		    0xb7, 0x40, 0x7d, 0x37, 0xc6, 0x55, 0x7c, 0x50,
+		    0x40, 0x6d, 0x29, 0x89, 0xe3, 0x5a, 0xae, 0x97,
+		    0xe7, 0x44, 0x49, 0x6e, 0xbd, 0x81, 0x3d, 0x03,
+		    0x93, 0x06, 0x12, 0x06, 0xe2, 0x41, 0x12, 0x4a,
+		    0xf1, 0x6a, 0xa4, 0x58, 0xa2, 0xfb, 0xd2, 0x15,
+		    0xba, 0xc9, 0x79, 0xc9, 0xce, 0x5e, 0x13, 0xbb,
+		    0xf1, 0x09, 0x04, 0xcc, 0xfd, 0xe8, 0x51, 0x34,
+		    0x6a, 0xe8, 0x61, 0x88, 0xda, 0xed, 0x01, 0x47,
+		    0x84, 0xf5, 0x73, 0x25, 0xf9, 0x1c, 0x42, 0x86,
+		    0x07, 0xf3, 0x5b, 0x1a, 0x01, 0xb3, 0xeb, 0x24,
+		    0x32, 0x8d, 0xf6, 0xed, 0x7c, 0x4b, 0xeb, 0x3c,
+		    0x36, 0x42, 0x28, 0xdf, 0xdf, 0xb6, 0xbe, 0xd9,
+		    0x8c, 0x52, 0xd3, 0x2b, 0x08, 0x90, 0x8c, 0xe7,
+		    0x98, 0x31, 0xe2, 0x32, 0x8e, 0xfc, 0x11, 0x48,
+		    0x00, 0xa8, 0x6a, 0x42, 0x4a, 0x02, 0xc6, 0x4b,
+		    0x09, 0xf1, 0xe3, 0x49, 0xf3, 0x45, 0x1f, 0x0e,
+		    0xbc, 0x56, 0xe2, 0xe4, 0xdf, 0xfb, 0xeb, 0x61,
+		    0xfa, 0x24, 0xc1, 0x63, 0x75, 0xbb, 0x47, 0x75,
+		    0xaf, 0xe1, 0x53, 0x16, 0x96, 0x21, 0x85, 0x26,
+		    0x11, 0xb3, 0x76, 0xe3, 0x23, 0xa1, 0x6b, 0x74,
+		    0x37, 0xd0, 0xde, 0x06, 0x90, 0x71, 0x5d, 0x43,
+		    0x88, 0x9b, 0x00, 0x54, 0xa6, 0x75, 0x2f, 0xa1,
+		    0xc2, 0x0b, 0x73, 0x20, 0x1d, 0xb6, 0x21, 0x79,
+		    0x57, 0x3f, 0xfa, 0x09, 0xbe, 0x8a, 0x33, 0xc3,
+		    0x52, 0xf0, 0x1d, 0x82, 0x31, 0xd1, 0x55, 0xb5,
+		    0x6c, 0x99, 0x25, 0xcf, 0x5c, 0x32, 0xce, 0xe9,
+		    0x0d, 0xfa, 0x69, 0x2c, 0xd5, 0x0d, 0xc5, 0x6d,
+		    0x86, 0xd0, 0x0c, 0x3b, 0x06, 0x50, 0x79, 0xe8,
+		    0xc3, 0xae, 0x04, 0xe6, 0xcd, 0x51, 0xe4, 0x26,
+		    0x9b, 0x4f, 0x7e, 0xa6, 0x0f, 0xab, 0xd8, 0xe5,
+		    0xde, 0xa9, 0x00, 0x95, 0xbe, 0xa3, 0x9d, 0x5d,
+		    0xb2, 0x09, 0x70, 0x18, 0x1c, 0xf0, 0xac, 0x29,
+		    0x23, 0x02, 0x29, 0x28, 0xd2, 0x74, 0x35, 0x57,
+		    0x62, 0x0f, 0x24, 0xea, 0x5e, 0x33, 0xc2, 0x92,
+		    0xf3, 0x78, 0x4d, 0x30, 0x1e, 0xa1, 0x99, 0xa9,
+		    0x82, 0xb0, 0x42, 0x31, 0x8d, 0xad, 0x8a, 0xbc,
+		    0xfc, 0xd4, 0x57, 0x47, 0x3e, 0xb4, 0x50, 0xdd,
+		    0x6e, 0x2c, 0x80, 0x4d, 0x22, 0xf1, 0xfb, 0x57,
+		    0xc4, 0xdd, 0x17, 0xe1, 0x8a, 0x36, 0x4a, 0xb3,
+		    0x37, 0xca, 0xc9, 0x4e, 0xab, 0xd5, 0x69, 0xc4,
+		    0xf4, 0xbc, 0x0b, 0x3b, 0x44, 0x4b, 0x29, 0x9c,
+		    0xee, 0xd4, 0x35, 0x22, 0x21, 0xb0, 0x1f, 0x27,
+		    0x64, 0xa8, 0x51, 0x1b, 0xf0, 0x9f, 0x19, 0x5c,
+		    0xfb, 0x5a, 0x64, 0x74, 0x70, 0x45, 0x09, 0xf5,
+		    0x64, 0xfe, 0x1a, 0x2d, 0xc9, 0x14, 0x04, 0x14,
+		    0xcf, 0xd5, 0x7d, 0x60, 0xaf, 0x94, 0x39, 0x94,
+		    0xe2, 0x7d, 0x79, 0x82, 0xd0, 0x65, 0x3b, 0x6b,
+		    0x9c, 0x19, 0x84, 0xb4, 0x6d, 0xb3, 0x0c, 0x99,
+		    0xc0, 0x56, 0xa8, 0xbd, 0x73, 0xce, 0x05, 0x84,
+		    0x3e, 0x30, 0xaa, 0xc4, 0x9b, 0x1b, 0x04, 0x2a,
+		    0x9f, 0xd7, 0x43, 0x2b, 0x23, 0xdf, 0xbf, 0xaa,
+		    0xd5, 0xc2, 0x43, 0x2d, 0x70, 0xab, 0xdc, 0x75,
+		    0xad, 0xac, 0xf7, 0xc0, 0xbe, 0x67, 0xb2, 0x74,
+		    0xed, 0x67, 0x10, 0x4a, 0x92, 0x60, 0xc1, 0x40,
+		    0x50, 0x19, 0x8a, 0x8a, 0x8c, 0x09, 0x0e, 0x72,
+		    0xe1, 0x73, 0x5e, 0xe8, 0x41, 0x85, 0x63, 0x9f,
+		    0x3f, 0xd7, 0x7d, 0xc4, 0xfb, 0x22, 0x5d, 0x92,
+		    0x6c, 0xb3, 0x1e, 0xe2, 0x50, 0x2f, 0x82, 0xa8,
+		    0x28, 0xc0, 0xb5, 0xd7, 0x5f, 0x68, 0x0d, 0x2c,
+		    0x2d, 0xaf, 0x7e, 0xfa, 0x2e, 0x08, 0x0f, 0x1f,
+		    0x70, 0x9f, 0xe9, 0x19, 0x72, 0x55, 0xf8, 0xfb,
+		    0x51, 0xd2, 0x33, 0x5d, 0xa0, 0xd3, 0x2b, 0x0a,
+		    0x6c, 0xbc, 0x4e, 0xcf, 0x36, 0x4d, 0xdc, 0x3b,
+		    0xe9, 0x3e, 0x81, 0x7c, 0x61, 0xdb, 0x20, 0x2d,
+		    0x3a, 0xc3, 0xb3, 0x0c, 0x1e, 0x00, 0xb9, 0x7c,
+		    0xf5, 0xca, 0x10, 0x5f, 0x3a, 0x71, 0xb3, 0xe4,
+		    0x20, 0xdb, 0x0c, 0x2a, 0x98, 0x63, 0x45, 0x00,
+		    0x58, 0xf6, 0x68, 0xe4, 0x0b, 0xda, 0x13, 0x3b,
+		    0x60, 0x5c, 0x76, 0xdb, 0xb9, 0x97, 0x71, 0xe4,
+		    0xd9, 0xb7, 0xdb, 0xbd, 0x68, 0xc7, 0x84, 0x84,
+		    0xaa, 0x7c, 0x68, 0x62, 0x5e, 0x16, 0xfc, 0xba,
+		    0x72, 0xaa, 0x9a, 0xa9, 0xeb, 0x7c, 0x75, 0x47,
+		    0x97, 0x7e, 0xad, 0xe2, 0xd9, 0x91, 0xe8, 0xe4,
+		    0xa5, 0x31, 0xd7, 0x01, 0x8e, 0xa2, 0x11, 0x88,
+		    0x95, 0xb9, 0xf2, 0x9b, 0xd3, 0x7f, 0x1b, 0x81,
+		    0x22, 0xf7, 0x98, 0x60, 0x0a, 0x64, 0xa6, 0xc1,
+		    0xf6, 0x49, 0xc7, 0xe3, 0x07, 0x4d, 0x94, 0x7a,
+		    0xcf, 0x6e, 0x68, 0x0c, 0x1b, 0x3f, 0x6e, 0x2e,
+		    0xee, 0x92, 0xfa, 0x52, 0xb3, 0x59, 0xf8, 0xf1,
+		    0x8f, 0x6a, 0x66, 0xa3, 0x82, 0x76, 0x4a, 0x07,
+		    0x1a, 0xc7, 0xdd, 0xf5, 0xda, 0x9c, 0x3c, 0x24,
+		    0xbf, 0xfd, 0x42, 0xa1, 0x10, 0x64, 0x6a, 0x0f,
+		    0x89, 0xee, 0x36, 0xa5, 0xce, 0x99, 0x48, 0x6a,
+		    0xf0, 0x9f, 0x9e, 0x69, 0xa4, 0x40, 0x20, 0xe9,
+		    0x16, 0x15, 0xf7, 0xdb, 0x75, 0x02, 0xcb, 0xe9,
+		    0x73, 0x8b, 0x3b, 0x49, 0x2f, 0xf0, 0xaf, 0x51,
+		    0x06, 0x5c, 0xdf, 0x27, 0x27, 0x49, 0x6a, 0xd1,
+		    0xcc, 0xc7, 0xb5, 0x63, 0xb5, 0xfc, 0xb8, 0x5c,
+		    0x87, 0x7f, 0x84, 0xb4, 0xcc, 0x14, 0xa9, 0x53,
+		    0xda, 0xa4, 0x56, 0xf8, 0xb6, 0x1b, 0xcc, 0x40,
+		    0x27, 0x52, 0x06, 0x5a, 0x13, 0x81, 0xd7, 0x3a,
+		    0xd4, 0x3b, 0xfb, 0x49, 0x65, 0x31, 0x33, 0xb2,
+		    0xfa, 0xcd, 0xad, 0x58, 0x4e, 0x2b, 0xae, 0xd2,
+		    0x20, 0xfb, 0x1a, 0x48, 0xb4, 0x3f, 0x9a, 0xd8,
+		    0x7a, 0x35, 0x4a, 0xc8, 0xee, 0x88, 0x5e, 0x07,
+		    0x66, 0x54, 0xb9, 0xec, 0x9f, 0xa3, 0xe3, 0xb9,
+		    0x37, 0xaa, 0x49, 0x76, 0x31, 0xda, 0x74, 0x2d,
+		    0x3c, 0xa4, 0x65, 0x10, 0x32, 0x38, 0xf0, 0xde,
+		    0xd3, 0x99, 0x17, 0xaa, 0x71, 0xaa, 0x8f, 0x0f,
+		    0x8c, 0xaf, 0xa2, 0xf8, 0x5d, 0x64, 0xba, 0x1d,
+		    0xa3, 0xef, 0x96, 0x73, 0xe8, 0xa1, 0x02, 0x8d,
+		    0x0c, 0x6d, 0xb8, 0x06, 0x90, 0xb8, 0x08, 0x56,
+		    0x2c, 0xa7, 0x06, 0xc9, 0xc2, 0x38, 0xdb, 0x7c,
+		    0x63, 0xb1, 0x57, 0x8e, 0xea, 0x7c, 0x79, 0xf3,
+		    0x49, 0x1d, 0xfe, 0x9f, 0xf3, 0x6e, 0xb1, 0x1d,
+		    0xba, 0x19, 0x80, 0x1a, 0x0a, 0xd3, 0xb0, 0x26,
+		    0x21, 0x40, 0xb1, 0x7c, 0xf9, 0x4d, 0x8d, 0x10,
+		    0xc1, 0x7e, 0xf4, 0xf6, 0x3c, 0xa8, 0xfd, 0x7c,
+		    0xa3, 0x92, 0xb2, 0x0f, 0xaa, 0xcc, 0xa6, 0x11,
+		    0xfe, 0x04, 0xe3, 0xd1, 0x7a, 0x32, 0x89, 0xdf,
+		    0x0d, 0xc4, 0x8f, 0x79, 0x6b, 0xca, 0x16, 0x7c,
+		    0x6e, 0xf9, 0xad, 0x0f, 0xf6, 0xfe, 0x27, 0xdb,
+		    0xc4, 0x13, 0x70, 0xf1, 0x62, 0x1a, 0x4f, 0x79,
+		    0x40, 0xc9, 0x9b, 0x8b, 0x21, 0xea, 0x84, 0xfa,
+		    0xf5, 0xf1, 0x89, 0xce, 0xb7, 0x55, 0x0a, 0x80,
+		    0x39, 0x2f, 0x55, 0x36, 0x16, 0x9c, 0x7b, 0x08,
+		    0xbd, 0x87, 0x0d, 0xa5, 0x32, 0xf1, 0x52, 0x7c,
+		    0xe8, 0x55, 0x60, 0x5b, 0xd7, 0x69, 0xe4, 0xfc,
+		    0xfa, 0x12, 0x85, 0x96, 0xea, 0x50, 0x28, 0xab,
+		    0x8a, 0xf7, 0xbb, 0x0e, 0x53, 0x74, 0xca, 0xa6,
+		    0x27, 0x09, 0xc2, 0xb5, 0xde, 0x18, 0x14, 0xd9,
+		    0xea, 0xe5, 0x29, 0x1c, 0x40, 0x56, 0xcf, 0xd7,
+		    0xae, 0x05, 0x3f, 0x65, 0xaf, 0x05, 0x73, 0xe2,
+		    0x35, 0x96, 0x27, 0x07, 0x14, 0xc0, 0xad, 0x33,
+		    0xf1, 0xdc, 0x44, 0x7a, 0x89, 0x17, 0x77, 0xd2,
+		    0x9c, 0x58, 0x60, 0xf0, 0x3f, 0x7b, 0x2d, 0x2e,
+		    0x57, 0x95, 0x54, 0x87, 0xed, 0xf2, 0xc7, 0x4c,
+		    0xf0, 0xae, 0x56, 0x29, 0x19, 0x7d, 0x66, 0x4b,
+		    0x9b, 0x83, 0x84, 0x42, 0x3b, 0x01, 0x25, 0x66,
+		    0x8e, 0x02, 0xde, 0xb9, 0x83, 0x54, 0x19, 0xf6,
+		    0x9f, 0x79, 0x0d, 0x67, 0xc5, 0x1d, 0x7a, 0x44,
+		    0x02, 0x98, 0xa7, 0x16, 0x1c, 0x29, 0x0d, 0x74,
+		    0xff, 0x85, 0x40, 0x06, 0xef, 0x2c, 0xa9, 0xc6,
+		    0xf5, 0x53, 0x07, 0x06, 0xae, 0xe4, 0xfa, 0x5f,
+		    0xd8, 0x39, 0x4d, 0xf1, 0x9b, 0x6b, 0xd9, 0x24,
+		    0x84, 0xfe, 0x03, 0x4c, 0xb2, 0x3f, 0xdf, 0xa1,
+		    0x05, 0x9e, 0x50, 0x14, 0x5a, 0xd9, 0x1a, 0xa2,
+		    0xa7, 0xfa, 0xfa, 0x17, 0xf7, 0x78, 0xd6, 0xb5,
+		    0x92, 0x61, 0x91, 0xac, 0x36, 0xfa, 0x56, 0x0d,
+		    0x38, 0x32, 0x18, 0x85, 0x08, 0x58, 0x37, 0xf0,
+		    0x4b, 0xdb, 0x59, 0xe7, 0xa4, 0x34, 0xc0, 0x1b,
+		    0x01, 0xaf, 0x2d, 0xde, 0xa1, 0xaa, 0x5d, 0xd3,
+		    0xec, 0xe1, 0xd4, 0xf7, 0xe6, 0x54, 0x68, 0xf0,
+		    0x51, 0x97, 0xa7, 0x89, 0xea, 0x24, 0xad, 0xd3,
+		    0x6e, 0x47, 0x93, 0x8b, 0x4b, 0xb4, 0xf7, 0x1c,
+		    0x42, 0x06, 0x67, 0xe8, 0x99, 0xf6, 0xf5, 0x7b,
+		    0x85, 0xb5, 0x65, 0xb5, 0xb5, 0xd2, 0x37, 0xf5,
+		    0xf3, 0x02, 0xa6, 0x4d, 0x11, 0xa7, 0xdc, 0x51,
+		    0x09, 0x7f, 0xa0, 0xd8, 0x88, 0x1c, 0x13, 0x71,
+		    0xae, 0x9c, 0xb7, 0x7b, 0x34, 0xd6, 0x4e, 0x68,
+		    0x26, 0x83, 0x51, 0xaf, 0x1d, 0xee, 0x8b, 0xbb,
+		    0x69, 0x43, 0x2b, 0x9e, 0x8a, 0xbc, 0x02, 0x0e,
+		    0xa0, 0x1b, 0xe0, 0xa8, 0x5f, 0x6f, 0xaf, 0x1b,
+		    0x8f, 0xe7, 0x64, 0x71, 0x74, 0x11, 0x7e, 0xa8,
+		    0xd8, 0xf9, 0x97, 0x06, 0xc3, 0xb6, 0xfb, 0xfb,
+		    0xb7, 0x3d, 0x35, 0x9d, 0x3b, 0x52, 0xed, 0x54,
+		    0xca, 0xf4, 0x81, 0x01, 0x2d, 0x1b, 0xc3, 0xa7,
+		    0x00, 0x3d, 0x1a, 0x39, 0x54, 0xe1, 0xf6, 0xff,
+		    0xed, 0x6f, 0x0b, 0x5a, 0x68, 0xda, 0x58, 0xdd,
+		    0xa9, 0xcf, 0x5c, 0x4a, 0xe5, 0x09, 0x4e, 0xde,
+		    0x9d, 0xbc, 0x3e, 0xee, 0x5a, 0x00, 0x3b, 0x2c,
+		    0x87, 0x10, 0x65, 0x60, 0xdd, 0xd7, 0x56, 0xd1,
+		    0x4c, 0x64, 0x45, 0xe4, 0x21, 0xec, 0x78, 0xf8,
+		    0x25, 0x7a, 0x3e, 0x16, 0x5d, 0x09, 0x53, 0x14,
+		    0xbe, 0x4f, 0xae, 0x87, 0xd8, 0xd1, 0xaa, 0x3c,
+		    0xf6, 0x3e, 0xa4, 0x70, 0x8c, 0x5e, 0x70, 0xa4,
+		    0xb3, 0x6b, 0x66, 0x73, 0xd3, 0xbf, 0x31, 0x06,
+		    0x19, 0x62, 0x93, 0x15, 0xf2, 0x86, 0xe4, 0x52,
+		    0x7e, 0x53, 0x4c, 0x12, 0x38, 0xcc, 0x34, 0x7d,
+		    0x57, 0xf6, 0x42, 0x93, 0x8a, 0xc4, 0xee, 0x5c,
+		    0x8a, 0xe1, 0x52, 0x8f, 0x56, 0x64, 0xf6, 0xa6,
+		    0xd1, 0x91, 0x57, 0x70, 0xcd, 0x11, 0x76, 0xf5,
+		    0x59, 0x60, 0x60, 0x3c, 0xc1, 0xc3, 0x0b, 0x7f,
+		    0x58, 0x1a, 0x50, 0x91, 0xf1, 0x68, 0x8f, 0x6e,
+		    0x74, 0x74, 0xa8, 0x51, 0x0b, 0xf7, 0x7a, 0x98,
+		    0x37, 0xf2, 0x0a, 0x0e, 0xa4, 0x97, 0x04, 0xb8,
+		    0x9b, 0xfd, 0xa0, 0xea, 0xf7, 0x0d, 0xe1, 0xdb,
+		    0x03, 0xf0, 0x31, 0x29, 0xf8, 0xdd, 0x6b, 0x8b,
+		    0x5d, 0xd8, 0x59, 0xa9, 0x29, 0xcf, 0x9a, 0x79,
+		    0x89, 0x19, 0x63, 0x46, 0x09, 0x79, 0x6a, 0x11,
+		    0xda, 0x63, 0x68, 0x48, 0x77, 0x23, 0xfb, 0x7d,
+		    0x3a, 0x43, 0xcb, 0x02, 0x3b, 0x7a, 0x6d, 0x10,
+		    0x2a, 0x9e, 0xac, 0xf1, 0xd4, 0x19, 0xf8, 0x23,
+		    0x64, 0x1d, 0x2c, 0x5f, 0xf2, 0xb0, 0x5c, 0x23,
+		    0x27, 0xf7, 0x27, 0x30, 0x16, 0x37, 0xb1, 0x90,
+		    0xab, 0x38, 0xfb, 0x55, 0xcd, 0x78, 0x58, 0xd4,
+		    0x7d, 0x43, 0xf6, 0x45, 0x5e, 0x55, 0x8d, 0xb1,
+		    0x02, 0x65, 0x58, 0xb4, 0x13, 0x4b, 0x36, 0xf7,
+		    0xcc, 0xfe, 0x3d, 0x0b, 0x82, 0xe2, 0x12, 0x11,
+		    0xbb, 0xe6, 0xb8, 0x3a, 0x48, 0x71, 0xc7, 0x50,
+		    0x06, 0x16, 0x3a, 0xe6, 0x7c, 0x05, 0xc7, 0xc8,
+		    0x4d, 0x2f, 0x08, 0x6a, 0x17, 0x9a, 0x95, 0x97,
+		    0x50, 0x68, 0xdc, 0x28, 0x18, 0xc4, 0x61, 0x38,
+		    0xb9, 0xe0, 0x3e, 0x78, 0xdb, 0x29, 0xe0, 0x9f,
+		    0x52, 0xdd, 0xf8, 0x4f, 0x91, 0xc1, 0xd0, 0x33,
+		    0xa1, 0x7a, 0x8e, 0x30, 0x13, 0x82, 0x07, 0x9f,
+		    0xd3, 0x31, 0x0f, 0x23, 0xbe, 0x32, 0x5a, 0x75,
+		    0xcf, 0x96, 0xb2, 0xec, 0xb5, 0x32, 0xac, 0x21,
+		    0xd1, 0x82, 0x33, 0xd3, 0x15, 0x74, 0xbd, 0x90,
+		    0xf1, 0x2c, 0xe6, 0x5f, 0x8d, 0xe3, 0x02, 0xe8,
+		    0xe9, 0xc4, 0xca, 0x96, 0xeb, 0x0e, 0xbc, 0x91,
+		    0xf4, 0xb9, 0xea, 0xd9, 0x1b, 0x75, 0xbd, 0xe1,
+		    0xac, 0x2a, 0x05, 0x37, 0x52, 0x9b, 0x1b, 0x3f,
+		    0x5a, 0xdc, 0x21, 0xc3, 0x98, 0xbb, 0xaf, 0xa3,
+		    0xf2, 0x00, 0xbf, 0x0d, 0x30, 0x89, 0x05, 0xcc,
+		    0xa5, 0x76, 0xf5, 0x06, 0xf0, 0xc6, 0x54, 0x8a,
+		    0x5d, 0xd4, 0x1e, 0xc1, 0xf2, 0xce, 0xb0, 0x62,
+		    0xc8, 0xfc, 0x59, 0x42, 0x9a, 0x90, 0x60, 0x55,
+		    0xfe, 0x88, 0xa5, 0x8b, 0xb8, 0x33, 0x0c, 0x23,
+		    0x24, 0x0d, 0x15, 0x70, 0x37, 0x1e, 0x3d, 0xf6,
+		    0xd2, 0xea, 0x92, 0x10, 0xb2, 0xc4, 0x51, 0xac,
+		    0xf2, 0xac, 0xf3, 0x6b, 0x6c, 0xaa, 0xcf, 0x12,
+		    0xc5, 0x6c, 0x90, 0x50, 0xb5, 0x0c, 0xfc, 0x1a,
+		    0x15, 0x52, 0xe9, 0x26, 0xc6, 0x52, 0xa4, 0xe7,
+		    0x81, 0x69, 0xe1, 0xe7, 0x9e, 0x30, 0x01, 0xec,
+		    0x84, 0x89, 0xb2, 0x0d, 0x66, 0xdd, 0xce, 0x28,
+		    0x5c, 0xec, 0x98, 0x46, 0x68, 0x21, 0x9f, 0x88,
+		    0x3f, 0x1f, 0x42, 0x77, 0xce, 0xd0, 0x61, 0xd4,
+		    0x20, 0xa7, 0xff, 0x53, 0xad, 0x37, 0xd0, 0x17,
+		    0x35, 0xc9, 0xfc, 0xba, 0x0a, 0x78, 0x3f, 0xf2,
+		    0xcc, 0x86, 0x89, 0xe8, 0x4b, 0x3c, 0x48, 0x33,
+		    0x09, 0x7f, 0xc6, 0xc0, 0xdd, 0xb8, 0xfd, 0x7a,
+		    0x66, 0x66, 0x65, 0xeb, 0x47, 0xa7, 0x04, 0x28,
+		    0xa3, 0x19, 0x8e, 0xa9, 0xb1, 0x13, 0x67, 0x62,
+		    0x70, 0xcf, 0xd6 }
+}, { /* wycheproof - rfc7539 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x07, 0x00, 0x00, 0x00, 0x40, 0x41, 0x42, 0x43,
+		    0x44, 0x45, 0x46, 0x47 },
+	.nlen	= 12,
+	.assoc	= { 0x50, 0x51, 0x52, 0x53, 0xc0, 0xc1, 0xc2, 0xc3,
+		    0xc4, 0xc5, 0xc6, 0xc7 },
+	.alen	= 12,
+	.input	= { 0x4c, 0x61, 0x64, 0x69, 0x65, 0x73, 0x20, 0x61,
+		    0x6e, 0x64, 0x20, 0x47, 0x65, 0x6e, 0x74, 0x6c,
+		    0x65, 0x6d, 0x65, 0x6e, 0x20, 0x6f, 0x66, 0x20,
+		    0x74, 0x68, 0x65, 0x20, 0x63, 0x6c, 0x61, 0x73,
+		    0x73, 0x20, 0x6f, 0x66, 0x20, 0x27, 0x39, 0x39,
+		    0x3a, 0x20, 0x49, 0x66, 0x20, 0x49, 0x20, 0x63,
+		    0x6f, 0x75, 0x6c, 0x64, 0x20, 0x6f, 0x66, 0x66,
+		    0x65, 0x72, 0x20, 0x79, 0x6f, 0x75, 0x20, 0x6f,
+		    0x6e, 0x6c, 0x79, 0x20, 0x6f, 0x6e, 0x65, 0x20,
+		    0x74, 0x69, 0x70, 0x20, 0x66, 0x6f, 0x72, 0x20,
+		    0x74, 0x68, 0x65, 0x20, 0x66, 0x75, 0x74, 0x75,
+		    0x72, 0x65, 0x2c, 0x20, 0x73, 0x75, 0x6e, 0x73,
+		    0x63, 0x72, 0x65, 0x65, 0x6e, 0x20, 0x77, 0x6f,
+		    0x75, 0x6c, 0x64, 0x20, 0x62, 0x65, 0x20, 0x69,
+		    0x74, 0x2e },
+	.ilen	= 114,
+	.result	= { 0xd3, 0x1a, 0x8d, 0x34, 0x64, 0x8e, 0x60, 0xdb,
+		    0x7b, 0x86, 0xaf, 0xbc, 0x53, 0xef, 0x7e, 0xc2,
+		    0xa4, 0xad, 0xed, 0x51, 0x29, 0x6e, 0x08, 0xfe,
+		    0xa9, 0xe2, 0xb5, 0xa7, 0x36, 0xee, 0x62, 0xd6,
+		    0x3d, 0xbe, 0xa4, 0x5e, 0x8c, 0xa9, 0x67, 0x12,
+		    0x82, 0xfa, 0xfb, 0x69, 0xda, 0x92, 0x72, 0x8b,
+		    0x1a, 0x71, 0xde, 0x0a, 0x9e, 0x06, 0x0b, 0x29,
+		    0x05, 0xd6, 0xa5, 0xb6, 0x7e, 0xcd, 0x3b, 0x36,
+		    0x92, 0xdd, 0xbd, 0x7f, 0x2d, 0x77, 0x8b, 0x8c,
+		    0x98, 0x03, 0xae, 0xe3, 0x28, 0x09, 0x1b, 0x58,
+		    0xfa, 0xb3, 0x24, 0xe4, 0xfa, 0xd6, 0x75, 0x94,
+		    0x55, 0x85, 0x80, 0x8b, 0x48, 0x31, 0xd7, 0xbc,
+		    0x3f, 0xf4, 0xde, 0xf0, 0x8e, 0x4b, 0x7a, 0x9d,
+		    0xe5, 0x76, 0xd2, 0x65, 0x86, 0xce, 0xc6, 0x4b,
+		    0x61, 0x16, 0x1a, 0xe1, 0x0b, 0x59, 0x4f, 0x09,
+		    0xe2, 0x6a, 0x7e, 0x90, 0x2e, 0xcb, 0xd0, 0x60,
+		    0x06, 0x91 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0xba, 0x31, 0x92, 0xc8, 0x03, 0xce, 0x96,
+		    0x5e, 0xa3, 0x71, 0xd5, 0xff, 0x07, 0x3c, 0xf0,
+		    0xf4, 0x3b, 0x6a, 0x2a, 0xb5, 0x76, 0xb2, 0x08,
+		    0x42, 0x6e, 0x11, 0x40, 0x9c, 0x09, 0xb9, 0xb0 },
+	.nonce	= { 0x4d, 0xa5, 0xbf, 0x8d, 0xfd, 0x58, 0x52, 0xc1,
+		    0xea, 0x12, 0x37, 0x9d },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= "",
+	.ilen	= 0,
+	.result	= { 0x76, 0xac, 0xb3, 0x42, 0xcf, 0x31, 0x66, 0xa5,
+		    0xb6, 0x3c, 0x0c, 0x0e, 0xa1, 0x38, 0x3c, 0x8d }
+}, { /* wycheproof - misc */
+	.key	= { 0x7a, 0x4c, 0xd7, 0x59, 0x17, 0x2e, 0x02, 0xeb,
+		    0x20, 0x4d, 0xb2, 0xc3, 0xf5, 0xc7, 0x46, 0x22,
+		    0x7d, 0xf5, 0x84, 0xfc, 0x13, 0x45, 0x19, 0x63,
+		    0x91, 0xdb, 0xb9, 0x57, 0x7a, 0x25, 0x07, 0x42 },
+	.nonce	= { 0xa9, 0x2e, 0xf0, 0xac, 0x99, 0x1d, 0xd5, 0x16,
+		    0xa3, 0xc6, 0xf6, 0x89 },
+	.nlen	= 12,
+	.assoc	= { 0xbd, 0x50, 0x67, 0x64, 0xf2, 0xd2, 0xc4, 0x10 },
+	.alen	= 8,
+	.input	= "",
+	.ilen	= 0,
+	.result	= { 0x90, 0x6f, 0xa6, 0x28, 0x4b, 0x52, 0xf8, 0x7b,
+		    0x73, 0x59, 0xcb, 0xaa, 0x75, 0x63, 0xc7, 0x09 }
+}, { /* wycheproof - misc */
+	.key	= { 0xcc, 0x56, 0xb6, 0x80, 0x55, 0x2e, 0xb7, 0x50,
+		    0x08, 0xf5, 0x48, 0x4b, 0x4c, 0xb8, 0x03, 0xfa,
+		    0x50, 0x63, 0xeb, 0xd6, 0xea, 0xb9, 0x1f, 0x6a,
+		    0xb6, 0xae, 0xf4, 0x91, 0x6a, 0x76, 0x62, 0x73 },
+	.nonce	= { 0x99, 0xe2, 0x3e, 0xc4, 0x89, 0x85, 0xbc, 0xcd,
+		    0xee, 0xab, 0x60, 0xf1 },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x2a },
+	.ilen	= 1,
+	.result	= { 0x3a, 0xca, 0xc2, 0x7d, 0xec, 0x09, 0x68, 0x80,
+		    0x1e, 0x9f, 0x6e, 0xde, 0xd6, 0x9d, 0x80, 0x75,
+		    0x22 }
+}, { /* wycheproof - misc */
+	.key	= { 0x46, 0xf0, 0x25, 0x49, 0x65, 0xf7, 0x69, 0xd5,
+		    0x2b, 0xdb, 0x4a, 0x70, 0xb4, 0x43, 0x19, 0x9f,
+		    0x8e, 0xf2, 0x07, 0x52, 0x0d, 0x12, 0x20, 0xc5,
+		    0x5e, 0x4b, 0x70, 0xf0, 0xfd, 0xa6, 0x20, 0xee },
+	.nonce	= { 0xab, 0x0d, 0xca, 0x71, 0x6e, 0xe0, 0x51, 0xd2,
+		    0x78, 0x2f, 0x44, 0x03 },
+	.nlen	= 12,
+	.assoc	= { 0x91, 0xca, 0x6c, 0x59, 0x2c, 0xbc, 0xca, 0x53 },
+	.alen	= 8,
+	.input	= { 0x51 },
+	.ilen	= 1,
+	.result	= { 0xc4, 0x16, 0x83, 0x10, 0xca, 0x45, 0xb1, 0xf7,
+		    0xc6, 0x6c, 0xad, 0x4e, 0x99, 0xe4, 0x3f, 0x72,
+		    0xb9 }
+}, { /* wycheproof - misc */
+	.key	= { 0x2f, 0x7f, 0x7e, 0x4f, 0x59, 0x2b, 0xb3, 0x89,
+		    0x19, 0x49, 0x89, 0x74, 0x35, 0x07, 0xbf, 0x3e,
+		    0xe9, 0xcb, 0xde, 0x17, 0x86, 0xb6, 0x69, 0x5f,
+		    0xe6, 0xc0, 0x25, 0xfd, 0x9b, 0xa4, 0xc1, 0x00 },
+	.nonce	= { 0x46, 0x1a, 0xf1, 0x22, 0xe9, 0xf2, 0xe0, 0x34,
+		    0x7e, 0x03, 0xf2, 0xdb },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x5c, 0x60 },
+	.ilen	= 2,
+	.result	= { 0x4d, 0x13, 0x91, 0xe8, 0xb6, 0x1e, 0xfb, 0x39,
+		    0xc1, 0x22, 0x19, 0x54, 0x53, 0x07, 0x7b, 0x22,
+		    0xe5, 0xe2 }
+}, { /* wycheproof - misc */
+	.key	= { 0xc8, 0x83, 0x3d, 0xce, 0x5e, 0xa9, 0xf2, 0x48,
+		    0xaa, 0x20, 0x30, 0xea, 0xcf, 0xe7, 0x2b, 0xff,
+		    0xe6, 0x9a, 0x62, 0x0c, 0xaf, 0x79, 0x33, 0x44,
+		    0xe5, 0x71, 0x8f, 0xe0, 0xd7, 0xab, 0x1a, 0x58 },
+	.nonce	= { 0x61, 0x54, 0x6b, 0xa5, 0xf1, 0x72, 0x05, 0x90,
+		    0xb6, 0x04, 0x0a, 0xc6 },
+	.nlen	= 12,
+	.assoc	= { 0x88, 0x36, 0x4f, 0xc8, 0x06, 0x05, 0x18, 0xbf },
+	.alen	= 8,
+	.input	= { 0xdd, 0xf2 },
+	.ilen	= 2,
+	.result	= { 0xb6, 0x0d, 0xea, 0xd0, 0xfd, 0x46, 0x97, 0xec,
+		    0x2e, 0x55, 0x58, 0x23, 0x77, 0x19, 0xd0, 0x24,
+		    0x37, 0xa2 }
+}, { /* wycheproof - misc */
+	.key	= { 0x55, 0x56, 0x81, 0x58, 0xd3, 0xa6, 0x48, 0x3f,
+		    0x1f, 0x70, 0x21, 0xea, 0xb6, 0x9b, 0x70, 0x3f,
+		    0x61, 0x42, 0x51, 0xca, 0xdc, 0x1a, 0xf5, 0xd3,
+		    0x4a, 0x37, 0x4f, 0xdb, 0xfc, 0x5a, 0xda, 0xc7 },
+	.nonce	= { 0x3c, 0x4e, 0x65, 0x4d, 0x66, 0x3f, 0xa4, 0x59,
+		    0x6d, 0xc5, 0x5b, 0xb7 },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xab, 0x85, 0xe9, 0xc1, 0x57, 0x17, 0x31 },
+	.ilen	= 7,
+	.result	= { 0x5d, 0xfe, 0x34, 0x40, 0xdb, 0xb3, 0xc3, 0xed,
+		    0x7a, 0x43, 0x4e, 0x26, 0x02, 0xd3, 0x94, 0x28,
+		    0x1e, 0x0a, 0xfa, 0x9f, 0xb7, 0xaa, 0x42 }
+}, { /* wycheproof - misc */
+	.key	= { 0xe3, 0xc0, 0x9e, 0x7f, 0xab, 0x1a, 0xef, 0xb5,
+		    0x16, 0xda, 0x6a, 0x33, 0x02, 0x2a, 0x1d, 0xd4,
+		    0xeb, 0x27, 0x2c, 0x80, 0xd5, 0x40, 0xc5, 0xda,
+		    0x52, 0xa7, 0x30, 0xf3, 0x4d, 0x84, 0x0d, 0x7f },
+	.nonce	= { 0x58, 0x38, 0x93, 0x75, 0xc6, 0x9e, 0xe3, 0x98,
+		    0xde, 0x94, 0x83, 0x96 },
+	.nlen	= 12,
+	.assoc	= { 0x84, 0xe4, 0x6b, 0xe8, 0xc0, 0x91, 0x90, 0x53 },
+	.alen	= 8,
+	.input	= { 0x4e, 0xe5, 0xcd, 0xa2, 0x0d, 0x42, 0x90 },
+	.ilen	= 7,
+	.result	= { 0x4b, 0xd4, 0x72, 0x12, 0x94, 0x1c, 0xe3, 0x18,
+		    0x5f, 0x14, 0x08, 0xee, 0x7f, 0xbf, 0x18, 0xf5,
+		    0xab, 0xad, 0x6e, 0x22, 0x53, 0xa1, 0xba }
+}, { /* wycheproof - misc */
+	.key	= { 0x51, 0xe4, 0xbf, 0x2b, 0xad, 0x92, 0xb7, 0xaf,
+		    0xf1, 0xa4, 0xbc, 0x05, 0x55, 0x0b, 0xa8, 0x1d,
+		    0xf4, 0xb9, 0x6f, 0xab, 0xf4, 0x1c, 0x12, 0xc7,
+		    0xb0, 0x0e, 0x60, 0xe4, 0x8d, 0xb7, 0xe1, 0x52 },
+	.nonce	= { 0x4f, 0x07, 0xaf, 0xed, 0xfd, 0xc3, 0xb6, 0xc2,
+		    0x36, 0x18, 0x23, 0xd3 },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xbe, 0x33, 0x08, 0xf7, 0x2a, 0x2c, 0x6a, 0xed },
+	.ilen	= 8,
+	.result	= { 0x8e, 0x94, 0x39, 0xa5, 0x6e, 0xee, 0xc8, 0x17,
+		    0xfb, 0xe8, 0xa6, 0xed, 0x8f, 0xab, 0xb1, 0x93,
+		    0x75, 0x39, 0xdd, 0x6c, 0x00, 0xe9, 0x00, 0x21 }
+}, { /* wycheproof - misc */
+	.key	= { 0x11, 0x31, 0xc1, 0x41, 0x85, 0x77, 0xa0, 0x54,
+		    0xde, 0x7a, 0x4a, 0xc5, 0x51, 0x95, 0x0f, 0x1a,
+		    0x05, 0x3f, 0x9a, 0xe4, 0x6e, 0x5b, 0x75, 0xfe,
+		    0x4a, 0xbd, 0x56, 0x08, 0xd7, 0xcd, 0xda, 0xdd },
+	.nonce	= { 0xb4, 0xea, 0x66, 0x6e, 0xe1, 0x19, 0x56, 0x33,
+		    0x66, 0x48, 0x4a, 0x78 },
+	.nlen	= 12,
+	.assoc	= { 0x66, 0xc0, 0xae, 0x70, 0x07, 0x6c, 0xb1, 0x4d },
+	.alen	= 8,
+	.input	= { 0xa4, 0xc9, 0xc2, 0x80, 0x1b, 0x71, 0xf7, 0xdf },
+	.ilen	= 8,
+	.result	= { 0xb9, 0xb9, 0x10, 0x43, 0x3a, 0xf0, 0x52, 0xb0,
+		    0x45, 0x30, 0xf5, 0x1a, 0xee, 0xe0, 0x24, 0xe0,
+		    0xa4, 0x45, 0xa6, 0x32, 0x8f, 0xa6, 0x7a, 0x18 }
+}, { /* wycheproof - misc */
+	.key	= { 0x99, 0xb6, 0x2b, 0xd5, 0xaf, 0xbe, 0x3f, 0xb0,
+		    0x15, 0xbd, 0xe9, 0x3f, 0x0a, 0xbf, 0x48, 0x39,
+		    0x57, 0xa1, 0xc3, 0xeb, 0x3c, 0xa5, 0x9c, 0xb5,
+		    0x0b, 0x39, 0xf7, 0xf8, 0xa9, 0xcc, 0x51, 0xbe },
+	.nonce	= { 0x9a, 0x59, 0xfc, 0xe2, 0x6d, 0xf0, 0x00, 0x5e,
+		    0x07, 0x53, 0x86, 0x56 },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x42, 0xba, 0xae, 0x59, 0x78, 0xfe, 0xaf, 0x5c,
+		    0x36, 0x8d, 0x14, 0xe0 },
+	.ilen	= 12,
+	.result	= { 0xff, 0x7d, 0xc2, 0x03, 0xb2, 0x6c, 0x46, 0x7a,
+		    0x6b, 0x50, 0xdb, 0x33, 0x57, 0x8c, 0x0f, 0x27,
+		    0x58, 0xc2, 0xe1, 0x4e, 0x36, 0xd4, 0xfc, 0x10,
+		    0x6d, 0xcb, 0x29, 0xb4 }
+}, { /* wycheproof - misc */
+	.key	= { 0x85, 0xf3, 0x5b, 0x62, 0x82, 0xcf, 0xf4, 0x40,
+		    0xbc, 0x10, 0x20, 0xc8, 0x13, 0x6f, 0xf2, 0x70,
+		    0x31, 0x11, 0x0f, 0xa6, 0x3e, 0xc1, 0x6f, 0x1e,
+		    0x82, 0x51, 0x18, 0xb0, 0x06, 0xb9, 0x12, 0x57 },
+	.nonce	= { 0x58, 0xdb, 0xd4, 0xad, 0x2c, 0x4a, 0xd3, 0x5d,
+		    0xd9, 0x06, 0xe9, 0xce },
+	.nlen	= 12,
+	.assoc	= { 0xa5, 0x06, 0xe1, 0xa5, 0xc6, 0x90, 0x93, 0xf9 },
+	.alen	= 8,
+	.input	= { 0xfd, 0xc8, 0x5b, 0x94, 0xa4, 0xb2, 0xa6, 0xb7,
+		    0x59, 0xb1, 0xa0, 0xda },
+	.ilen	= 12,
+	.result	= { 0x9f, 0x88, 0x16, 0xde, 0x09, 0x94, 0xe9, 0x38,
+		    0xd9, 0xe5, 0x3f, 0x95, 0xd0, 0x86, 0xfc, 0x6c,
+		    0x9d, 0x8f, 0xa9, 0x15, 0xfd, 0x84, 0x23, 0xa7,
+		    0xcf, 0x05, 0x07, 0x2f }
+}, { /* wycheproof - misc */
+	.key	= { 0x67, 0x11, 0x96, 0x27, 0xbd, 0x98, 0x8e, 0xda,
+		    0x90, 0x62, 0x19, 0xe0, 0x8c, 0x0d, 0x0d, 0x77,
+		    0x9a, 0x07, 0xd2, 0x08, 0xce, 0x8a, 0x4f, 0xe0,
+		    0x70, 0x9a, 0xf7, 0x55, 0xee, 0xec, 0x6d, 0xcb },
+	.nonce	= { 0x68, 0xab, 0x7f, 0xdb, 0xf6, 0x19, 0x01, 0xda,
+		    0xd4, 0x61, 0xd2, 0x3c },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x51, 0xf8, 0xc1, 0xf7, 0x31, 0xea, 0x14, 0xac,
+		    0xdb, 0x21, 0x0a, 0x6d, 0x97, 0x3e, 0x07 },
+	.ilen	= 15,
+	.result	= { 0x0b, 0x29, 0x63, 0x8e, 0x1f, 0xbd, 0xd6, 0xdf,
+		    0x53, 0x97, 0x0b, 0xe2, 0x21, 0x00, 0x42, 0x2a,
+		    0x91, 0x34, 0x08, 0x7d, 0x67, 0xa4, 0x6e, 0x79,
+		    0x17, 0x8d, 0x0a, 0x93, 0xf5, 0xe1, 0xd2 }
+}, { /* wycheproof - misc */
+	.key	= { 0xe6, 0xf1, 0x11, 0x8d, 0x41, 0xe4, 0xb4, 0x3f,
+		    0xb5, 0x82, 0x21, 0xb7, 0xed, 0x79, 0x67, 0x38,
+		    0x34, 0xe0, 0xd8, 0xac, 0x5c, 0x4f, 0xa6, 0x0b,
+		    0xbc, 0x8b, 0xc4, 0x89, 0x3a, 0x58, 0x89, 0x4d },
+	.nonce	= { 0xd9, 0x5b, 0x32, 0x43, 0xaf, 0xae, 0xf7, 0x14,
+		    0xc5, 0x03, 0x5b, 0x6a },
+	.nlen	= 12,
+	.assoc	= { 0x64, 0x53, 0xa5, 0x33, 0x84, 0x63, 0x22, 0x12 },
+	.alen	= 8,
+	.input	= { 0x97, 0x46, 0x9d, 0xa6, 0x67, 0xd6, 0x11, 0x0f,
+		    0x9c, 0xbd, 0xa1, 0xd1, 0xa2, 0x06, 0x73 },
+	.ilen	= 15,
+	.result	= { 0x32, 0xdb, 0x66, 0xc4, 0xa3, 0x81, 0x9d, 0x81,
+		    0x55, 0x74, 0x55, 0xe5, 0x98, 0x0f, 0xed, 0xfe,
+		    0xae, 0x30, 0xde, 0xc9, 0x4e, 0x6a, 0xd3, 0xa9,
+		    0xee, 0xa0, 0x6a, 0x0d, 0x70, 0x39, 0x17 }
+}, { /* wycheproof - misc */
+	.key	= { 0x59, 0xd4, 0xea, 0xfb, 0x4d, 0xe0, 0xcf, 0xc7,
+		    0xd3, 0xdb, 0x99, 0xa8, 0xf5, 0x4b, 0x15, 0xd7,
+		    0xb3, 0x9f, 0x0a, 0xcc, 0x8d, 0xa6, 0x97, 0x63,
+		    0xb0, 0x19, 0xc1, 0x69, 0x9f, 0x87, 0x67, 0x4a },
+	.nonce	= { 0x2f, 0xcb, 0x1b, 0x38, 0xa9, 0x9e, 0x71, 0xb8,
+		    0x47, 0x40, 0xad, 0x9b },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x54, 0x9b, 0x36, 0x5a, 0xf9, 0x13, 0xf3, 0xb0,
+		    0x81, 0x13, 0x1c, 0xcb, 0x6b, 0x82, 0x55, 0x88 },
+	.ilen	= 16,
+	.result	= { 0xe9, 0x11, 0x0e, 0x9f, 0x56, 0xab, 0x3c, 0xa4,
+		    0x83, 0x50, 0x0c, 0xea, 0xba, 0xb6, 0x7a, 0x13,
+		    0x83, 0x6c, 0xca, 0xbf, 0x15, 0xa6, 0xa2, 0x2a,
+		    0x51, 0xc1, 0x07, 0x1c, 0xfa, 0x68, 0xfa, 0x0c }
+}, { /* wycheproof - misc */
+	.key	= { 0xb9, 0x07, 0xa4, 0x50, 0x75, 0x51, 0x3f, 0xe8,
+		    0xa8, 0x01, 0x9e, 0xde, 0xe3, 0xf2, 0x59, 0x14,
+		    0x87, 0xb2, 0xa0, 0x30, 0xb0, 0x3c, 0x6e, 0x1d,
+		    0x77, 0x1c, 0x86, 0x25, 0x71, 0xd2, 0xea, 0x1e },
+	.nonce	= { 0x11, 0x8a, 0x69, 0x64, 0xc2, 0xd3, 0xe3, 0x80,
+		    0x07, 0x1f, 0x52, 0x66 },
+	.nlen	= 12,
+	.assoc	= { 0x03, 0x45, 0x85, 0x62, 0x1a, 0xf8, 0xd7, 0xff },
+	.alen	= 8,
+	.input	= { 0x55, 0xa4, 0x65, 0x64, 0x4f, 0x5b, 0x65, 0x09,
+		    0x28, 0xcb, 0xee, 0x7c, 0x06, 0x32, 0x14, 0xd6 },
+	.ilen	= 16,
+	.result	= { 0xe4, 0xb1, 0x13, 0xcb, 0x77, 0x59, 0x45, 0xf3,
+		    0xd3, 0xa8, 0xae, 0x9e, 0xc1, 0x41, 0xc0, 0x0c,
+		    0x7c, 0x43, 0xf1, 0x6c, 0xe0, 0x96, 0xd0, 0xdc,
+		    0x27, 0xc9, 0x58, 0x49, 0xdc, 0x38, 0x3b, 0x7d }
+}, { /* wycheproof - misc */
+	.key	= { 0x3b, 0x24, 0x58, 0xd8, 0x17, 0x6e, 0x16, 0x21,
+		    0xc0, 0xcc, 0x24, 0xc0, 0xc0, 0xe2, 0x4c, 0x1e,
+		    0x80, 0xd7, 0x2f, 0x7e, 0xe9, 0x14, 0x9a, 0x4b,
+		    0x16, 0x61, 0x76, 0x62, 0x96, 0x16, 0xd0, 0x11 },
+	.nonce	= { 0x45, 0xaa, 0xa3, 0xe5, 0xd1, 0x6d, 0x2d, 0x42,
+		    0xdc, 0x03, 0x44, 0x5d },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x3f, 0xf1, 0x51, 0x4b, 0x1c, 0x50, 0x39, 0x15,
+		    0x91, 0x8f, 0x0c, 0x0c, 0x31, 0x09, 0x4a, 0x6e,
+		    0x1f },
+	.ilen	= 17,
+	.result	= { 0x02, 0xcc, 0x3a, 0xcb, 0x5e, 0xe1, 0xfc, 0xdd,
+		    0x12, 0xa0, 0x3b, 0xb8, 0x57, 0x97, 0x64, 0x74,
+		    0xd3, 0xd8, 0x3b, 0x74, 0x63, 0xa2, 0xc3, 0x80,
+		    0x0f, 0xe9, 0x58, 0xc2, 0x8e, 0xaa, 0x29, 0x08,
+		    0x13 }
+}, { /* wycheproof - misc */
+	.key	= { 0xf6, 0x0c, 0x6a, 0x1b, 0x62, 0x57, 0x25, 0xf7,
+		    0x6c, 0x70, 0x37, 0xb4, 0x8f, 0xe3, 0x57, 0x7f,
+		    0xa7, 0xf7, 0xb8, 0x7b, 0x1b, 0xd5, 0xa9, 0x82,
+		    0x17, 0x6d, 0x18, 0x23, 0x06, 0xff, 0xb8, 0x70 },
+	.nonce	= { 0xf0, 0x38, 0x4f, 0xb8, 0x76, 0x12, 0x14, 0x10,
+		    0x63, 0x3d, 0x99, 0x3d },
+	.nlen	= 12,
+	.assoc	= { 0x9a, 0xaf, 0x29, 0x9e, 0xee, 0xa7, 0x8f, 0x79 },
+	.alen	= 8,
+	.input	= { 0x63, 0x85, 0x8c, 0xa3, 0xe2, 0xce, 0x69, 0x88,
+		    0x7b, 0x57, 0x8a, 0x3c, 0x16, 0x7b, 0x42, 0x1c,
+		    0x9c },
+	.ilen	= 17,
+	.result	= { 0x35, 0x76, 0x64, 0x88, 0xd2, 0xbc, 0x7c, 0x2b,
+		    0x8d, 0x17, 0xcb, 0xbb, 0x9a, 0xbf, 0xad, 0x9e,
+		    0x6d, 0x1f, 0x39, 0x1e, 0x65, 0x7b, 0x27, 0x38,
+		    0xdd, 0xa0, 0x84, 0x48, 0xcb, 0xa2, 0x81, 0x1c,
+		    0xeb }
+}, { /* wycheproof - misc */
+	.key	= { 0x02, 0x12, 0xa8, 0xde, 0x50, 0x07, 0xed, 0x87,
+		    0xb3, 0x3f, 0x1a, 0x70, 0x90, 0xb6, 0x11, 0x4f,
+		    0x9e, 0x08, 0xce, 0xfd, 0x96, 0x07, 0xf2, 0xc2,
+		    0x76, 0xbd, 0xcf, 0xdb, 0xc5, 0xce, 0x9c, 0xd7 },
+	.nonce	= { 0xe6, 0xb1, 0xad, 0xf2, 0xfd, 0x58, 0xa8, 0x76,
+		    0x2c, 0x65, 0xf3, 0x1b },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x10, 0xf1, 0xec, 0xf9, 0xc6, 0x05, 0x84, 0x66,
+		    0x5d, 0x9a, 0xe5, 0xef, 0xe2, 0x79, 0xe7, 0xf7,
+		    0x37, 0x7e, 0xea, 0x69, 0x16, 0xd2, 0xb1, 0x11 },
+	.ilen	= 24,
+	.result	= { 0x42, 0xf2, 0x6c, 0x56, 0xcb, 0x4b, 0xe2, 0x1d,
+		    0x9d, 0x8d, 0x0c, 0x80, 0xfc, 0x99, 0xdd, 0xe0,
+		    0x0d, 0x75, 0xf3, 0x80, 0x74, 0xbf, 0xe7, 0x64,
+		    0x54, 0xaa, 0x7e, 0x13, 0xd4, 0x8f, 0xff, 0x7d,
+		    0x75, 0x57, 0x03, 0x94, 0x57, 0x04, 0x0a, 0x3a }
+}, { /* wycheproof - misc */
+	.key	= { 0xc5, 0xbc, 0x09, 0x56, 0x56, 0x46, 0xe7, 0xed,
+		    0xda, 0x95, 0x4f, 0x1f, 0x73, 0x92, 0x23, 0xda,
+		    0xda, 0x20, 0xb9, 0x5c, 0x44, 0xab, 0x03, 0x3d,
+		    0x0f, 0xae, 0x4b, 0x02, 0x83, 0xd1, 0x8b, 0xe3 },
+	.nonce	= { 0x6b, 0x28, 0x2e, 0xbe, 0xcc, 0x54, 0x1b, 0xcd,
+		    0x78, 0x34, 0xed, 0x55 },
+	.nlen	= 12,
+	.assoc	= { 0x3e, 0x8b, 0xc5, 0xad, 0xe1, 0x82, 0xff, 0x08 },
+	.alen	= 8,
+	.input	= { 0x92, 0x22, 0xf9, 0x01, 0x8e, 0x54, 0xfd, 0x6d,
+		    0xe1, 0x20, 0x08, 0x06, 0xa9, 0xee, 0x8e, 0x4c,
+		    0xc9, 0x04, 0xd2, 0x9f, 0x25, 0xcb, 0xa1, 0x93 },
+	.ilen	= 24,
+	.result	= { 0x12, 0x30, 0x32, 0x43, 0x7b, 0x4b, 0xfd, 0x69,
+		    0x20, 0xe8, 0xf7, 0xe7, 0xe0, 0x08, 0x7a, 0xe4,
+		    0x88, 0x9e, 0xbe, 0x7a, 0x0a, 0xd0, 0xe9, 0x00,
+		    0x3c, 0xf6, 0x8f, 0x17, 0x95, 0x50, 0xda, 0x63,
+		    0xd3, 0xb9, 0x6c, 0x2d, 0x55, 0x41, 0x18, 0x65 }
+}, { /* wycheproof - misc */
+	.key	= { 0x2e, 0xb5, 0x1c, 0x46, 0x9a, 0xa8, 0xeb, 0x9e,
+		    0x6c, 0x54, 0xa8, 0x34, 0x9b, 0xae, 0x50, 0xa2,
+		    0x0f, 0x0e, 0x38, 0x27, 0x11, 0xbb, 0xa1, 0x15,
+		    0x2c, 0x42, 0x4f, 0x03, 0xb6, 0x67, 0x1d, 0x71 },
+	.nonce	= { 0x04, 0xa9, 0xbe, 0x03, 0x50, 0x8a, 0x5f, 0x31,
+		    0x37, 0x1a, 0x6f, 0xd2 },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xb0, 0x53, 0x99, 0x92, 0x86, 0xa2, 0x82, 0x4f,
+		    0x42, 0xcc, 0x8c, 0x20, 0x3a, 0xb2, 0x4e, 0x2c,
+		    0x97, 0xa6, 0x85, 0xad, 0xcc, 0x2a, 0xd3, 0x26,
+		    0x62, 0x55, 0x8e, 0x55, 0xa5, 0xc7, 0x29 },
+	.ilen	= 31,
+	.result	= { 0x45, 0xc7, 0xd6, 0xb5, 0x3a, 0xca, 0xd4, 0xab,
+		    0xb6, 0x88, 0x76, 0xa6, 0xe9, 0x6a, 0x48, 0xfb,
+		    0x59, 0x52, 0x4d, 0x2c, 0x92, 0xc9, 0xd8, 0xa1,
+		    0x89, 0xc9, 0xfd, 0x2d, 0xb9, 0x17, 0x46, 0x56,
+		    0x6d, 0x3c, 0xa1, 0x0e, 0x31, 0x1b, 0x69, 0x5f,
+		    0x3e, 0xae, 0x15, 0x51, 0x65, 0x24, 0x93 }
+}, { /* wycheproof - misc */
+	.key	= { 0x7f, 0x5b, 0x74, 0xc0, 0x7e, 0xd1, 0xb4, 0x0f,
+		    0xd1, 0x43, 0x58, 0xfe, 0x2f, 0xf2, 0xa7, 0x40,
+		    0xc1, 0x16, 0xc7, 0x70, 0x65, 0x10, 0xe6, 0xa4,
+		    0x37, 0xf1, 0x9e, 0xa4, 0x99, 0x11, 0xce, 0xc4 },
+	.nonce	= { 0x47, 0x0a, 0x33, 0x9e, 0xcb, 0x32, 0x19, 0xb8,
+		    0xb8, 0x1a, 0x1f, 0x8b },
+	.nlen	= 12,
+	.assoc	= { 0x37, 0x46, 0x18, 0xa0, 0x6e, 0xa9, 0x8a, 0x48 },
+	.alen	= 8,
+	.input	= { 0xf4, 0x52, 0x06, 0xab, 0xc2, 0x55, 0x52, 0xb2,
+		    0xab, 0xc9, 0xab, 0x7f, 0xa2, 0x43, 0x03, 0x5f,
+		    0xed, 0xaa, 0xdd, 0xc3, 0xb2, 0x29, 0x39, 0x56,
+		    0xf1, 0xea, 0x6e, 0x71, 0x56, 0xe7, 0xeb },
+	.ilen	= 31,
+	.result	= { 0x46, 0xa8, 0x0c, 0x41, 0x87, 0x02, 0x47, 0x20,
+		    0x08, 0x46, 0x27, 0x58, 0x00, 0x80, 0xdd, 0xe5,
+		    0xa3, 0xf4, 0xa1, 0x10, 0x93, 0xa7, 0x07, 0x6e,
+		    0xd6, 0xf3, 0xd3, 0x26, 0xbc, 0x7b, 0x70, 0x53,
+		    0x4d, 0x4a, 0xa2, 0x83, 0x5a, 0x52, 0xe7, 0x2d,
+		    0x14, 0xdf, 0x0e, 0x4f, 0x47, 0xf2, 0x5f }
+}, { /* wycheproof - misc */
+	.key	= { 0xe1, 0x73, 0x1d, 0x58, 0x54, 0xe1, 0xb7, 0x0c,
+		    0xb3, 0xff, 0xe8, 0xb7, 0x86, 0xa2, 0xb3, 0xeb,
+		    0xf0, 0x99, 0x43, 0x70, 0x95, 0x47, 0x57, 0xb9,
+		    0xdc, 0x8c, 0x7b, 0xc5, 0x35, 0x46, 0x34, 0xa3 },
+	.nonce	= { 0x72, 0xcf, 0xd9, 0x0e, 0xf3, 0x02, 0x6c, 0xa2,
+		    0x2b, 0x7e, 0x6e, 0x6a },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xb9, 0xc5, 0x54, 0xcb, 0xc3, 0x6a, 0xc1, 0x8a,
+		    0xe8, 0x97, 0xdf, 0x7b, 0xee, 0xca, 0xc1, 0xdb,
+		    0xeb, 0x4e, 0xaf, 0xa1, 0x56, 0xbb, 0x60, 0xce,
+		    0x2e, 0x5d, 0x48, 0xf0, 0x57, 0x15, 0xe6, 0x78 },
+	.ilen	= 32,
+	.result	= { 0xea, 0x29, 0xaf, 0xa4, 0x9d, 0x36, 0xe8, 0x76,
+		    0x0f, 0x5f, 0xe1, 0x97, 0x23, 0xb9, 0x81, 0x1e,
+		    0xd5, 0xd5, 0x19, 0x93, 0x4a, 0x44, 0x0f, 0x50,
+		    0x81, 0xac, 0x43, 0x0b, 0x95, 0x3b, 0x0e, 0x21,
+		    0x22, 0x25, 0x41, 0xaf, 0x46, 0xb8, 0x65, 0x33,
+		    0xc6, 0xb6, 0x8d, 0x2f, 0xf1, 0x08, 0xa7, 0xea }
+}, { /* wycheproof - misc */
+	.key	= { 0x27, 0xd8, 0x60, 0x63, 0x1b, 0x04, 0x85, 0xa4,
+		    0x10, 0x70, 0x2f, 0xea, 0x61, 0xbc, 0x87, 0x3f,
+		    0x34, 0x42, 0x26, 0x0c, 0xad, 0xed, 0x4a, 0xbd,
+		    0xe2, 0x5b, 0x78, 0x6a, 0x2d, 0x97, 0xf1, 0x45 },
+	.nonce	= { 0x26, 0x28, 0x80, 0xd4, 0x75, 0xf3, 0xda, 0xc5,
+		    0x34, 0x0d, 0xd1, 0xb8 },
+	.nlen	= 12,
+	.assoc	= { 0x23, 0x33, 0xe5, 0xce, 0x0f, 0x93, 0xb0, 0x59 },
+	.alen	= 8,
+	.input	= { 0x6b, 0x26, 0x04, 0x99, 0x6c, 0xd3, 0x0c, 0x14,
+		    0xa1, 0x3a, 0x52, 0x57, 0xed, 0x6c, 0xff, 0xd3,
+		    0xbc, 0x5e, 0x29, 0xd6, 0xb9, 0x7e, 0xb1, 0x79,
+		    0x9e, 0xb3, 0x35, 0xe2, 0x81, 0xea, 0x45, 0x1e },
+	.ilen	= 32,
+	.result	= { 0x6d, 0xad, 0x63, 0x78, 0x97, 0x54, 0x4d, 0x8b,
+		    0xf6, 0xbe, 0x95, 0x07, 0xed, 0x4d, 0x1b, 0xb2,
+		    0xe9, 0x54, 0xbc, 0x42, 0x7e, 0x5d, 0xe7, 0x29,
+		    0xda, 0xf5, 0x07, 0x62, 0x84, 0x6f, 0xf2, 0xf4,
+		    0x7b, 0x99, 0x7d, 0x93, 0xc9, 0x82, 0x18, 0x9d,
+		    0x70, 0x95, 0xdc, 0x79, 0x4c, 0x74, 0x62, 0x32 }
+}, { /* wycheproof - misc */
+	.key	= { 0xcf, 0x0d, 0x40, 0xa4, 0x64, 0x4e, 0x5f, 0x51,
+		    0x81, 0x51, 0x65, 0xd5, 0x30, 0x1b, 0x22, 0x63,
+		    0x1f, 0x45, 0x44, 0xc4, 0x9a, 0x18, 0x78, 0xe3,
+		    0xa0, 0xa5, 0xe8, 0xe1, 0xaa, 0xe0, 0xf2, 0x64 },
+	.nonce	= { 0xe7, 0x4a, 0x51, 0x5e, 0x7e, 0x21, 0x02, 0xb9,
+		    0x0b, 0xef, 0x55, 0xd2 },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x97, 0x3d, 0x0c, 0x75, 0x38, 0x26, 0xba, 0xe4,
+		    0x66, 0xcf, 0x9a, 0xbb, 0x34, 0x93, 0x15, 0x2e,
+		    0x9d, 0xe7, 0x81, 0x9e, 0x2b, 0xd0, 0xc7, 0x11,
+		    0x71, 0x34, 0x6b, 0x4d, 0x2c, 0xeb, 0xf8, 0x04,
+		    0x1a, 0xa3, 0xce, 0xdc, 0x0d, 0xfd, 0x7b, 0x46,
+		    0x7e, 0x26, 0x22, 0x8b, 0xc8, 0x6c, 0x9a },
+	.ilen	= 47,
+	.result	= { 0xfb, 0xa7, 0x8a, 0xe4, 0xf9, 0xd8, 0x08, 0xa6,
+		    0x2e, 0x3d, 0xa4, 0x0b, 0xe2, 0xcb, 0x77, 0x00,
+		    0xc3, 0x61, 0x3d, 0x9e, 0xb2, 0xc5, 0x29, 0xc6,
+		    0x52, 0xe7, 0x6a, 0x43, 0x2c, 0x65, 0x8d, 0x27,
+		    0x09, 0x5f, 0x0e, 0xb8, 0xf9, 0x40, 0xc3, 0x24,
+		    0x98, 0x1e, 0xa9, 0x35, 0xe5, 0x07, 0xf9, 0x8f,
+		    0x04, 0x69, 0x56, 0xdb, 0x3a, 0x51, 0x29, 0x08,
+		    0xbd, 0x7a, 0xfc, 0x8f, 0x2a, 0xb0, 0xa9 }
+}, { /* wycheproof - misc */
+	.key	= { 0x6c, 0xbf, 0xd7, 0x1c, 0x64, 0x5d, 0x18, 0x4c,
+		    0xf5, 0xd2, 0x3c, 0x40, 0x2b, 0xdb, 0x0d, 0x25,
+		    0xec, 0x54, 0x89, 0x8c, 0x8a, 0x02, 0x73, 0xd4,
+		    0x2e, 0xb5, 0xbe, 0x10, 0x9f, 0xdc, 0xb2, 0xac },
+	.nonce	= { 0xd4, 0xd8, 0x07, 0x34, 0x16, 0x83, 0x82, 0x5b,
+		    0x31, 0xcd, 0x4d, 0x95 },
+	.nlen	= 12,
+	.assoc	= { 0xb3, 0xe4, 0x06, 0x46, 0x83, 0xb0, 0x2d, 0x84 },
+	.alen	= 8,
+	.input	= { 0xa9, 0x89, 0x95, 0x50, 0x4d, 0xf1, 0x6f, 0x74,
+		    0x8b, 0xfb, 0x77, 0x85, 0xff, 0x91, 0xee, 0xb3,
+		    0xb6, 0x60, 0xea, 0x9e, 0xd3, 0x45, 0x0c, 0x3d,
+		    0x5e, 0x7b, 0x0e, 0x79, 0xef, 0x65, 0x36, 0x59,
+		    0xa9, 0x97, 0x8d, 0x75, 0x54, 0x2e, 0xf9, 0x1c,
+		    0x45, 0x67, 0x62, 0x21, 0x56, 0x40, 0xb9 },
+	.ilen	= 47,
+	.result	= { 0xa1, 0xff, 0xed, 0x80, 0x76, 0x18, 0x29, 0xec,
+		    0xce, 0x24, 0x2e, 0x0e, 0x88, 0xb1, 0x38, 0x04,
+		    0x90, 0x16, 0xbc, 0xa0, 0x18, 0xda, 0x2b, 0x6e,
+		    0x19, 0x98, 0x6b, 0x3e, 0x31, 0x8c, 0xae, 0x8d,
+		    0x80, 0x61, 0x98, 0xfb, 0x4c, 0x52, 0x7c, 0xc3,
+		    0x93, 0x50, 0xeb, 0xdd, 0xea, 0xc5, 0x73, 0xc4,
+		    0xcb, 0xf0, 0xbe, 0xfd, 0xa0, 0xb7, 0x02, 0x42,
+		    0xc6, 0x40, 0xd7, 0xcd, 0x02, 0xd7, 0xa3 }
+}, { /* wycheproof - misc */
+	.key	= { 0x5b, 0x1d, 0x10, 0x35, 0xc0, 0xb1, 0x7e, 0xe0,
+		    0xb0, 0x44, 0x47, 0x67, 0xf8, 0x0a, 0x25, 0xb8,
+		    0xc1, 0xb7, 0x41, 0xf4, 0xb5, 0x0a, 0x4d, 0x30,
+		    0x52, 0x22, 0x6b, 0xaa, 0x1c, 0x6f, 0xb7, 0x01 },
+	.nonce	= { 0xd6, 0x10, 0x40, 0xa3, 0x13, 0xed, 0x49, 0x28,
+		    0x23, 0xcc, 0x06, 0x5b },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xd0, 0x96, 0x80, 0x31, 0x81, 0xbe, 0xef, 0x9e,
+		    0x00, 0x8f, 0xf8, 0x5d, 0x5d, 0xdc, 0x38, 0xdd,
+		    0xac, 0xf0, 0xf0, 0x9e, 0xe5, 0xf7, 0xe0, 0x7f,
+		    0x1e, 0x40, 0x79, 0xcb, 0x64, 0xd0, 0xdc, 0x8f,
+		    0x5e, 0x67, 0x11, 0xcd, 0x49, 0x21, 0xa7, 0x88,
+		    0x7d, 0xe7, 0x6e, 0x26, 0x78, 0xfd, 0xc6, 0x76,
+		    0x18, 0xf1, 0x18, 0x55, 0x86, 0xbf, 0xea, 0x9d,
+		    0x4c, 0x68, 0x5d, 0x50, 0xe4, 0xbb, 0x9a, 0x82 },
+	.ilen	= 64,
+	.result	= { 0x9a, 0x4e, 0xf2, 0x2b, 0x18, 0x16, 0x77, 0xb5,
+		    0x75, 0x5c, 0x08, 0xf7, 0x47, 0xc0, 0xf8, 0xd8,
+		    0xe8, 0xd4, 0xc1, 0x8a, 0x9c, 0xc2, 0x40, 0x5c,
+		    0x12, 0xbb, 0x51, 0xbb, 0x18, 0x72, 0xc8, 0xe8,
+		    0xb8, 0x77, 0x67, 0x8b, 0xec, 0x44, 0x2c, 0xfc,
+		    0xbb, 0x0f, 0xf4, 0x64, 0xa6, 0x4b, 0x74, 0x33,
+		    0x2c, 0xf0, 0x72, 0x89, 0x8c, 0x7e, 0x0e, 0xdd,
+		    0xf6, 0x23, 0x2e, 0xa6, 0xe2, 0x7e, 0xfe, 0x50,
+		    0x9f, 0xf3, 0x42, 0x7a, 0x0f, 0x32, 0xfa, 0x56,
+		    0x6d, 0x9c, 0xa0, 0xa7, 0x8a, 0xef, 0xc0, 0x13 }
+}, { /* wycheproof - misc */
+	.key	= { 0x97, 0xd6, 0x35, 0xc4, 0xf4, 0x75, 0x74, 0xd9,
+		    0x99, 0x8a, 0x90, 0x87, 0x5d, 0xa1, 0xd3, 0xa2,
+		    0x84, 0xb7, 0x55, 0xb2, 0xd3, 0x92, 0x97, 0xa5,
+		    0x72, 0x52, 0x35, 0x19, 0x0e, 0x10, 0xa9, 0x7e },
+	.nonce	= { 0xd3, 0x1c, 0x21, 0xab, 0xa1, 0x75, 0xb7, 0x0d,
+		    0xe4, 0xeb, 0xb1, 0x9c },
+	.nlen	= 12,
+	.assoc	= { 0x71, 0x93, 0xf6, 0x23, 0x66, 0x33, 0x21, 0xa2 },
+	.alen	= 8,
+	.input	= { 0x94, 0xee, 0x16, 0x6d, 0x6d, 0x6e, 0xcf, 0x88,
+		    0x32, 0x43, 0x71, 0x36, 0xb4, 0xae, 0x80, 0x5d,
+		    0x42, 0x88, 0x64, 0x35, 0x95, 0x86, 0xd9, 0x19,
+		    0x3a, 0x25, 0x01, 0x62, 0x93, 0xed, 0xba, 0x44,
+		    0x3c, 0x58, 0xe0, 0x7e, 0x7b, 0x71, 0x95, 0xec,
+		    0x5b, 0xd8, 0x45, 0x82, 0xa9, 0xd5, 0x6c, 0x8d,
+		    0x4a, 0x10, 0x8c, 0x7d, 0x7c, 0xe3, 0x4e, 0x6c,
+		    0x6f, 0x8e, 0xa1, 0xbe, 0xc0, 0x56, 0x73, 0x17 },
+	.ilen	= 64,
+	.result	= { 0x5f, 0xbb, 0xde, 0xcc, 0x34, 0xbe, 0x20, 0x16,
+		    0x14, 0xf6, 0x36, 0x03, 0x1e, 0xeb, 0x42, 0xf1,
+		    0xca, 0xce, 0x3c, 0x79, 0xa1, 0x2c, 0xff, 0xd8,
+		    0x71, 0xee, 0x8e, 0x73, 0x82, 0x0c, 0x82, 0x97,
+		    0x49, 0xf1, 0xab, 0xb4, 0x29, 0x43, 0x67, 0x84,
+		    0x9f, 0xb6, 0xc2, 0xaa, 0x56, 0xbd, 0xa8, 0xa3,
+		    0x07, 0x8f, 0x72, 0x3d, 0x7c, 0x1c, 0x85, 0x20,
+		    0x24, 0xb0, 0x17, 0xb5, 0x89, 0x73, 0xfb, 0x1e,
+		    0x09, 0x26, 0x3d, 0xa7, 0xb4, 0xcb, 0x92, 0x14,
+		    0x52, 0xf9, 0x7d, 0xca, 0x40, 0xf5, 0x80, 0xec }
+}, { /* wycheproof - misc */
+	.key	= { 0xfe, 0x6e, 0x55, 0xbd, 0xae, 0xd1, 0xf7, 0x28,
+		    0x4c, 0xa5, 0xfc, 0x0f, 0x8c, 0x5f, 0x2b, 0x8d,
+		    0xf5, 0x6d, 0xc0, 0xf4, 0x9e, 0x8c, 0xa6, 0x6a,
+		    0x41, 0x99, 0x5e, 0x78, 0x33, 0x51, 0xf9, 0x01 },
+	.nonce	= { 0x17, 0xc8, 0x6a, 0x8a, 0xbb, 0xb7, 0xe0, 0x03,
+		    0xac, 0xde, 0x27, 0x99 },
+	.nlen	= 12,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xb4, 0x29, 0xeb, 0x80, 0xfb, 0x8f, 0xe8, 0xba,
+		    0xed, 0xa0, 0xc8, 0x5b, 0x9c, 0x33, 0x34, 0x58,
+		    0xe7, 0xc2, 0x99, 0x2e, 0x55, 0x84, 0x75, 0x06,
+		    0x9d, 0x12, 0xd4, 0x5c, 0x22, 0x21, 0x75, 0x64,
+		    0x12, 0x15, 0x88, 0x03, 0x22, 0x97, 0xef, 0xf5,
+		    0x67, 0x83, 0x74, 0x2a, 0x5f, 0xc2, 0x2d, 0x74,
+		    0x10, 0xff, 0xb2, 0x9d, 0x66, 0x09, 0x86, 0x61,
+		    0xd7, 0x6f, 0x12, 0x6c, 0x3c, 0x27, 0x68, 0x9e,
+		    0x43, 0xb3, 0x72, 0x67, 0xca, 0xc5, 0xa3, 0xa6,
+		    0xd3, 0xab, 0x49, 0xe3, 0x91, 0xda, 0x29, 0xcd,
+		    0x30, 0x54, 0xa5, 0x69, 0x2e, 0x28, 0x07, 0xe4,
+		    0xc3, 0xea, 0x46, 0xc8, 0x76, 0x1d, 0x50, 0xf5,
+		    0x92 },
+	.ilen	= 97,
+	.result	= { 0xd0, 0x10, 0x2f, 0x6c, 0x25, 0x8b, 0xf4, 0x97,
+		    0x42, 0xce, 0xc3, 0x4c, 0xf2, 0xd0, 0xfe, 0xdf,
+		    0x23, 0xd1, 0x05, 0xfb, 0x4c, 0x84, 0xcf, 0x98,
+		    0x51, 0x5e, 0x1b, 0xc9, 0xa6, 0x4f, 0x8a, 0xd5,
+		    0xbe, 0x8f, 0x07, 0x21, 0xbd, 0xe5, 0x06, 0x45,
+		    0xd0, 0x00, 0x83, 0xc3, 0xa2, 0x63, 0xa3, 0x10,
+		    0x53, 0xb7, 0x60, 0x24, 0x5f, 0x52, 0xae, 0x28,
+		    0x66, 0xa5, 0xec, 0x83, 0xb1, 0x9f, 0x61, 0xbe,
+		    0x1d, 0x30, 0xd5, 0xc5, 0xd9, 0xfe, 0xcc, 0x4c,
+		    0xbb, 0xe0, 0x8f, 0xd3, 0x85, 0x81, 0x3a, 0x2a,
+		    0xa3, 0x9a, 0x00, 0xff, 0x9c, 0x10, 0xf7, 0xf2,
+		    0x37, 0x02, 0xad, 0xd1, 0xe4, 0xb2, 0xff, 0xa3,
+		    0x1c, 0x41, 0x86, 0x5f, 0xc7, 0x1d, 0xe1, 0x2b,
+		    0x19, 0x61, 0x21, 0x27, 0xce, 0x49, 0x99, 0x3b,
+		    0xb0 }
+}, { /* wycheproof - misc */
+	.key	= { 0xaa, 0xbc, 0x06, 0x34, 0x74, 0xe6, 0x5c, 0x4c,
+		    0x3e, 0x9b, 0xdc, 0x48, 0x0d, 0xea, 0x97, 0xb4,
+		    0x51, 0x10, 0xc8, 0x61, 0x88, 0x46, 0xff, 0x6b,
+		    0x15, 0xbd, 0xd2, 0xa4, 0xa5, 0x68, 0x2c, 0x4e },
+	.nonce	= { 0x46, 0x36, 0x2f, 0x45, 0xd6, 0x37, 0x9e, 0x63,
+		    0xe5, 0x22, 0x94, 0x60 },
+	.nlen	= 12,
+	.assoc	= { 0xa1, 0x1c, 0x40, 0xb6, 0x03, 0x76, 0x73, 0x30 },
+	.alen	= 8,
+	.input	= { 0xce, 0xb5, 0x34, 0xce, 0x50, 0xdc, 0x23, 0xff,
+		    0x63, 0x8a, 0xce, 0x3e, 0xf6, 0x3a, 0xb2, 0xcc,
+		    0x29, 0x73, 0xee, 0xad, 0xa8, 0x07, 0x85, 0xfc,
+		    0x16, 0x5d, 0x06, 0xc2, 0xf5, 0x10, 0x0f, 0xf5,
+		    0xe8, 0xab, 0x28, 0x82, 0xc4, 0x75, 0xaf, 0xcd,
+		    0x05, 0xcc, 0xd4, 0x9f, 0x2e, 0x7d, 0x8f, 0x55,
+		    0xef, 0x3a, 0x72, 0xe3, 0xdc, 0x51, 0xd6, 0x85,
+		    0x2b, 0x8e, 0x6b, 0x9e, 0x7a, 0xec, 0xe5, 0x7b,
+		    0xe6, 0x55, 0x6b, 0x0b, 0x6d, 0x94, 0x13, 0xe3,
+		    0x3f, 0xc5, 0xfc, 0x24, 0xa9, 0xa2, 0x05, 0xad,
+		    0x59, 0x57, 0x4b, 0xb3, 0x9d, 0x94, 0x4a, 0x92,
+		    0xdc, 0x47, 0x97, 0x0d, 0x84, 0xa6, 0xad, 0x31,
+		    0x76 },
+	.ilen	= 97,
+	.result	= { 0x75, 0x45, 0x39, 0x1b, 0x51, 0xde, 0x01, 0xd5,
+		    0xc5, 0x3d, 0xfa, 0xca, 0x77, 0x79, 0x09, 0x06,
+		    0x3e, 0x58, 0xed, 0xee, 0x4b, 0xb1, 0x22, 0x7e,
+		    0x71, 0x10, 0xac, 0x4d, 0x26, 0x20, 0xc2, 0xae,
+		    0xc2, 0xf8, 0x48, 0xf5, 0x6d, 0xee, 0xb0, 0x37,
+		    0xa8, 0xdc, 0xed, 0x75, 0xaf, 0xa8, 0xa6, 0xc8,
+		    0x90, 0xe2, 0xde, 0xe4, 0x2f, 0x95, 0x0b, 0xb3,
+		    0x3d, 0x9e, 0x24, 0x24, 0xd0, 0x8a, 0x50, 0x5d,
+		    0x89, 0x95, 0x63, 0x97, 0x3e, 0xd3, 0x88, 0x70,
+		    0xf3, 0xde, 0x6e, 0xe2, 0xad, 0xc7, 0xfe, 0x07,
+		    0x2c, 0x36, 0x6c, 0x14, 0xe2, 0xcf, 0x7c, 0xa6,
+		    0x2f, 0xb3, 0xd3, 0x6b, 0xee, 0x11, 0x68, 0x54,
+		    0x61, 0xb7, 0x0d, 0x44, 0xef, 0x8c, 0x66, 0xc5,
+		    0xc7, 0xbb, 0xf1, 0x0d, 0xca, 0xdd, 0x7f, 0xac,
+		    0xf6 }
+}, { /* wycheproof - misc */
+	.key	= { 0x7d, 0x00, 0xb4, 0x80, 0x95, 0xad, 0xfa, 0x32,
+		    0x72, 0x05, 0x06, 0x07, 0xb2, 0x64, 0x18, 0x50,
+		    0x02, 0xba, 0x99, 0x95, 0x7c, 0x49, 0x8b, 0xe0,
+		    0x22, 0x77, 0x0f, 0x2c, 0xe2, 0xf3, 0x14, 0x3c },
+	.nonce	= { 0x87, 0x34, 0x5f, 0x10, 0x55, 0xfd, 0x9e, 0x21,
+		    0x02, 0xd5, 0x06, 0x56 },
+	.nlen	= 12,
+	.assoc	= { 0x02 },
+	.alen	= 1,
+	.input	= { 0xe5, 0xcc, 0xaa, 0x44, 0x1b, 0xc8, 0x14, 0x68,
+		    0x8f, 0x8f, 0x6e, 0x8f, 0x28, 0xb5, 0x00, 0xb2 },
+	.ilen	= 16,
+	.result	= { 0x7e, 0x72, 0xf5, 0xa1, 0x85, 0xaf, 0x16, 0xa6,
+		    0x11, 0x92, 0x1b, 0x43, 0x8f, 0x74, 0x9f, 0x0b,
+		    0x12, 0x42, 0xc6, 0x70, 0x73, 0x23, 0x34, 0x02,
+		    0x9a, 0xdf, 0xe1, 0xc5, 0x00, 0x16, 0x51, 0xe4 }
+}, { /* wycheproof - misc */
+	.key	= { 0x64, 0x32, 0x71, 0x7f, 0x1d, 0xb8, 0x5e, 0x41,
+		    0xac, 0x78, 0x36, 0xbc, 0xe2, 0x51, 0x85, 0xa0,
+		    0x80, 0xd5, 0x76, 0x2b, 0x9e, 0x2b, 0x18, 0x44,
+		    0x4b, 0x6e, 0xc7, 0x2c, 0x3b, 0xd8, 0xe4, 0xdc },
+	.nonce	= { 0x87, 0xa3, 0x16, 0x3e, 0xc0, 0x59, 0x8a, 0xd9,
+		    0x5b, 0x3a, 0xa7, 0x13 },
+	.nlen	= 12,
+	.assoc	= { 0xb6, 0x48 },
+	.alen	= 2,
+	.input	= { 0x02, 0xcd, 0xe1, 0x68, 0xfb, 0xa3, 0xf5, 0x44,
+		    0xbb, 0xd0, 0x33, 0x2f, 0x7a, 0xde, 0xad, 0xa8 },
+	.ilen	= 16,
+	.result	= { 0x85, 0xf2, 0x9a, 0x71, 0x95, 0x57, 0xcd, 0xd1,
+		    0x4d, 0x1f, 0x8f, 0xff, 0xab, 0x6d, 0x9e, 0x60,
+		    0x73, 0x2c, 0xa3, 0x2b, 0xec, 0xd5, 0x15, 0xa1,
+		    0xed, 0x35, 0x3f, 0x54, 0x2e, 0x99, 0x98, 0x58 }
+}, { /* wycheproof - misc */
+	.key	= { 0x8e, 0x34, 0xcf, 0x73, 0xd2, 0x45, 0xa1, 0x08,
+		    0x2a, 0x92, 0x0b, 0x86, 0x36, 0x4e, 0xb8, 0x96,
+		    0xc4, 0x94, 0x64, 0x67, 0xbc, 0xb3, 0xd5, 0x89,
+		    0x29, 0xfc, 0xb3, 0x66, 0x90, 0xe6, 0x39, 0x4f },
+	.nonce	= { 0x6f, 0x57, 0x3a, 0xa8, 0x6b, 0xaa, 0x49, 0x2b,
+		    0xa4, 0x65, 0x96, 0xdf },
+	.nlen	= 12,
+	.assoc	= { 0xbd, 0x4c, 0xd0, 0x2f, 0xc7, 0x50, 0x2b, 0xbd,
+		    0xbd, 0xf6, 0xc9, 0xa3, 0xcb, 0xe8, 0xf0 },
+	.alen	= 15,
+	.input	= { 0x16, 0xdd, 0xd2, 0x3f, 0xf5, 0x3f, 0x3d, 0x23,
+		    0xc0, 0x63, 0x34, 0x48, 0x70, 0x40, 0xeb, 0x47 },
+	.ilen	= 16,
+	.result	= { 0xc1, 0xb2, 0x95, 0x93, 0x6d, 0x56, 0xfa, 0xda,
+		    0xc0, 0x3e, 0x5f, 0x74, 0x2b, 0xff, 0x73, 0xa1,
+		    0x39, 0xc4, 0x57, 0xdb, 0xab, 0x66, 0x38, 0x2b,
+		    0xab, 0xb3, 0xb5, 0x58, 0x00, 0xcd, 0xa5, 0xb8 }
+}, { /* wycheproof - misc */
+	.key	= { 0xcb, 0x55, 0x75, 0xf5, 0xc7, 0xc4, 0x5c, 0x91,
+		    0xcf, 0x32, 0x0b, 0x13, 0x9f, 0xb5, 0x94, 0x23,
+		    0x75, 0x60, 0xd0, 0xa3, 0xe6, 0xf8, 0x65, 0xa6,
+		    0x7d, 0x4f, 0x63, 0x3f, 0x2c, 0x08, 0xf0, 0x16 },
+	.nonce	= { 0x1a, 0x65, 0x18, 0xf0, 0x2e, 0xde, 0x1d, 0xa6,
+		    0x80, 0x92, 0x66, 0xd9 },
+	.nlen	= 12,
+	.assoc	= { 0x89, 0xcc, 0xe9, 0xfb, 0x47, 0x44, 0x1d, 0x07,
+		    0xe0, 0x24, 0x5a, 0x66, 0xfe, 0x8b, 0x77, 0x8b },
+	.alen	= 16,
+	.input	= { 0x62, 0x3b, 0x78, 0x50, 0xc3, 0x21, 0xe2, 0xcf,
+		    0x0c, 0x6f, 0xbc, 0xc8, 0xdf, 0xd1, 0xaf, 0xf2 },
+	.ilen	= 16,
+	.result	= { 0xc8, 0x4c, 0x9b, 0xb7, 0xc6, 0x1c, 0x1b, 0xcb,
+		    0x17, 0x77, 0x2a, 0x1c, 0x50, 0x0c, 0x50, 0x95,
+		    0xdb, 0xad, 0xf7, 0xa5, 0x13, 0x8c, 0xa0, 0x34,
+		    0x59, 0xa2, 0xcd, 0x65, 0x83, 0x1e, 0x09, 0x2f }
+}, { /* wycheproof - misc */
+	.key	= { 0xa5, 0x56, 0x9e, 0x72, 0x9a, 0x69, 0xb2, 0x4b,
+		    0xa6, 0xe0, 0xff, 0x15, 0xc4, 0x62, 0x78, 0x97,
+		    0x43, 0x68, 0x24, 0xc9, 0x41, 0xe9, 0xd0, 0x0b,
+		    0x2e, 0x93, 0xfd, 0xdc, 0x4b, 0xa7, 0x76, 0x57 },
+	.nonce	= { 0x56, 0x4d, 0xee, 0x49, 0xab, 0x00, 0xd2, 0x40,
+		    0xfc, 0x10, 0x68, 0xc3 },
+	.nlen	= 12,
+	.assoc	= { 0xd1, 0x9f, 0x2d, 0x98, 0x90, 0x95, 0xf7, 0xab,
+		    0x03, 0xa5, 0xfd, 0xe8, 0x44, 0x16, 0xe0, 0x0c,
+		    0x0e },
+	.alen	= 17,
+	.input	= { 0x87, 0xb3, 0xa4, 0xd7, 0xb2, 0x6d, 0x8d, 0x32,
+		    0x03, 0xa0, 0xde, 0x1d, 0x64, 0xef, 0x82, 0xe3 },
+	.ilen	= 16,
+	.result	= { 0x94, 0xbc, 0x80, 0x62, 0x1e, 0xd1, 0xe7, 0x1b,
+		    0x1f, 0xd2, 0xb5, 0xc3, 0xa1, 0x5e, 0x35, 0x68,
+		    0x33, 0x35, 0x11, 0x86, 0x17, 0x96, 0x97, 0x84,
+		    0x01, 0x59, 0x8b, 0x96, 0x37, 0x22, 0xf5, 0xb3 }
+}, { /* wycheproof - misc */
+	.key	= { 0x56, 0x20, 0x74, 0x65, 0xb4, 0xe4, 0x8e, 0x6d,
+		    0x04, 0x63, 0x0f, 0x4a, 0x42, 0xf3, 0x5c, 0xfc,
+		    0x16, 0x3a, 0xb2, 0x89, 0xc2, 0x2a, 0x2b, 0x47,
+		    0x84, 0xf6, 0xf9, 0x29, 0x03, 0x30, 0xbe, 0xe0 },
+	.nonce	= { 0xdf, 0x87, 0x13, 0xe8, 0x7e, 0xc3, 0xdb, 0xcf,
+		    0xad, 0x14, 0xd5, 0x3e },
+	.nlen	= 12,
+	.assoc	= { 0x5e, 0x64, 0x70, 0xfa, 0xcd, 0x99, 0xc1, 0xd8,
+		    0x1e, 0x37, 0xcd, 0x44, 0x01, 0x5f, 0xe1, 0x94,
+		    0x80, 0xa2, 0xa4, 0xd3, 0x35, 0x2a, 0x4f, 0xf5,
+		    0x60, 0xc0, 0x64, 0x0f, 0xdb, 0xda },
+	.alen	= 30,
+	.input	= { 0xe6, 0x01, 0xb3, 0x85, 0x57, 0x79, 0x7d, 0xa2,
+		    0xf8, 0xa4, 0x10, 0x6a, 0x08, 0x9d, 0x1d, 0xa6 },
+	.ilen	= 16,
+	.result	= { 0x29, 0x9b, 0x5d, 0x3f, 0x3d, 0x03, 0xc0, 0x87,
+		    0x20, 0x9a, 0x16, 0xe2, 0x85, 0x14, 0x31, 0x11,
+		    0x4b, 0x45, 0x4e, 0xd1, 0x98, 0xde, 0x11, 0x7e,
+		    0x83, 0xec, 0x49, 0xfa, 0x8d, 0x85, 0x08, 0xd6 }
+}, { /* wycheproof - misc */
+	.key	= { 0x39, 0x37, 0x98, 0x6a, 0xf8, 0x6d, 0xaf, 0xc1,
+		    0xba, 0x0c, 0x46, 0x72, 0xd8, 0xab, 0xc4, 0x6c,
+		    0x20, 0x70, 0x62, 0x68, 0x2d, 0x9c, 0x26, 0x4a,
+		    0xb0, 0x6d, 0x6c, 0x58, 0x07, 0x20, 0x51, 0x30 },
+	.nonce	= { 0x8d, 0xf4, 0xb1, 0x5a, 0x88, 0x8c, 0x33, 0x28,
+		    0x6a, 0x7b, 0x76, 0x51 },
+	.nlen	= 12,
+	.assoc	= { 0xba, 0x44, 0x6f, 0x6f, 0x9a, 0x0c, 0xed, 0x22,
+		    0x45, 0x0f, 0xeb, 0x10, 0x73, 0x7d, 0x90, 0x07,
+		    0xfd, 0x69, 0xab, 0xc1, 0x9b, 0x1d, 0x4d, 0x90,
+		    0x49, 0xa5, 0x55, 0x1e, 0x86, 0xec, 0x2b, 0x37 },
+	.alen	= 32,
+	.input	= { 0xdc, 0x9e, 0x9e, 0xaf, 0x11, 0xe3, 0x14, 0x18,
+		    0x2d, 0xf6, 0xa4, 0xeb, 0xa1, 0x7a, 0xec, 0x9c },
+	.ilen	= 16,
+	.result	= { 0x60, 0x5b, 0xbf, 0x90, 0xae, 0xb9, 0x74, 0xf6,
+		    0x60, 0x2b, 0xc7, 0x78, 0x05, 0x6f, 0x0d, 0xca,
+		    0x38, 0xea, 0x23, 0xd9, 0x90, 0x54, 0xb4, 0x6b,
+		    0x42, 0xff, 0xe0, 0x04, 0x12, 0x9d, 0x22, 0x04 }
+}, { /* wycheproof - misc */
+	.key	= { 0x36, 0x37, 0x2a, 0xbc, 0xdb, 0x78, 0xe0, 0x27,
+		    0x96, 0x46, 0xac, 0x3d, 0x17, 0x6b, 0x96, 0x74,
+		    0xe9, 0x15, 0x4e, 0xec, 0xf0, 0xd5, 0x46, 0x9c,
+		    0x65, 0x1e, 0xc7, 0xe1, 0x6b, 0x4c, 0x11, 0x99 },
+	.nonce	= { 0xbe, 0x40, 0xe5, 0xf1, 0xa1, 0x18, 0x17, 0xa0,
+		    0xa8, 0xfa, 0x89, 0x49 },
+	.nlen	= 12,
+	.assoc	= { 0xd4, 0x1a, 0x82, 0x8d, 0x5e, 0x71, 0x82, 0x92,
+		    0x47, 0x02, 0x19, 0x05, 0x40, 0x2e, 0xa2, 0x57,
+		    0xdc, 0xcb, 0xc3, 0xb8, 0x0f, 0xcd, 0x56, 0x75,
+		    0x05, 0x6b, 0x68, 0xbb, 0x59, 0xe6, 0x2e, 0x88,
+		    0x73 },
+	.alen	= 33,
+	.input	= { 0x81, 0xce, 0x84, 0xed, 0xe9, 0xb3, 0x58, 0x59,
+		    0xcc, 0x8c, 0x49, 0xa8, 0xf6, 0xbe, 0x7d, 0xc6 },
+	.ilen	= 16,
+	.result	= { 0x7b, 0x7c, 0xe0, 0xd8, 0x24, 0x80, 0x9a, 0x70,
+		    0xde, 0x32, 0x56, 0x2c, 0xcf, 0x2c, 0x2b, 0xbd,
+		    0x15, 0xd4, 0x4a, 0x00, 0xce, 0x0d, 0x19, 0xb4,
+		    0x23, 0x1f, 0x92, 0x1e, 0x22, 0xbc, 0x0a, 0x43 }
+}, { /* wycheproof - misc */
+	.key	= { 0x9f, 0x14, 0x79, 0xed, 0x09, 0x7d, 0x7f, 0xe5,
+		    0x29, 0xc1, 0x1f, 0x2f, 0x5a, 0xdd, 0x9a, 0xaf,
+		    0xf4, 0xa1, 0xca, 0x0b, 0x68, 0x99, 0x7a, 0x2c,
+		    0xb7, 0xf7, 0x97, 0x49, 0xbd, 0x90, 0xaa, 0xf4 },
+	.nonce	= { 0x84, 0xc8, 0x7d, 0xae, 0x4e, 0xee, 0x27, 0x73,
+		    0x0e, 0xc3, 0x5d, 0x12 },
+	.nlen	= 12,
+	.assoc	= { 0x3f, 0x2d, 0xd4, 0x9b, 0xbf, 0x09, 0xd6, 0x9a,
+		    0x78, 0xa3, 0xd8, 0x0e, 0xa2, 0x56, 0x66, 0x14,
+		    0xfc, 0x37, 0x94, 0x74, 0x19, 0x6c, 0x1a, 0xae,
+		    0x84, 0x58, 0x3d, 0xa7, 0x3d, 0x7f, 0xf8, 0x5c,
+		    0x6f, 0x42, 0xca, 0x42, 0x05, 0x6a, 0x97, 0x92,
+		    0xcc, 0x1b, 0x9f, 0xb3, 0xc7, 0xd2, 0x61 },
+	.alen	= 47,
+	.input	= { 0xa6, 0x67, 0x47, 0xc8, 0x9e, 0x85, 0x7a, 0xf3,
+		    0xa1, 0x8e, 0x2c, 0x79, 0x50, 0x00, 0x87, 0xed },
+	.ilen	= 16,
+	.result	= { 0xca, 0x82, 0xbf, 0xf3, 0xe2, 0xf3, 0x10, 0xcc,
+		    0xc9, 0x76, 0x67, 0x2c, 0x44, 0x15, 0xe6, 0x9b,
+		    0x57, 0x63, 0x8c, 0x62, 0xa5, 0xd8, 0x5d, 0xed,
+		    0x77, 0x4f, 0x91, 0x3c, 0x81, 0x3e, 0xa0, 0x32 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x88, 0x80, 0x94, 0x17, 0x83,
+		    0x55, 0xd3, 0x04, 0x84, 0x64, 0x43, 0xfe, 0xe8,
+		    0xdf, 0x99, 0x47, 0x03, 0x03, 0xfb, 0x3b, 0x7b,
+		    0x80, 0xe0, 0x30, 0xbe, 0xeb, 0xd3, 0x29, 0xbe },
+	.ilen	= 32,
+	.result	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0xe6, 0xd3, 0xd7, 0x32, 0x4a, 0x1c, 0xbb, 0xa7,
+		    0x77, 0xbb, 0xb0, 0xec, 0xdd, 0xa3, 0x78, 0x07 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x88, 0x80, 0x94, 0x17, 0x83,
+		    0x55, 0xd3, 0x04, 0x84, 0x64, 0x43, 0xfe, 0xe8,
+		    0xdf, 0x99, 0x47, 0x03, 0x03, 0xfb, 0x3b, 0x7b,
+		    0x80, 0xe0, 0x30, 0xbe, 0xeb, 0xd3, 0x29, 0xbe,
+		    0xe3, 0xbc, 0xdb, 0x5b, 0x1e, 0xde, 0xfc, 0xfe,
+		    0x8b, 0xcd, 0xa1, 0xb6, 0xa1, 0x5c, 0x8c, 0x2b,
+		    0x08, 0x69, 0xff, 0xd2, 0xec, 0x5e, 0x26, 0xe5,
+		    0x53, 0xb7, 0xb2, 0x27, 0xfe, 0x87, 0xfd, 0xbd },
+	.ilen	= 64,
+	.result	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x06, 0x2d, 0xe6, 0x79, 0x5f, 0x27, 0x4f, 0xd2,
+		    0xa3, 0x05, 0xd7, 0x69, 0x80, 0xbc, 0x9c, 0xce }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x88, 0x80, 0x94, 0x17, 0x83,
+		    0x55, 0xd3, 0x04, 0x84, 0x64, 0x43, 0xfe, 0xe8,
+		    0xdf, 0x99, 0x47, 0x03, 0x03, 0xfb, 0x3b, 0x7b,
+		    0x80, 0xe0, 0x30, 0xbe, 0xeb, 0xd3, 0x29, 0xbe,
+		    0xe3, 0xbc, 0xdb, 0x5b, 0x1e, 0xde, 0xfc, 0xfe,
+		    0x8b, 0xcd, 0xa1, 0xb6, 0xa1, 0x5c, 0x8c, 0x2b,
+		    0x08, 0x69, 0xff, 0xd2, 0xec, 0x5e, 0x26, 0xe5,
+		    0x53, 0xb7, 0xb2, 0x27, 0xfe, 0x87, 0xfd, 0xbd,
+		    0x7a, 0xda, 0x44, 0x42, 0x42, 0x69, 0xbf, 0xfa,
+		    0x55, 0x27, 0xf2, 0x70, 0xac, 0xf6, 0x85, 0x02,
+		    0xb7, 0x4c, 0x5a, 0xe2, 0xe6, 0x0c, 0x05, 0x80,
+		    0x98, 0x1a, 0x49, 0x38, 0x45, 0x93, 0x92, 0xc4,
+		    0x9b, 0xb2, 0xf2, 0x84, 0xb6, 0x46, 0xef, 0xc7,
+		    0xf3, 0xf0, 0xb1, 0x36, 0x1d, 0xc3, 0x48, 0xed,
+		    0x77, 0xd3, 0x0b, 0xc5, 0x76, 0x92, 0xed, 0x38,
+		    0xfb, 0xac, 0x01, 0x88, 0x38, 0x04, 0x88, 0xc7 },
+	.ilen	= 128,
+	.result	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0xd8, 0xb4, 0x79, 0x02, 0xba, 0xae, 0xaf, 0xb3,
+		    0x42, 0x03, 0x05, 0x15, 0x29, 0xaf, 0x28, 0x2e }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0x77, 0x7f, 0x6b, 0xe8, 0x7c,
+		    0xaa, 0x2c, 0xfb, 0x7b, 0x9b, 0xbc, 0x01, 0x17,
+		    0x20, 0x66, 0xb8, 0xfc, 0xfc, 0x04, 0xc4, 0x84,
+		    0x7f, 0x1f, 0xcf, 0x41, 0x14, 0x2c, 0xd6, 0x41 },
+	.ilen	= 32,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xb3, 0x89, 0x1c, 0x84, 0x9c, 0xb5, 0x2c, 0x27,
+		    0x74, 0x7e, 0xdf, 0xcf, 0x31, 0x21, 0x3b, 0xb6 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0x77, 0x7f, 0x6b, 0xe8, 0x7c,
+		    0xaa, 0x2c, 0xfb, 0x7b, 0x9b, 0xbc, 0x01, 0x17,
+		    0x20, 0x66, 0xb8, 0xfc, 0xfc, 0x04, 0xc4, 0x84,
+		    0x7f, 0x1f, 0xcf, 0x41, 0x14, 0x2c, 0xd6, 0x41,
+		    0x1c, 0x43, 0x24, 0xa4, 0xe1, 0x21, 0x03, 0x01,
+		    0x74, 0x32, 0x5e, 0x49, 0x5e, 0xa3, 0x73, 0xd4,
+		    0xf7, 0x96, 0x00, 0x2d, 0x13, 0xa1, 0xd9, 0x1a,
+		    0xac, 0x48, 0x4d, 0xd8, 0x01, 0x78, 0x02, 0x42 },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xf0, 0xc1, 0x2d, 0x26, 0xef, 0x03, 0x02, 0x9b,
+		    0x62, 0xc0, 0x08, 0xda, 0x27, 0xc5, 0xdc, 0x68 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0x77, 0x7f, 0x6b, 0xe8, 0x7c,
+		    0xaa, 0x2c, 0xfb, 0x7b, 0x9b, 0xbc, 0x01, 0x17,
+		    0x20, 0x66, 0xb8, 0xfc, 0xfc, 0x04, 0xc4, 0x84,
+		    0x7f, 0x1f, 0xcf, 0x41, 0x14, 0x2c, 0xd6, 0x41,
+		    0x1c, 0x43, 0x24, 0xa4, 0xe1, 0x21, 0x03, 0x01,
+		    0x74, 0x32, 0x5e, 0x49, 0x5e, 0xa3, 0x73, 0xd4,
+		    0xf7, 0x96, 0x00, 0x2d, 0x13, 0xa1, 0xd9, 0x1a,
+		    0xac, 0x48, 0x4d, 0xd8, 0x01, 0x78, 0x02, 0x42,
+		    0x85, 0x25, 0xbb, 0xbd, 0xbd, 0x96, 0x40, 0x05,
+		    0xaa, 0xd8, 0x0d, 0x8f, 0x53, 0x09, 0x7a, 0xfd,
+		    0x48, 0xb3, 0xa5, 0x1d, 0x19, 0xf3, 0xfa, 0x7f,
+		    0x67, 0xe5, 0xb6, 0xc7, 0xba, 0x6c, 0x6d, 0x3b,
+		    0x64, 0x4d, 0x0d, 0x7b, 0x49, 0xb9, 0x10, 0x38,
+		    0x0c, 0x0f, 0x4e, 0xc9, 0xe2, 0x3c, 0xb7, 0x12,
+		    0x88, 0x2c, 0xf4, 0x3a, 0x89, 0x6d, 0x12, 0xc7,
+		    0x04, 0x53, 0xfe, 0x77, 0xc7, 0xfb, 0x77, 0x38 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xee, 0x65, 0x78, 0x30, 0x01, 0xc2, 0x56, 0x91,
+		    0xfa, 0x28, 0xd0, 0xf5, 0xf1, 0xc1, 0xd7, 0x62 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80 },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x08, 0x80, 0x94, 0x17, 0x03,
+		    0x55, 0xd3, 0x04, 0x04, 0x64, 0x43, 0xfe, 0x68,
+		    0xdf, 0x99, 0x47, 0x83, 0x03, 0xfb, 0x3b, 0xfb,
+		    0x80, 0xe0, 0x30, 0x3e, 0xeb, 0xd3, 0x29, 0x3e },
+	.ilen	= 32,
+	.result	= { 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x79, 0xba, 0x7a, 0x29, 0xf5, 0xa7, 0xbb, 0x75,
+		    0x79, 0x7a, 0xf8, 0x7a, 0x61, 0x01, 0x29, 0xa4 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80 },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x08, 0x80, 0x94, 0x17, 0x03,
+		    0x55, 0xd3, 0x04, 0x04, 0x64, 0x43, 0xfe, 0x68,
+		    0xdf, 0x99, 0x47, 0x83, 0x03, 0xfb, 0x3b, 0xfb,
+		    0x80, 0xe0, 0x30, 0x3e, 0xeb, 0xd3, 0x29, 0x3e,
+		    0xe3, 0xbc, 0xdb, 0xdb, 0x1e, 0xde, 0xfc, 0x7e,
+		    0x8b, 0xcd, 0xa1, 0x36, 0xa1, 0x5c, 0x8c, 0xab,
+		    0x08, 0x69, 0xff, 0x52, 0xec, 0x5e, 0x26, 0x65,
+		    0x53, 0xb7, 0xb2, 0xa7, 0xfe, 0x87, 0xfd, 0x3d },
+	.ilen	= 64,
+	.result	= { 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x36, 0xb1, 0x74, 0x38, 0x19, 0xe1, 0xb9, 0xba,
+		    0x15, 0x51, 0xe8, 0xed, 0x92, 0x2a, 0x95, 0x9a }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80 },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x08, 0x80, 0x94, 0x17, 0x03,
+		    0x55, 0xd3, 0x04, 0x04, 0x64, 0x43, 0xfe, 0x68,
+		    0xdf, 0x99, 0x47, 0x83, 0x03, 0xfb, 0x3b, 0xfb,
+		    0x80, 0xe0, 0x30, 0x3e, 0xeb, 0xd3, 0x29, 0x3e,
+		    0xe3, 0xbc, 0xdb, 0xdb, 0x1e, 0xde, 0xfc, 0x7e,
+		    0x8b, 0xcd, 0xa1, 0x36, 0xa1, 0x5c, 0x8c, 0xab,
+		    0x08, 0x69, 0xff, 0x52, 0xec, 0x5e, 0x26, 0x65,
+		    0x53, 0xb7, 0xb2, 0xa7, 0xfe, 0x87, 0xfd, 0x3d,
+		    0x7a, 0xda, 0x44, 0xc2, 0x42, 0x69, 0xbf, 0x7a,
+		    0x55, 0x27, 0xf2, 0xf0, 0xac, 0xf6, 0x85, 0x82,
+		    0xb7, 0x4c, 0x5a, 0x62, 0xe6, 0x0c, 0x05, 0x00,
+		    0x98, 0x1a, 0x49, 0xb8, 0x45, 0x93, 0x92, 0x44,
+		    0x9b, 0xb2, 0xf2, 0x04, 0xb6, 0x46, 0xef, 0x47,
+		    0xf3, 0xf0, 0xb1, 0xb6, 0x1d, 0xc3, 0x48, 0x6d,
+		    0x77, 0xd3, 0x0b, 0x45, 0x76, 0x92, 0xed, 0xb8,
+		    0xfb, 0xac, 0x01, 0x08, 0x38, 0x04, 0x88, 0x47 },
+	.ilen	= 128,
+	.result	= { 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0xfe, 0xac, 0x49, 0x55, 0x55, 0x4e, 0x80, 0x6f,
+		    0x3a, 0x19, 0x02, 0xe2, 0x44, 0x32, 0xc0, 0x8a }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0xf7, 0x7f, 0x6b, 0xe8, 0xfc,
+		    0xaa, 0x2c, 0xfb, 0xfb, 0x9b, 0xbc, 0x01, 0x97,
+		    0x20, 0x66, 0xb8, 0x7c, 0xfc, 0x04, 0xc4, 0x04,
+		    0x7f, 0x1f, 0xcf, 0xc1, 0x14, 0x2c, 0xd6, 0xc1 },
+	.ilen	= 32,
+	.result	= { 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0x20, 0xa3, 0x79, 0x8d, 0xf1, 0x29, 0x2c, 0x59,
+		    0x72, 0xbf, 0x97, 0x41, 0xae, 0xc3, 0x8a, 0x19 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0xf7, 0x7f, 0x6b, 0xe8, 0xfc,
+		    0xaa, 0x2c, 0xfb, 0xfb, 0x9b, 0xbc, 0x01, 0x97,
+		    0x20, 0x66, 0xb8, 0x7c, 0xfc, 0x04, 0xc4, 0x04,
+		    0x7f, 0x1f, 0xcf, 0xc1, 0x14, 0x2c, 0xd6, 0xc1,
+		    0x1c, 0x43, 0x24, 0x24, 0xe1, 0x21, 0x03, 0x81,
+		    0x74, 0x32, 0x5e, 0xc9, 0x5e, 0xa3, 0x73, 0x54,
+		    0xf7, 0x96, 0x00, 0xad, 0x13, 0xa1, 0xd9, 0x9a,
+		    0xac, 0x48, 0x4d, 0x58, 0x01, 0x78, 0x02, 0xc2 },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xc0, 0x3d, 0x9f, 0x67, 0x35, 0x4a, 0x97, 0xb2,
+		    0xf0, 0x74, 0xf7, 0x55, 0x15, 0x57, 0xe4, 0x9c }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0xf7, 0x7f, 0x6b, 0xe8, 0xfc,
+		    0xaa, 0x2c, 0xfb, 0xfb, 0x9b, 0xbc, 0x01, 0x97,
+		    0x20, 0x66, 0xb8, 0x7c, 0xfc, 0x04, 0xc4, 0x04,
+		    0x7f, 0x1f, 0xcf, 0xc1, 0x14, 0x2c, 0xd6, 0xc1,
+		    0x1c, 0x43, 0x24, 0x24, 0xe1, 0x21, 0x03, 0x81,
+		    0x74, 0x32, 0x5e, 0xc9, 0x5e, 0xa3, 0x73, 0x54,
+		    0xf7, 0x96, 0x00, 0xad, 0x13, 0xa1, 0xd9, 0x9a,
+		    0xac, 0x48, 0x4d, 0x58, 0x01, 0x78, 0x02, 0xc2,
+		    0x85, 0x25, 0xbb, 0x3d, 0xbd, 0x96, 0x40, 0x85,
+		    0xaa, 0xd8, 0x0d, 0x0f, 0x53, 0x09, 0x7a, 0x7d,
+		    0x48, 0xb3, 0xa5, 0x9d, 0x19, 0xf3, 0xfa, 0xff,
+		    0x67, 0xe5, 0xb6, 0x47, 0xba, 0x6c, 0x6d, 0xbb,
+		    0x64, 0x4d, 0x0d, 0xfb, 0x49, 0xb9, 0x10, 0xb8,
+		    0x0c, 0x0f, 0x4e, 0x49, 0xe2, 0x3c, 0xb7, 0x92,
+		    0x88, 0x2c, 0xf4, 0xba, 0x89, 0x6d, 0x12, 0x47,
+		    0x04, 0x53, 0xfe, 0xf7, 0xc7, 0xfb, 0x77, 0xb8 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xc8, 0x6d, 0xa8, 0xdd, 0x65, 0x22, 0x86, 0xd5,
+		    0x02, 0x13, 0xd3, 0x28, 0xd6, 0x3e, 0x40, 0x06 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0x5a, 0x92, 0xbf, 0x77, 0xff, 0x6b, 0xe8, 0x7c,
+		    0x2a, 0x2c, 0xfb, 0x7b, 0x1b, 0xbc, 0x01, 0x17,
+		    0xa0, 0x66, 0xb8, 0xfc, 0x7c, 0x04, 0xc4, 0x84,
+		    0xff, 0x1f, 0xcf, 0x41, 0x94, 0x2c, 0xd6, 0x41 },
+	.ilen	= 32,
+	.result	= { 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0xbe, 0xde, 0x90, 0x83, 0xce, 0xb3, 0x6d, 0xdf,
+		    0xe5, 0xfa, 0x81, 0x1f, 0x95, 0x47, 0x1c, 0x67 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0x5a, 0x92, 0xbf, 0x77, 0xff, 0x6b, 0xe8, 0x7c,
+		    0x2a, 0x2c, 0xfb, 0x7b, 0x1b, 0xbc, 0x01, 0x17,
+		    0xa0, 0x66, 0xb8, 0xfc, 0x7c, 0x04, 0xc4, 0x84,
+		    0xff, 0x1f, 0xcf, 0x41, 0x94, 0x2c, 0xd6, 0x41,
+		    0x9c, 0x43, 0x24, 0xa4, 0x61, 0x21, 0x03, 0x01,
+		    0xf4, 0x32, 0x5e, 0x49, 0xde, 0xa3, 0x73, 0xd4,
+		    0x77, 0x96, 0x00, 0x2d, 0x93, 0xa1, 0xd9, 0x1a,
+		    0x2c, 0x48, 0x4d, 0xd8, 0x81, 0x78, 0x02, 0x42 },
+	.ilen	= 64,
+	.result	= { 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x30, 0x08, 0x74, 0xbb, 0x06, 0x92, 0xb6, 0x89,
+		    0xde, 0xad, 0x9a, 0xe1, 0x5b, 0x06, 0x73, 0x90 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0x5a, 0x92, 0xbf, 0x77, 0xff, 0x6b, 0xe8, 0x7c,
+		    0x2a, 0x2c, 0xfb, 0x7b, 0x1b, 0xbc, 0x01, 0x17,
+		    0xa0, 0x66, 0xb8, 0xfc, 0x7c, 0x04, 0xc4, 0x84,
+		    0xff, 0x1f, 0xcf, 0x41, 0x94, 0x2c, 0xd6, 0x41,
+		    0x9c, 0x43, 0x24, 0xa4, 0x61, 0x21, 0x03, 0x01,
+		    0xf4, 0x32, 0x5e, 0x49, 0xde, 0xa3, 0x73, 0xd4,
+		    0x77, 0x96, 0x00, 0x2d, 0x93, 0xa1, 0xd9, 0x1a,
+		    0x2c, 0x48, 0x4d, 0xd8, 0x81, 0x78, 0x02, 0x42,
+		    0x05, 0x25, 0xbb, 0xbd, 0x3d, 0x96, 0x40, 0x05,
+		    0x2a, 0xd8, 0x0d, 0x8f, 0xd3, 0x09, 0x7a, 0xfd,
+		    0xc8, 0xb3, 0xa5, 0x1d, 0x99, 0xf3, 0xfa, 0x7f,
+		    0xe7, 0xe5, 0xb6, 0xc7, 0x3a, 0x6c, 0x6d, 0x3b,
+		    0xe4, 0x4d, 0x0d, 0x7b, 0xc9, 0xb9, 0x10, 0x38,
+		    0x8c, 0x0f, 0x4e, 0xc9, 0x62, 0x3c, 0xb7, 0x12,
+		    0x08, 0x2c, 0xf4, 0x3a, 0x09, 0x6d, 0x12, 0xc7,
+		    0x84, 0x53, 0xfe, 0x77, 0x47, 0xfb, 0x77, 0x38 },
+	.ilen	= 128,
+	.result	= { 0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x7f, 0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff,
+		    0x99, 0xca, 0xd8, 0x5f, 0x45, 0xca, 0x40, 0x94,
+		    0x2d, 0x0d, 0x4d, 0x5e, 0x95, 0x0a, 0xde, 0x22 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x88, 0x7f, 0x6b, 0xe8, 0x7c,
+		    0x55, 0xd3, 0x04, 0x84, 0x9b, 0xbc, 0x01, 0x17,
+		    0xdf, 0x99, 0x47, 0x03, 0xfc, 0x04, 0xc4, 0x84,
+		    0x80, 0xe0, 0x30, 0xbe, 0x14, 0x2c, 0xd6, 0x41 },
+	.ilen	= 32,
+	.result	= { 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x8b, 0xbe, 0x14, 0x52, 0x72, 0xe7, 0xc2, 0xd9,
+		    0xa1, 0x89, 0x1a, 0x3a, 0xb0, 0x98, 0x3d, 0x9d }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x88, 0x7f, 0x6b, 0xe8, 0x7c,
+		    0x55, 0xd3, 0x04, 0x84, 0x9b, 0xbc, 0x01, 0x17,
+		    0xdf, 0x99, 0x47, 0x03, 0xfc, 0x04, 0xc4, 0x84,
+		    0x80, 0xe0, 0x30, 0xbe, 0x14, 0x2c, 0xd6, 0x41,
+		    0xe3, 0xbc, 0xdb, 0x5b, 0xe1, 0x21, 0x03, 0x01,
+		    0x8b, 0xcd, 0xa1, 0xb6, 0x5e, 0xa3, 0x73, 0xd4,
+		    0x08, 0x69, 0xff, 0xd2, 0x13, 0xa1, 0xd9, 0x1a,
+		    0x53, 0xb7, 0xb2, 0x27, 0x01, 0x78, 0x02, 0x42 },
+	.ilen	= 64,
+	.result	= { 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x3b, 0x41, 0x86, 0x19, 0x13, 0xa8, 0xf6, 0xde,
+		    0x7f, 0x61, 0xe2, 0x25, 0x63, 0x1b, 0xc3, 0x82 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 16,
+	.input	= { 0x25, 0x6d, 0x40, 0x88, 0x7f, 0x6b, 0xe8, 0x7c,
+		    0x55, 0xd3, 0x04, 0x84, 0x9b, 0xbc, 0x01, 0x17,
+		    0xdf, 0x99, 0x47, 0x03, 0xfc, 0x04, 0xc4, 0x84,
+		    0x80, 0xe0, 0x30, 0xbe, 0x14, 0x2c, 0xd6, 0x41,
+		    0xe3, 0xbc, 0xdb, 0x5b, 0xe1, 0x21, 0x03, 0x01,
+		    0x8b, 0xcd, 0xa1, 0xb6, 0x5e, 0xa3, 0x73, 0xd4,
+		    0x08, 0x69, 0xff, 0xd2, 0x13, 0xa1, 0xd9, 0x1a,
+		    0x53, 0xb7, 0xb2, 0x27, 0x01, 0x78, 0x02, 0x42,
+		    0x7a, 0xda, 0x44, 0x42, 0xbd, 0x96, 0x40, 0x05,
+		    0x55, 0x27, 0xf2, 0x70, 0x53, 0x09, 0x7a, 0xfd,
+		    0xb7, 0x4c, 0x5a, 0xe2, 0x19, 0xf3, 0xfa, 0x7f,
+		    0x98, 0x1a, 0x49, 0x38, 0xba, 0x6c, 0x6d, 0x3b,
+		    0x9b, 0xb2, 0xf2, 0x84, 0x49, 0xb9, 0x10, 0x38,
+		    0xf3, 0xf0, 0xb1, 0x36, 0xe2, 0x3c, 0xb7, 0x12,
+		    0x77, 0xd3, 0x0b, 0xc5, 0x89, 0x6d, 0x12, 0xc7,
+		    0xfb, 0xac, 0x01, 0x88, 0xc7, 0xfb, 0x77, 0x38 },
+	.ilen	= 128,
+	.result	= { 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+		    0x84, 0x28, 0xbc, 0xf0, 0x23, 0xec, 0x6b, 0xf3,
+		    0x1f, 0xd9, 0xef, 0xb2, 0x03, 0xff, 0x08, 0x71 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0x77, 0x80, 0x94, 0x17, 0x83,
+		    0xaa, 0x2c, 0xfb, 0x7b, 0x64, 0x43, 0xfe, 0xe8,
+		    0x20, 0x66, 0xb8, 0xfc, 0x03, 0xfb, 0x3b, 0x7b,
+		    0x7f, 0x1f, 0xcf, 0x41, 0xeb, 0xd3, 0x29, 0xbe },
+	.ilen	= 32,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0x13, 0x9f, 0xdf, 0x64, 0x74, 0xea, 0x24, 0xf5,
+		    0x49, 0xb0, 0x75, 0x82, 0x5f, 0x2c, 0x76, 0x20 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0x77, 0x80, 0x94, 0x17, 0x83,
+		    0xaa, 0x2c, 0xfb, 0x7b, 0x64, 0x43, 0xfe, 0xe8,
+		    0x20, 0x66, 0xb8, 0xfc, 0x03, 0xfb, 0x3b, 0x7b,
+		    0x7f, 0x1f, 0xcf, 0x41, 0xeb, 0xd3, 0x29, 0xbe,
+		    0x1c, 0x43, 0x24, 0xa4, 0x1e, 0xde, 0xfc, 0xfe,
+		    0x74, 0x32, 0x5e, 0x49, 0xa1, 0x5c, 0x8c, 0x2b,
+		    0xf7, 0x96, 0x00, 0x2d, 0xec, 0x5e, 0x26, 0xe5,
+		    0xac, 0x48, 0x4d, 0xd8, 0xfe, 0x87, 0xfd, 0xbd },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xbb, 0xad, 0x8d, 0x86, 0x3b, 0x83, 0x5a, 0x8e,
+		    0x86, 0x64, 0xfd, 0x1d, 0x45, 0x66, 0xb6, 0xb4 }
+}, { /* wycheproof - misc */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0xee, 0x32, 0x00 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 },
+	.alen	= 16,
+	.input	= { 0xda, 0x92, 0xbf, 0x77, 0x80, 0x94, 0x17, 0x83,
+		    0xaa, 0x2c, 0xfb, 0x7b, 0x64, 0x43, 0xfe, 0xe8,
+		    0x20, 0x66, 0xb8, 0xfc, 0x03, 0xfb, 0x3b, 0x7b,
+		    0x7f, 0x1f, 0xcf, 0x41, 0xeb, 0xd3, 0x29, 0xbe,
+		    0x1c, 0x43, 0x24, 0xa4, 0x1e, 0xde, 0xfc, 0xfe,
+		    0x74, 0x32, 0x5e, 0x49, 0xa1, 0x5c, 0x8c, 0x2b,
+		    0xf7, 0x96, 0x00, 0x2d, 0xec, 0x5e, 0x26, 0xe5,
+		    0xac, 0x48, 0x4d, 0xd8, 0xfe, 0x87, 0xfd, 0xbd,
+		    0x85, 0x25, 0xbb, 0xbd, 0x42, 0x69, 0xbf, 0xfa,
+		    0xaa, 0xd8, 0x0d, 0x8f, 0xac, 0xf6, 0x85, 0x02,
+		    0x48, 0xb3, 0xa5, 0x1d, 0xe6, 0x0c, 0x05, 0x80,
+		    0x67, 0xe5, 0xb6, 0xc7, 0x45, 0x93, 0x92, 0xc4,
+		    0x64, 0x4d, 0x0d, 0x7b, 0xb6, 0x46, 0xef, 0xc7,
+		    0x0c, 0x0f, 0x4e, 0xc9, 0x1d, 0xc3, 0x48, 0xed,
+		    0x88, 0x2c, 0xf4, 0x3a, 0x76, 0x92, 0xed, 0x38,
+		    0x04, 0x53, 0xfe, 0x77, 0x38, 0x04, 0x88, 0xc7 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0x42, 0xf2, 0x35, 0x42, 0x97, 0x84, 0x9a, 0x51,
+		    0x1d, 0x53, 0xe5, 0x57, 0x17, 0x72, 0xf7, 0x1f }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30 },
+	.nonce	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x00, 0x02, 0x50, 0x6e },
+	.nlen	= 12,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0xd4, 0x50, 0x0b, 0xf0, 0x09, 0x49, 0x35, 0x51,
+		    0xc3, 0x80, 0xad, 0xf5, 0x2c, 0x57, 0x3a, 0x69,
+		    0xdf, 0x7e, 0x8b, 0x76, 0x24, 0x63, 0x33, 0x0f,
+		    0xac, 0xc1, 0x6a, 0x57, 0x26, 0xbe, 0x71, 0x90,
+		    0xc6, 0x3c, 0x5a, 0x1c, 0x92, 0x65, 0x84, 0xa0,
+		    0x96, 0x75, 0x68, 0x28, 0xdc, 0xdc, 0x64, 0xac,
+		    0xdf, 0x96, 0x3d, 0x93, 0x1b, 0xf1, 0xda, 0xe2,
+		    0x38, 0xf3, 0xf1, 0x57, 0x22, 0x4a, 0xc4, 0xb5,
+		    0x42, 0xd7, 0x85, 0xb0, 0xdd, 0x84, 0xdb, 0x6b,
+		    0xe3, 0xbc, 0x5a, 0x36, 0x63, 0xe8, 0x41, 0x49,
+		    0xff, 0xbe, 0xd0, 0x9e, 0x54, 0xf7, 0x8f, 0x16,
+		    0xa8, 0x22, 0x3b, 0x24, 0xcb, 0x01, 0x9f, 0x58,
+		    0xb2, 0x1b, 0x0e, 0x55, 0x1e, 0x7a, 0xa0, 0x73,
+		    0x27, 0x62, 0x95, 0x51, 0x37, 0x6c, 0xcb, 0xc3,
+		    0x93, 0x76, 0x71, 0xa0, 0x62, 0x9b, 0xd9, 0x5c,
+		    0x99, 0x15, 0xc7, 0x85, 0x55, 0x77, 0x1e, 0x7a },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x0b, 0x30, 0x0d, 0x8d, 0xa5, 0x6c, 0x21, 0x85,
+		    0x75, 0x52, 0x79, 0x55, 0x3c, 0x4c, 0x82, 0xca }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30 },
+	.nonce	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x00, 0x03, 0x18, 0xa5 },
+	.nlen	= 12,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0x7d, 0xe8, 0x7f, 0x67, 0x29, 0x94, 0x52, 0x75,
+		    0xd0, 0x65, 0x5d, 0xa4, 0xc7, 0xfd, 0xe4, 0x56,
+		    0x9e, 0x16, 0xf1, 0x11, 0xb5, 0xeb, 0x26, 0xc2,
+		    0x2d, 0x85, 0x9e, 0x3f, 0xf8, 0x22, 0xec, 0xed,
+		    0x3a, 0x6d, 0xd9, 0xa6, 0x0f, 0x22, 0x95, 0x7f,
+		    0x7b, 0x7c, 0x85, 0x7e, 0x88, 0x22, 0xeb, 0x9f,
+		    0xe0, 0xb8, 0xd7, 0x02, 0x21, 0x41, 0xf2, 0xd0,
+		    0xb4, 0x8f, 0x4b, 0x56, 0x12, 0xd3, 0x22, 0xa8,
+		    0x8d, 0xd0, 0xfe, 0x0b, 0x4d, 0x91, 0x79, 0x32,
+		    0x4f, 0x7c, 0x6c, 0x9e, 0x99, 0x0e, 0xfb, 0xd8,
+		    0x0e, 0x5e, 0xd6, 0x77, 0x58, 0x26, 0x49, 0x8b,
+		    0x1e, 0xfe, 0x0f, 0x71, 0xa0, 0xf3, 0xec, 0x5b,
+		    0x29, 0xcb, 0x28, 0xc2, 0x54, 0x0a, 0x7d, 0xcd,
+		    0x51, 0xb7, 0xda, 0xae, 0xe0, 0xff, 0x4a, 0x7f,
+		    0x3a, 0xc1, 0xee, 0x54, 0xc2, 0x9e, 0xe4, 0xc1,
+		    0x70, 0xde, 0x40, 0x8f, 0x66, 0x69, 0x21, 0x94 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xc5, 0x78, 0xe2, 0xaa, 0x44, 0xd3, 0x09, 0xb7,
+		    0xb6, 0xa5, 0x19, 0x3b, 0xdc, 0x61, 0x18, 0xf5 }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30 },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x07, 0xb4, 0xf0 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0x1b, 0x99, 0x6f, 0x9a, 0x3c, 0xcc, 0x67, 0x85,
+		    0xde, 0x22, 0xff, 0x5b, 0x8a, 0xdd, 0x95, 0x02,
+		    0xce, 0x03, 0xa0, 0xfa, 0xf5, 0x99, 0x2a, 0x09,
+		    0x52, 0x2c, 0xdd, 0x12, 0x06, 0xd2, 0x20, 0xb8,
+		    0xf8, 0xbd, 0x07, 0xd1, 0xf1, 0xf5, 0xa1, 0xbd,
+		    0x9a, 0x71, 0xd1, 0x1c, 0x7f, 0x57, 0x9b, 0x85,
+		    0x58, 0x18, 0xc0, 0x8d, 0x4d, 0xe0, 0x36, 0x39,
+		    0x31, 0x83, 0xb7, 0xf5, 0x90, 0xb3, 0x35, 0xae,
+		    0xd8, 0xde, 0x5b, 0x57, 0xb1, 0x3c, 0x5f, 0xed,
+		    0xe2, 0x44, 0x1c, 0x3e, 0x18, 0x4a, 0xa9, 0xd4,
+		    0x6e, 0x61, 0x59, 0x85, 0x06, 0xb3, 0xe1, 0x1c,
+		    0x43, 0xc6, 0x2c, 0xbc, 0xac, 0xec, 0xed, 0x33,
+		    0x19, 0x08, 0x75, 0xb0, 0x12, 0x21, 0x8b, 0x19,
+		    0x30, 0xfb, 0x7c, 0x38, 0xec, 0x45, 0xac, 0x11,
+		    0xc3, 0x53, 0xd0, 0xcf, 0x93, 0x8d, 0xcc, 0xb9,
+		    0xef, 0xad, 0x8f, 0xed, 0xbe, 0x46, 0xda, 0xa5 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x4b, 0x0b, 0xda, 0x8a, 0xd0, 0x43, 0x83, 0x0d,
+		    0x83, 0x19, 0xab, 0x82, 0xc5, 0x0c, 0x76, 0x63 }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30 },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x20, 0xfb, 0x66 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0x86, 0xcb, 0xac, 0xae, 0x4d, 0x3f, 0x74, 0xae,
+		    0x01, 0x21, 0x3e, 0x05, 0x51, 0xcc, 0x15, 0x16,
+		    0x0e, 0xa1, 0xbe, 0x84, 0x08, 0xe3, 0xd5, 0xd7,
+		    0x4f, 0x01, 0x46, 0x49, 0x95, 0xa6, 0x9e, 0x61,
+		    0x76, 0xcb, 0x9e, 0x02, 0xb2, 0x24, 0x7e, 0xd2,
+		    0x99, 0x89, 0x2f, 0x91, 0x82, 0xa4, 0x5c, 0xaf,
+		    0x4c, 0x69, 0x40, 0x56, 0x11, 0x76, 0x6e, 0xdf,
+		    0xaf, 0xdc, 0x28, 0x55, 0x19, 0xea, 0x30, 0x48,
+		    0x0c, 0x44, 0xf0, 0x5e, 0x78, 0x1e, 0xac, 0xf8,
+		    0xfc, 0xec, 0xc7, 0x09, 0x0a, 0xbb, 0x28, 0xfa,
+		    0x5f, 0xd5, 0x85, 0xac, 0x8c, 0xda, 0x7e, 0x87,
+		    0x72, 0xe5, 0x94, 0xe4, 0xce, 0x6c, 0x88, 0x32,
+		    0x81, 0x93, 0x2e, 0x0f, 0x89, 0xf8, 0x77, 0xa1,
+		    0xf0, 0x4d, 0x9c, 0x32, 0xb0, 0x6c, 0xf9, 0x0b,
+		    0x0e, 0x76, 0x2b, 0x43, 0x0c, 0x4d, 0x51, 0x7c,
+		    0x97, 0x10, 0x70, 0x68, 0xf4, 0x98, 0xef, 0x7f },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x4b, 0xc9, 0x8f, 0x72, 0xc4, 0x94, 0xc2, 0xa4,
+		    0x3c, 0x2b, 0x15, 0xa1, 0x04, 0x3f, 0x1c, 0xfa }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30 },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x38, 0xbb, 0x90 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0xfa, 0xb1, 0xcd, 0xdf, 0x4f, 0xe1, 0x98, 0xef,
+		    0x63, 0xad, 0xd8, 0x81, 0xd6, 0xea, 0xd6, 0xc5,
+		    0x76, 0x37, 0xbb, 0xe9, 0x20, 0x18, 0xca, 0x7c,
+		    0x0b, 0x96, 0xfb, 0xa0, 0x87, 0x1e, 0x93, 0x2d,
+		    0xb1, 0xfb, 0xf9, 0x07, 0x61, 0xbe, 0x25, 0xdf,
+		    0x8d, 0xfa, 0xf9, 0x31, 0xce, 0x57, 0x57, 0xe6,
+		    0x17, 0xb3, 0xd7, 0xa9, 0xf0, 0xbf, 0x0f, 0xfe,
+		    0x5d, 0x59, 0x1a, 0x33, 0xc1, 0x43, 0xb8, 0xf5,
+		    0x3f, 0xd0, 0xb5, 0xa1, 0x96, 0x09, 0xfd, 0x62,
+		    0xe5, 0xc2, 0x51, 0xa4, 0x28, 0x1a, 0x20, 0x0c,
+		    0xfd, 0xc3, 0x4f, 0x28, 0x17, 0x10, 0x40, 0x6f,
+		    0x4e, 0x37, 0x62, 0x54, 0x46, 0xff, 0x6e, 0xf2,
+		    0x24, 0x91, 0x3d, 0xeb, 0x0d, 0x89, 0xaf, 0x33,
+		    0x71, 0x28, 0xe3, 0xd1, 0x55, 0xd1, 0x6d, 0x3e,
+		    0xc3, 0x24, 0x60, 0x41, 0x43, 0x21, 0x43, 0xe9,
+		    0xab, 0x3a, 0x6d, 0x2c, 0xcc, 0x2f, 0x4d, 0x62 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xf7, 0xe9, 0xe1, 0x51, 0xb0, 0x25, 0x33, 0xc7,
+		    0x46, 0x58, 0xbf, 0xc7, 0x73, 0x7c, 0x68, 0x0d }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30 },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x48, 0x4a },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0x22, 0x72, 0x02, 0xbe, 0x7f, 0x35, 0x15, 0xe9,
+		    0xd1, 0xc0, 0x2e, 0xea, 0x2f, 0x19, 0x50, 0xb6,
+		    0x48, 0x1b, 0x04, 0x8a, 0x4c, 0x91, 0x50, 0x6c,
+		    0xb4, 0x0d, 0x50, 0x4e, 0x6c, 0x94, 0x9f, 0x82,
+		    0xd1, 0x97, 0xc2, 0x5a, 0xd1, 0x7d, 0xc7, 0x21,
+		    0x65, 0x11, 0x25, 0x78, 0x2a, 0xc7, 0xa7, 0x12,
+		    0x47, 0xfe, 0xae, 0xf3, 0x2f, 0x1f, 0x25, 0x0c,
+		    0xe4, 0xbb, 0x8f, 0x79, 0xac, 0xaa, 0x17, 0x9d,
+		    0x45, 0xa7, 0xb0, 0x54, 0x5f, 0x09, 0x24, 0x32,
+		    0x5e, 0xfa, 0x87, 0xd5, 0xe4, 0x41, 0xd2, 0x84,
+		    0x78, 0xc6, 0x1f, 0x22, 0x23, 0xee, 0x67, 0xc3,
+		    0xb4, 0x1f, 0x43, 0x94, 0x53, 0x5e, 0x2a, 0x24,
+		    0x36, 0x9a, 0x2e, 0x16, 0x61, 0x3c, 0x45, 0x94,
+		    0x90, 0xc1, 0x4f, 0xb1, 0xd7, 0x55, 0xfe, 0x53,
+		    0xfb, 0xe1, 0xee, 0x45, 0xb1, 0xb2, 0x1f, 0x71,
+		    0x62, 0xe2, 0xfc, 0xaa, 0x74, 0x2a, 0xbe, 0xfd },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x79, 0x5b, 0xcf, 0xf6, 0x47, 0xc5, 0x53, 0xc2,
+		    0xe4, 0xeb, 0x6e, 0x0e, 0xaf, 0xd9, 0xe0, 0x4e }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30 },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x93, 0x2f, 0x40 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0xfa, 0xe5, 0x83, 0x45, 0xc1, 0x6c, 0xb0, 0xf5,
+		    0xcc, 0x53, 0x7f, 0x2b, 0x1b, 0x34, 0x69, 0xc9,
+		    0x69, 0x46, 0x3b, 0x3e, 0xa7, 0x1b, 0xcf, 0x6b,
+		    0x98, 0xd6, 0x69, 0xa8, 0xe6, 0x0e, 0x04, 0xfc,
+		    0x08, 0xd5, 0xfd, 0x06, 0x9c, 0x36, 0x26, 0x38,
+		    0xe3, 0x40, 0x0e, 0xf4, 0xcb, 0x24, 0x2e, 0x27,
+		    0xe2, 0x24, 0x5e, 0x68, 0xcb, 0x9e, 0xc5, 0x83,
+		    0xda, 0x53, 0x40, 0xb1, 0x2e, 0xdf, 0x42, 0x3b,
+		    0x73, 0x26, 0xad, 0x20, 0xfe, 0xeb, 0x57, 0xda,
+		    0xca, 0x2e, 0x04, 0x67, 0xa3, 0x28, 0x99, 0xb4,
+		    0x2d, 0xf8, 0xe5, 0x6d, 0x84, 0xe0, 0x06, 0xbc,
+		    0x8a, 0x7a, 0xcc, 0x73, 0x1e, 0x7c, 0x1f, 0x6b,
+		    0xec, 0xb5, 0x71, 0x9f, 0x70, 0x77, 0xf0, 0xd4,
+		    0xf4, 0xc6, 0x1a, 0xb1, 0x1e, 0xba, 0xc1, 0x00,
+		    0x18, 0x01, 0xce, 0x33, 0xc4, 0xe4, 0xa7, 0x7d,
+		    0x83, 0x1d, 0x3c, 0xe3, 0x4e, 0x84, 0x10, 0xe1 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x19, 0x46, 0xd6, 0x53, 0x96, 0x0f, 0x94, 0x7a,
+		    0x74, 0xd3, 0xe8, 0x09, 0x3c, 0xf4, 0x85, 0x02 }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30,
+		    0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30 },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0xe2, 0x93, 0x35 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0xeb, 0xb2, 0x16, 0xdd, 0xd7, 0xca, 0x70, 0x92,
+		    0x15, 0xf5, 0x03, 0xdf, 0x9c, 0xe6, 0x3c, 0x5c,
+		    0xd2, 0x19, 0x4e, 0x7d, 0x90, 0x99, 0xe8, 0xa9,
+		    0x0b, 0x2a, 0xfa, 0xad, 0x5e, 0xba, 0x35, 0x06,
+		    0x99, 0x25, 0xa6, 0x03, 0xfd, 0xbc, 0x34, 0x1a,
+		    0xae, 0xd4, 0x15, 0x05, 0xb1, 0x09, 0x41, 0xfa,
+		    0x38, 0x56, 0xa7, 0xe2, 0x47, 0xb1, 0x04, 0x07,
+		    0x09, 0x74, 0x6c, 0xfc, 0x20, 0x96, 0xca, 0xa6,
+		    0x31, 0xb2, 0xff, 0xf4, 0x1c, 0x25, 0x05, 0x06,
+		    0xd8, 0x89, 0xc1, 0xc9, 0x06, 0x71, 0xad, 0xe8,
+		    0x53, 0xee, 0x63, 0x94, 0xc1, 0x91, 0x92, 0xa5,
+		    0xcf, 0x37, 0x10, 0xd1, 0x07, 0x30, 0x99, 0xe5,
+		    0xbc, 0x94, 0x65, 0x82, 0xfc, 0x0f, 0xab, 0x9f,
+		    0x54, 0x3c, 0x71, 0x6a, 0xe2, 0x48, 0x6a, 0x86,
+		    0x83, 0xfd, 0xca, 0x39, 0xd2, 0xe1, 0x4f, 0x23,
+		    0xd0, 0x0a, 0x58, 0x26, 0x64, 0xf4, 0xec, 0xb1 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x36, 0xc3, 0x00, 0x29, 0x85, 0xdd, 0x21, 0xba,
+		    0xf8, 0x95, 0xd6, 0x33, 0x57, 0x3f, 0x12, 0xc0 }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x0e, 0xf7, 0xd5 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0x40, 0x8a, 0xe6, 0xef, 0x1c, 0x7e, 0xf0, 0xfb,
+		    0x2c, 0x2d, 0x61, 0x08, 0x16, 0xfc, 0x78, 0x49,
+		    0xef, 0xa5, 0x8f, 0x78, 0x27, 0x3f, 0x5f, 0x16,
+		    0x6e, 0xa6, 0x5f, 0x81, 0xb5, 0x75, 0x74, 0x7d,
+		    0x03, 0x5b, 0x30, 0x40, 0xfe, 0xde, 0x1e, 0xb9,
+		    0x45, 0x97, 0x88, 0x66, 0x97, 0x88, 0x40, 0x8e,
+		    0x00, 0x41, 0x3b, 0x3e, 0x37, 0x6d, 0x15, 0x2d,
+		    0x20, 0x4a, 0xa2, 0xb7, 0xa8, 0x35, 0x58, 0xfc,
+		    0xd4, 0x8a, 0x0e, 0xf7, 0xa2, 0x6b, 0x1c, 0xd6,
+		    0xd3, 0x5d, 0x23, 0xb3, 0xf5, 0xdf, 0xe0, 0xca,
+		    0x77, 0xa4, 0xce, 0x32, 0xb9, 0x4a, 0xbf, 0x83,
+		    0xda, 0x2a, 0xef, 0xca, 0xf0, 0x68, 0x38, 0x08,
+		    0x79, 0xe8, 0x9f, 0xb0, 0xa3, 0x82, 0x95, 0x95,
+		    0xcf, 0x44, 0xc3, 0x85, 0x2a, 0xe2, 0xcc, 0x66,
+		    0x2b, 0x68, 0x9f, 0x93, 0x55, 0xd9, 0xc1, 0x83,
+		    0x80, 0x1f, 0x6a, 0xcc, 0x31, 0x3f, 0x89, 0x07 },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x65, 0x14, 0x51, 0x8e, 0x0a, 0x26, 0x41, 0x42,
+		    0xe0, 0xb7, 0x35, 0x1f, 0x96, 0x7f, 0xc2, 0xae }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x00, 0x3d, 0xfc, 0xe4 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0x0a, 0x0a, 0x24, 0x49, 0x9b, 0xca, 0xde, 0x58,
+		    0xcf, 0x15, 0x76, 0xc3, 0x12, 0xac, 0xa9, 0x84,
+		    0x71, 0x8c, 0xb4, 0xcc, 0x7e, 0x01, 0x53, 0xf5,
+		    0xa9, 0x01, 0x58, 0x10, 0x85, 0x96, 0x44, 0xdf,
+		    0xc0, 0x21, 0x17, 0x4e, 0x0b, 0x06, 0x0a, 0x39,
+		    0x74, 0x48, 0xde, 0x8b, 0x48, 0x4a, 0x86, 0x03,
+		    0xbe, 0x68, 0x0a, 0x69, 0x34, 0xc0, 0x90, 0x6f,
+		    0x30, 0xdd, 0x17, 0xea, 0xe2, 0xd4, 0xc5, 0xfa,
+		    0xa7, 0x77, 0xf8, 0xca, 0x53, 0x37, 0x0e, 0x08,
+		    0x33, 0x1b, 0x88, 0xc3, 0x42, 0xba, 0xc9, 0x59,
+		    0x78, 0x7b, 0xbb, 0x33, 0x93, 0x0e, 0x3b, 0x56,
+		    0xbe, 0x86, 0xda, 0x7f, 0x2a, 0x6e, 0xb1, 0xf9,
+		    0x40, 0x89, 0xd1, 0xd1, 0x81, 0x07, 0x4d, 0x43,
+		    0x02, 0xf8, 0xe0, 0x55, 0x2d, 0x0d, 0xe1, 0xfa,
+		    0xb3, 0x06, 0xa2, 0x1b, 0x42, 0xd4, 0xc3, 0xba,
+		    0x6e, 0x6f, 0x0c, 0xbc, 0xc8, 0x1e, 0x87, 0x7a },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x4c, 0x19, 0x4d, 0xa6, 0xa9, 0x9f, 0xd6, 0x5b,
+		    0x40, 0xe9, 0xca, 0xd7, 0x98, 0xf4, 0x4b, 0x19 }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x01, 0x84, 0x86, 0xa8 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0x4a, 0x0a, 0xaf, 0xf8, 0x49, 0x47, 0x29, 0x18,
+		    0x86, 0x91, 0x70, 0x13, 0x40, 0xf3, 0xce, 0x2b,
+		    0x8a, 0x78, 0xee, 0xd3, 0xa0, 0xf0, 0x65, 0x99,
+		    0x4b, 0x72, 0x48, 0x4e, 0x79, 0x91, 0xd2, 0x5c,
+		    0x29, 0xaa, 0x07, 0x5e, 0xb1, 0xfc, 0x16, 0xde,
+		    0x93, 0xfe, 0x06, 0x90, 0x58, 0x11, 0x2a, 0xb2,
+		    0x84, 0xa3, 0xed, 0x18, 0x78, 0x03, 0x26, 0xd1,
+		    0x25, 0x8a, 0x47, 0x22, 0x2f, 0xa6, 0x33, 0xd8,
+		    0xb2, 0x9f, 0x3b, 0xd9, 0x15, 0x0b, 0x23, 0x9b,
+		    0x15, 0x46, 0xc2, 0xbb, 0x9b, 0x9f, 0x41, 0x0f,
+		    0xeb, 0xea, 0xd3, 0x96, 0x00, 0x0e, 0xe4, 0x77,
+		    0x70, 0x15, 0x32, 0xc3, 0xd0, 0xf5, 0xfb, 0xf8,
+		    0x95, 0xd2, 0x80, 0x19, 0x6d, 0x2f, 0x73, 0x7c,
+		    0x5e, 0x9f, 0xec, 0x50, 0xd9, 0x2b, 0xb0, 0xdf,
+		    0x5d, 0x7e, 0x51, 0x3b, 0xe5, 0xb8, 0xea, 0x97,
+		    0x13, 0x10, 0xd5, 0xbf, 0x16, 0xba, 0x7a, 0xee },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xc8, 0xae, 0x77, 0x88, 0xcd, 0x28, 0x74, 0xab,
+		    0xc1, 0x38, 0x54, 0x1e, 0x11, 0xfd, 0x05, 0x87 }
+}, { /* wycheproof - checking for int overflows */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+	.alen	= 64,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x78, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x9f, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0x9c, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0x47, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0xd4, 0xd2, 0x06, 0x61, 0x6f, 0x92, 0x93, 0xf6,
+		    0x5b, 0x45, 0xdb, 0xbc, 0x74, 0xe7, 0xc2, 0xed,
+		    0xfb, 0xcb, 0xbf, 0x1c, 0xfb, 0x67, 0x9b, 0xb7,
+		    0x39, 0xa5, 0x86, 0x2d, 0xe2, 0xbc, 0xb9, 0x37,
+		    0xf7, 0x4d, 0x5b, 0xf8, 0x67, 0x1c, 0x5a, 0x8a,
+		    0x50, 0x92, 0xf6, 0x1d, 0x54, 0xc9, 0xaa, 0x5b },
+	.ilen	= 128,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x93, 0x3a, 0x51, 0x63, 0xc7, 0xf6, 0x23, 0x68,
+		    0x32, 0x7b, 0x3f, 0xbc, 0x10, 0x36, 0xc9, 0x43 }
+}, { /* wycheproof - special case tag */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b },
+	.nlen	= 12,
+	.assoc	= { 0x85, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xa6, 0x90, 0x2f, 0xcb, 0xc8, 0x83, 0xbb, 0xc1,
+		    0x80, 0xb2, 0x56, 0xae, 0x34, 0xad, 0x7f, 0x00 },
+	.alen	= 32,
+	.input	= { 0x9a, 0x49, 0xc4, 0x0f, 0x8b, 0x48, 0xd7, 0xc6,
+		    0x6d, 0x1d, 0xb4, 0xe5, 0x3f, 0x20, 0xf2, 0xdd,
+		    0x4a, 0xaa, 0x24, 0x1d, 0xda, 0xb2, 0x6b, 0x5b,
+		    0xc0, 0xe2, 0x18, 0xb7, 0x2c, 0x33, 0x90, 0xf2,
+		    0xdf, 0x3e, 0xbd, 0x01, 0x76, 0x70, 0x44, 0x19,
+		    0x97, 0x2b, 0xcd, 0xbc, 0x6b, 0xbc, 0xb3, 0xe4,
+		    0xe7, 0x4a, 0x71, 0x52, 0x8e, 0xf5, 0x12, 0x63,
+		    0xce, 0x24, 0xe0, 0xd5, 0x75, 0xe0, 0xe4, 0x4d },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f }
+}, { /* wycheproof - special case tag */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b },
+	.nlen	= 12,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x24, 0x7e, 0x50, 0x64, 0x2a, 0x1c, 0x0a, 0x2f,
+		    0x8f, 0x77, 0x21, 0x96, 0x09, 0xdb, 0xa9, 0x58 },
+	.alen	= 32,
+	.input	= { 0x9a, 0x49, 0xc4, 0x0f, 0x8b, 0x48, 0xd7, 0xc6,
+		    0x6d, 0x1d, 0xb4, 0xe5, 0x3f, 0x20, 0xf2, 0xdd,
+		    0x4a, 0xaa, 0x24, 0x1d, 0xda, 0xb2, 0x6b, 0x5b,
+		    0xc0, 0xe2, 0x18, 0xb7, 0x2c, 0x33, 0x90, 0xf2,
+		    0xdf, 0x3e, 0xbd, 0x01, 0x76, 0x70, 0x44, 0x19,
+		    0x97, 0x2b, 0xcd, 0xbc, 0x6b, 0xbc, 0xb3, 0xe4,
+		    0xe7, 0x4a, 0x71, 0x52, 0x8e, 0xf5, 0x12, 0x63,
+		    0xce, 0x24, 0xe0, 0xd5, 0x75, 0xe0, 0xe4, 0x4d },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }
+}, { /* wycheproof - special case tag */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b },
+	.nlen	= 12,
+	.assoc	= { 0x7c, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xd9, 0xe7, 0x2c, 0x06, 0x4a, 0xc8, 0x96, 0x1f,
+		    0x3f, 0xa5, 0x85, 0xe0, 0xe2, 0xab, 0xd6, 0x00 },
+	.alen	= 32,
+	.input	= { 0x9a, 0x49, 0xc4, 0x0f, 0x8b, 0x48, 0xd7, 0xc6,
+		    0x6d, 0x1d, 0xb4, 0xe5, 0x3f, 0x20, 0xf2, 0xdd,
+		    0x4a, 0xaa, 0x24, 0x1d, 0xda, 0xb2, 0x6b, 0x5b,
+		    0xc0, 0xe2, 0x18, 0xb7, 0x2c, 0x33, 0x90, 0xf2,
+		    0xdf, 0x3e, 0xbd, 0x01, 0x76, 0x70, 0x44, 0x19,
+		    0x97, 0x2b, 0xcd, 0xbc, 0x6b, 0xbc, 0xb3, 0xe4,
+		    0xe7, 0x4a, 0x71, 0x52, 0x8e, 0xf5, 0x12, 0x63,
+		    0xce, 0x24, 0xe0, 0xd5, 0x75, 0xe0, 0xe4, 0x4d },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }
+}, { /* wycheproof - special case tag */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b },
+	.nlen	= 12,
+	.assoc	= { 0x65, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x95, 0xaf, 0x0f, 0x4d, 0x0b, 0x68, 0x6e, 0xae,
+		    0xcc, 0xca, 0x43, 0x07, 0xd5, 0x96, 0xf5, 0x02 },
+	.alen	= 32,
+	.input	= { 0x9a, 0x49, 0xc4, 0x0f, 0x8b, 0x48, 0xd7, 0xc6,
+		    0x6d, 0x1d, 0xb4, 0xe5, 0x3f, 0x20, 0xf2, 0xdd,
+		    0x4a, 0xaa, 0x24, 0x1d, 0xda, 0xb2, 0x6b, 0x5b,
+		    0xc0, 0xe2, 0x18, 0xb7, 0x2c, 0x33, 0x90, 0xf2,
+		    0xdf, 0x3e, 0xbd, 0x01, 0x76, 0x70, 0x44, 0x19,
+		    0x97, 0x2b, 0xcd, 0xbc, 0x6b, 0xbc, 0xb3, 0xe4,
+		    0xe7, 0x4a, 0x71, 0x52, 0x8e, 0xf5, 0x12, 0x63,
+		    0xce, 0x24, 0xe0, 0xd5, 0x75, 0xe0, 0xe4, 0x4d },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80,
+		    0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x80 }
+}, { /* wycheproof - special case tag */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b },
+	.nlen	= 12,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x85, 0x40, 0xb4, 0x64, 0x35, 0x77, 0x07, 0xbe,
+		    0x3a, 0x39, 0xd5, 0x5c, 0x34, 0xf8, 0xbc, 0xb3 },
+	.alen	= 32,
+	.input	= { 0x9a, 0x49, 0xc4, 0x0f, 0x8b, 0x48, 0xd7, 0xc6,
+		    0x6d, 0x1d, 0xb4, 0xe5, 0x3f, 0x20, 0xf2, 0xdd,
+		    0x4a, 0xaa, 0x24, 0x1d, 0xda, 0xb2, 0x6b, 0x5b,
+		    0xc0, 0xe2, 0x18, 0xb7, 0x2c, 0x33, 0x90, 0xf2,
+		    0xdf, 0x3e, 0xbd, 0x01, 0x76, 0x70, 0x44, 0x19,
+		    0x97, 0x2b, 0xcd, 0xbc, 0x6b, 0xbc, 0xb3, 0xe4,
+		    0xe7, 0x4a, 0x71, 0x52, 0x8e, 0xf5, 0x12, 0x63,
+		    0xce, 0x24, 0xe0, 0xd5, 0x75, 0xe0, 0xe4, 0x4d },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f,
+		    0xff, 0xff, 0xff, 0x7f, 0xff, 0xff, 0xff, 0x7f }
+}, { /* wycheproof - special case tag */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b },
+	.nlen	= 12,
+	.assoc	= { 0x4f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x66, 0x23, 0xd9, 0x90, 0xb8, 0x98, 0xd8, 0x30,
+		    0xd2, 0x12, 0xaf, 0x23, 0x83, 0x33, 0x07, 0x01 },
+	.alen	= 32,
+	.input	= { 0x9a, 0x49, 0xc4, 0x0f, 0x8b, 0x48, 0xd7, 0xc6,
+		    0x6d, 0x1d, 0xb4, 0xe5, 0x3f, 0x20, 0xf2, 0xdd,
+		    0x4a, 0xaa, 0x24, 0x1d, 0xda, 0xb2, 0x6b, 0x5b,
+		    0xc0, 0xe2, 0x18, 0xb7, 0x2c, 0x33, 0x90, 0xf2,
+		    0xdf, 0x3e, 0xbd, 0x01, 0x76, 0x70, 0x44, 0x19,
+		    0x97, 0x2b, 0xcd, 0xbc, 0x6b, 0xbc, 0xb3, 0xe4,
+		    0xe7, 0x4a, 0x71, 0x52, 0x8e, 0xf5, 0x12, 0x63,
+		    0xce, 0x24, 0xe0, 0xd5, 0x75, 0xe0, 0xe4, 0x4d },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
+		    0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00 }
+}, { /* wycheproof - special case tag */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b },
+	.nlen	= 12,
+	.assoc	= { 0x83, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x5f, 0x16, 0xd0, 0x9f, 0x17, 0x78, 0x72, 0x11,
+		    0xb7, 0xd4, 0x84, 0xe0, 0x24, 0xf8, 0x97, 0x01 },
+	.alen	= 32,
+	.input	= { 0x9a, 0x49, 0xc4, 0x0f, 0x8b, 0x48, 0xd7, 0xc6,
+		    0x6d, 0x1d, 0xb4, 0xe5, 0x3f, 0x20, 0xf2, 0xdd,
+		    0x4a, 0xaa, 0x24, 0x1d, 0xda, 0xb2, 0x6b, 0x5b,
+		    0xc0, 0xe2, 0x18, 0xb7, 0x2c, 0x33, 0x90, 0xf2,
+		    0xdf, 0x3e, 0xbd, 0x01, 0x76, 0x70, 0x44, 0x19,
+		    0x97, 0x2b, 0xcd, 0xbc, 0x6b, 0xbc, 0xb3, 0xe4,
+		    0xe7, 0x4a, 0x71, 0x52, 0x8e, 0xf5, 0x12, 0x63,
+		    0xce, 0x24, 0xe0, 0xd5, 0x75, 0xe0, 0xe4, 0x4d },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0x00, 0x52, 0x35, 0xd2, 0xa9, 0x19, 0xf2, 0x8d,
+		    0x3d, 0xb7, 0x66, 0x4a, 0x34, 0xae, 0x6b, 0x44,
+		    0x4d, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x5b, 0x8b, 0x94, 0x50, 0x9e, 0x2b, 0x74, 0xa3,
+		    0x6d, 0x34, 0x6e, 0x33, 0xd5, 0x72, 0x65, 0x9b,
+		    0xa9, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0x83, 0xdc, 0xe9, 0xf3, 0x07, 0x3e, 0xfa, 0xdb,
+		    0x7d, 0x23, 0xb8, 0x7a, 0xce, 0x35, 0x16, 0x8c },
+	.ilen	= 80,
+	.result	= { 0x00, 0x39, 0xe2, 0xfd, 0x2f, 0xd3, 0x12, 0x14,
+		    0x9e, 0x98, 0x98, 0x80, 0x88, 0x48, 0x13, 0xe7,
+		    0xca, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x3b, 0x0e, 0x86, 0x9a, 0xaa, 0x8e, 0xa4, 0x96,
+		    0x32, 0xff, 0xff, 0x37, 0xb9, 0xe8, 0xce, 0x00,
+		    0xca, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x3b, 0x0e, 0x86, 0x9a, 0xaa, 0x8e, 0xa4, 0x96,
+		    0x32, 0xff, 0xff, 0x37, 0xb9, 0xe8, 0xce, 0x00,
+		    0xa5, 0x19, 0xac, 0x1a, 0x35, 0xb4, 0xa5, 0x77,
+		    0x87, 0x51, 0x0a, 0xf7, 0x8d, 0x8d, 0x20, 0x0a }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xd3, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0xe5, 0xda, 0x78, 0x76, 0x6f, 0xa1, 0x92, 0x90,
+		    0xc0, 0x31, 0xf7, 0x52, 0x08, 0x50, 0x67, 0x45,
+		    0xae, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0x49, 0x6d, 0xde, 0xb0, 0x55, 0x09, 0xc6, 0xef,
+		    0xff, 0xab, 0x75, 0xeb, 0x2d, 0xf4, 0xab, 0x09,
+		    0x76, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x01, 0x49, 0xef, 0x50, 0x4b, 0x71, 0xb1, 0x20,
+		    0xca, 0x4f, 0xf3, 0x95, 0x19, 0xc2, 0xc2, 0x10 },
+	.ilen	= 96,
+	.result	= { 0xd3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x62, 0x18, 0xb2, 0x7f, 0x83, 0xb8, 0xb4, 0x66,
+		    0x02, 0xf6, 0xe1, 0xd8, 0x34, 0x20, 0x7b, 0x02,
+		    0xce, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x2a, 0x64, 0x16, 0xce, 0xdb, 0x1c, 0xdd, 0x29,
+		    0x6e, 0xf5, 0xd7, 0xd6, 0x92, 0xda, 0xff, 0x02,
+		    0xce, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x2a, 0x64, 0x16, 0xce, 0xdb, 0x1c, 0xdd, 0x29,
+		    0x6e, 0xf5, 0xd7, 0xd6, 0x92, 0xda, 0xff, 0x02,
+		    0x30, 0x2f, 0xe8, 0x2a, 0xb0, 0xa0, 0x9a, 0xf6,
+		    0x44, 0x00, 0xd0, 0x15, 0xae, 0x83, 0xd9, 0xcc }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xe9, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x6d, 0xf1, 0x39, 0x4e, 0xdc, 0x53, 0x9b, 0x5b,
+		    0x3a, 0x09, 0x57, 0xbe, 0x0f, 0xb8, 0x59, 0x46,
+		    0x80, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0xd1, 0x76, 0x9f, 0xe8, 0x06, 0xbb, 0xfe, 0xb6,
+		    0xf5, 0x90, 0x95, 0x0f, 0x2e, 0xac, 0x9e, 0x0a,
+		    0x58, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x99, 0x52, 0xae, 0x08, 0x18, 0xc3, 0x89, 0x79,
+		    0xc0, 0x74, 0x13, 0x71, 0x1a, 0x9a, 0xf7, 0x13 },
+	.ilen	= 96,
+	.result	= { 0xe9, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xea, 0x33, 0xf3, 0x47, 0x30, 0x4a, 0xbd, 0xad,
+		    0xf8, 0xce, 0x41, 0x34, 0x33, 0xc8, 0x45, 0x01,
+		    0xe0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xb2, 0x7f, 0x57, 0x96, 0x88, 0xae, 0xe5, 0x70,
+		    0x64, 0xce, 0x37, 0x32, 0x91, 0x82, 0xca, 0x01,
+		    0xe0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xb2, 0x7f, 0x57, 0x96, 0x88, 0xae, 0xe5, 0x70,
+		    0x64, 0xce, 0x37, 0x32, 0x91, 0x82, 0xca, 0x01,
+		    0x98, 0xa7, 0xe8, 0x36, 0xe0, 0xee, 0x4d, 0x02,
+		    0x35, 0x00, 0xd0, 0x55, 0x7e, 0xc2, 0xcb, 0xe0 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x64, 0xf9, 0x0f, 0x5b, 0x26, 0x92, 0xb8, 0x60,
+		    0xd4, 0x59, 0x6f, 0xf4, 0xb3, 0x40, 0x2c, 0x5c,
+		    0x00, 0xb9, 0xbb, 0x53, 0x70, 0x7a, 0xa6, 0x67,
+		    0xd3, 0x56, 0xfe, 0x50, 0xc7, 0x19, 0x96, 0x94,
+		    0x03, 0x35, 0x61, 0xe7, 0xca, 0xca, 0x6d, 0x94,
+		    0x1d, 0xc3, 0xcd, 0x69, 0x14, 0xad, 0x69, 0x04 },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xe3, 0x3b, 0xc5, 0x52, 0xca, 0x8b, 0x9e, 0x96,
+		    0x16, 0x9e, 0x79, 0x7e, 0x8f, 0x30, 0x30, 0x1b,
+		    0x60, 0x3c, 0xa9, 0x99, 0x44, 0xdf, 0x76, 0x52,
+		    0x8c, 0x9d, 0x6f, 0x54, 0xab, 0x83, 0x3d, 0x0f,
+		    0x60, 0x3c, 0xa9, 0x99, 0x44, 0xdf, 0x76, 0x52,
+		    0x8c, 0x9d, 0x6f, 0x54, 0xab, 0x83, 0x3d, 0x0f,
+		    0x6a, 0xb8, 0xdc, 0xe2, 0xc5, 0x9d, 0xa4, 0x73,
+		    0x71, 0x30, 0xb0, 0x25, 0x2f, 0x68, 0xa8, 0xd8 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0x68, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0xb0, 0x8f, 0x25, 0x67, 0x5b, 0x9b, 0xcb, 0xf6,
+		    0xe3, 0x84, 0x07, 0xde, 0x2e, 0xc7, 0x5a, 0x47,
+		    0x9f, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0x2d, 0x2a, 0xf7, 0xcd, 0x6b, 0x08, 0x05, 0x01,
+		    0xd3, 0x1b, 0xa5, 0x4f, 0xb2, 0xeb, 0x75, 0x96,
+		    0x47, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x65, 0x0e, 0xc6, 0x2d, 0x75, 0x70, 0x72, 0xce,
+		    0xe6, 0xff, 0x23, 0x31, 0x86, 0xdd, 0x1c, 0x8f },
+	.ilen	= 96,
+	.result	= { 0x68, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x37, 0x4d, 0xef, 0x6e, 0xb7, 0x82, 0xed, 0x00,
+		    0x21, 0x43, 0x11, 0x54, 0x12, 0xb7, 0x46, 0x00,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x4e, 0x23, 0x3f, 0xb3, 0xe5, 0x1d, 0x1e, 0xc7,
+		    0x42, 0x45, 0x07, 0x72, 0x0d, 0xc5, 0x21, 0x9d,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x4e, 0x23, 0x3f, 0xb3, 0xe5, 0x1d, 0x1e, 0xc7,
+		    0x42, 0x45, 0x07, 0x72, 0x0d, 0xc5, 0x21, 0x9d,
+		    0x04, 0x4d, 0xea, 0x60, 0x88, 0x80, 0x41, 0x2b,
+		    0xfd, 0xff, 0xcf, 0x35, 0x57, 0x9e, 0x9b, 0x26 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0x6d, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0xa1, 0x61, 0xb5, 0xab, 0x04, 0x09, 0x00, 0x62,
+		    0x9e, 0xfe, 0xff, 0x78, 0xd7, 0xd8, 0x6b, 0x45,
+		    0x9f, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0xc6, 0xf8, 0x07, 0x8c, 0xc8, 0xef, 0x12, 0xa0,
+		    0xff, 0x65, 0x7d, 0x6d, 0x08, 0xdb, 0x10, 0xb8,
+		    0x47, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x8e, 0xdc, 0x36, 0x6c, 0xd6, 0x97, 0x65, 0x6f,
+		    0xca, 0x81, 0xfb, 0x13, 0x3c, 0xed, 0x79, 0xa1 },
+	.ilen	= 96,
+	.result	= { 0x6d, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x26, 0xa3, 0x7f, 0xa2, 0xe8, 0x10, 0x26, 0x94,
+		    0x5c, 0x39, 0xe9, 0xf2, 0xeb, 0xa8, 0x77, 0x02,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xa5, 0xf1, 0xcf, 0xf2, 0x46, 0xfa, 0x09, 0x66,
+		    0x6e, 0x3b, 0xdf, 0x50, 0xb7, 0xf5, 0x44, 0xb3,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xa5, 0xf1, 0xcf, 0xf2, 0x46, 0xfa, 0x09, 0x66,
+		    0x6e, 0x3b, 0xdf, 0x50, 0xb7, 0xf5, 0x44, 0xb3,
+		    0x1e, 0x6b, 0xea, 0x63, 0x14, 0x54, 0x2e, 0x2e,
+		    0xf9, 0xff, 0xcf, 0x45, 0x0b, 0x2e, 0x98, 0x2b }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0xfc, 0x01, 0xb8, 0x91, 0xe5, 0xf0, 0xf9, 0x12,
+		    0x8d, 0x7d, 0x1c, 0x57, 0x91, 0x92, 0xb6, 0x98,
+		    0x63, 0x41, 0x44, 0x15, 0xb6, 0x99, 0x68, 0x95,
+		    0x9a, 0x72, 0x91, 0xb7, 0xa5, 0xaf, 0x13, 0x48,
+		    0x60, 0xcd, 0x9e, 0xa1, 0x0c, 0x29, 0xa3, 0x66,
+		    0x54, 0xe7, 0xa2, 0x8e, 0x76, 0x1b, 0xec, 0xd8 },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x7b, 0xc3, 0x72, 0x98, 0x09, 0xe9, 0xdf, 0xe4,
+		    0x4f, 0xba, 0x0a, 0xdd, 0xad, 0xe2, 0xaa, 0xdf,
+		    0x03, 0xc4, 0x56, 0xdf, 0x82, 0x3c, 0xb8, 0xa0,
+		    0xc5, 0xb9, 0x00, 0xb3, 0xc9, 0x35, 0xb8, 0xd3,
+		    0x03, 0xc4, 0x56, 0xdf, 0x82, 0x3c, 0xb8, 0xa0,
+		    0xc5, 0xb9, 0x00, 0xb3, 0xc9, 0x35, 0xb8, 0xd3,
+		    0xed, 0x20, 0x17, 0xc8, 0xdb, 0xa4, 0x77, 0x56,
+		    0x29, 0x04, 0x9d, 0x78, 0x6e, 0x3b, 0xce, 0xb1 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x6b, 0x6d, 0xc9, 0xd2, 0x1a, 0x81, 0x9e, 0x70,
+		    0xb5, 0x77, 0xf4, 0x41, 0x37, 0xd3, 0xd6, 0xbd,
+		    0x13, 0x35, 0xf5, 0xeb, 0x44, 0x49, 0x40, 0x77,
+		    0xb2, 0x64, 0x49, 0xa5, 0x4b, 0x6c, 0x7c, 0x75,
+		    0x10, 0xb9, 0x2f, 0x5f, 0xfe, 0xf9, 0x8b, 0x84,
+		    0x7c, 0xf1, 0x7a, 0x9c, 0x98, 0xd8, 0x83, 0xe5 },
+	.ilen	= 64,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xec, 0xaf, 0x03, 0xdb, 0xf6, 0x98, 0xb8, 0x86,
+		    0x77, 0xb0, 0xe2, 0xcb, 0x0b, 0xa3, 0xca, 0xfa,
+		    0x73, 0xb0, 0xe7, 0x21, 0x70, 0xec, 0x90, 0x42,
+		    0xed, 0xaf, 0xd8, 0xa1, 0x27, 0xf6, 0xd7, 0xee,
+		    0x73, 0xb0, 0xe7, 0x21, 0x70, 0xec, 0x90, 0x42,
+		    0xed, 0xaf, 0xd8, 0xa1, 0x27, 0xf6, 0xd7, 0xee,
+		    0x07, 0x3f, 0x17, 0xcb, 0x67, 0x78, 0x64, 0x59,
+		    0x25, 0x04, 0x9d, 0x88, 0x22, 0xcb, 0xca, 0xb6 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0xcb, 0x2b, 0x11, 0x06, 0xf8, 0x23, 0x4c,
+		    0x5e, 0x99, 0xd4, 0xdb, 0x4c, 0x70, 0x48, 0xde,
+		    0x32, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x16, 0xe9, 0x88, 0x4a, 0x11, 0x4f, 0x0e, 0x92,
+		    0x66, 0xce, 0xa3, 0x88, 0x5f, 0xe3, 0x6b, 0x9f,
+		    0xd6, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0xce, 0xbe, 0xf5, 0xe9, 0x88, 0x5a, 0x80, 0xea,
+		    0x76, 0xd9, 0x75, 0xc1, 0x44, 0xa4, 0x18, 0x88 },
+	.ilen	= 80,
+	.result	= { 0xff, 0xa0, 0xfc, 0x3e, 0x80, 0x32, 0xc3, 0xd5,
+		    0xfd, 0xb6, 0x2a, 0x11, 0xf0, 0x96, 0x30, 0x7d,
+		    0xb5, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x76, 0x6c, 0x9a, 0x80, 0x25, 0xea, 0xde, 0xa7,
+		    0x39, 0x05, 0x32, 0x8c, 0x33, 0x79, 0xc0, 0x04,
+		    0xb5, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x76, 0x6c, 0x9a, 0x80, 0x25, 0xea, 0xde, 0xa7,
+		    0x39, 0x05, 0x32, 0x8c, 0x33, 0x79, 0xc0, 0x04,
+		    0x8b, 0x9b, 0xb4, 0xb4, 0x86, 0x12, 0x89, 0x65,
+		    0x8c, 0x69, 0x6a, 0x83, 0x40, 0x15, 0x04, 0x05 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0x6f, 0x9e, 0x70, 0xed, 0x3b, 0x8b, 0xac, 0xa0,
+		    0x26, 0xe4, 0x6a, 0x5a, 0x09, 0x43, 0x15, 0x8d,
+		    0x21, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x0c, 0x61, 0x2c, 0x5e, 0x8d, 0x89, 0xa8, 0x73,
+		    0xdb, 0xca, 0xad, 0x5b, 0x73, 0x46, 0x42, 0x9b,
+		    0xc5, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0xd4, 0x36, 0x51, 0xfd, 0x14, 0x9c, 0x26, 0x0b,
+		    0xcb, 0xdd, 0x7b, 0x12, 0x68, 0x01, 0x31, 0x8c },
+	.ilen	= 80,
+	.result	= { 0x6f, 0xf5, 0xa7, 0xc2, 0xbd, 0x41, 0x4c, 0x39,
+		    0x85, 0xcb, 0x94, 0x90, 0xb5, 0xa5, 0x6d, 0x2e,
+		    0xa6, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x6c, 0xe4, 0x3e, 0x94, 0xb9, 0x2c, 0x78, 0x46,
+		    0x84, 0x01, 0x3c, 0x5f, 0x1f, 0xdc, 0xe9, 0x00,
+		    0xa6, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x6c, 0xe4, 0x3e, 0x94, 0xb9, 0x2c, 0x78, 0x46,
+		    0x84, 0x01, 0x3c, 0x5f, 0x1f, 0xdc, 0xe9, 0x00,
+		    0x8b, 0x3b, 0xbd, 0x51, 0x64, 0x44, 0x59, 0x56,
+		    0x8d, 0x81, 0xca, 0x1f, 0xa7, 0x2c, 0xe4, 0x04 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0x41, 0x2b, 0x08, 0x0a, 0x3e, 0x19, 0xc1, 0x0d,
+		    0x44, 0xa1, 0xaf, 0x1e, 0xab, 0xde, 0xb4, 0xce,
+		    0x35, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x6b, 0x83, 0x94, 0x33, 0x09, 0x21, 0x48, 0x6c,
+		    0xa1, 0x1d, 0x29, 0x1c, 0x3e, 0x97, 0xee, 0x9a,
+		    0xd1, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0xb3, 0xd4, 0xe9, 0x90, 0x90, 0x34, 0xc6, 0x14,
+		    0xb1, 0x0a, 0xff, 0x55, 0x25, 0xd0, 0x9d, 0x8d },
+	.ilen	= 80,
+	.result	= { 0x41, 0x40, 0xdf, 0x25, 0xb8, 0xd3, 0x21, 0x94,
+		    0xe7, 0x8e, 0x51, 0xd4, 0x17, 0x38, 0xcc, 0x6d,
+		    0xb2, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x0b, 0x06, 0x86, 0xf9, 0x3d, 0x84, 0x98, 0x59,
+		    0xfe, 0xd6, 0xb8, 0x18, 0x52, 0x0d, 0x45, 0x01,
+		    0xb2, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x0b, 0x06, 0x86, 0xf9, 0x3d, 0x84, 0x98, 0x59,
+		    0xfe, 0xd6, 0xb8, 0x18, 0x52, 0x0d, 0x45, 0x01,
+		    0x86, 0xfb, 0xab, 0x2b, 0x4a, 0x94, 0xf4, 0x7a,
+		    0xa5, 0x6f, 0x0a, 0xea, 0x65, 0xd1, 0x10, 0x08 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xb2, 0x47, 0xa7, 0x47, 0x23, 0x49, 0x1a, 0xac,
+		    0xac, 0xaa, 0xd7, 0x09, 0xc9, 0x1e, 0x93, 0x2b,
+		    0x31, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x9a, 0xde, 0x04, 0xe7, 0x5b, 0xb7, 0x01, 0xd9,
+		    0x66, 0x06, 0x01, 0xb3, 0x47, 0x65, 0xde, 0x98,
+		    0xd5, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0x42, 0x89, 0x79, 0x44, 0xc2, 0xa2, 0x8f, 0xa1,
+		    0x76, 0x11, 0xd7, 0xfa, 0x5c, 0x22, 0xad, 0x8f },
+	.ilen	= 80,
+	.result	= { 0xb2, 0x2c, 0x70, 0x68, 0xa5, 0x83, 0xfa, 0x35,
+		    0x0f, 0x85, 0x29, 0xc3, 0x75, 0xf8, 0xeb, 0x88,
+		    0xb6, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xfa, 0x5b, 0x16, 0x2d, 0x6f, 0x12, 0xd1, 0xec,
+		    0x39, 0xcd, 0x90, 0xb7, 0x2b, 0xff, 0x75, 0x03,
+		    0xb6, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xfa, 0x5b, 0x16, 0x2d, 0x6f, 0x12, 0xd1, 0xec,
+		    0x39, 0xcd, 0x90, 0xb7, 0x2b, 0xff, 0x75, 0x03,
+		    0xa0, 0x19, 0xac, 0x2e, 0xd6, 0x67, 0xe1, 0x7d,
+		    0xa1, 0x6f, 0x0a, 0xfa, 0x19, 0x61, 0x0d, 0x0d }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0x74, 0x0f, 0x9e, 0x49, 0xf6, 0x10, 0xef, 0xa5,
+		    0x85, 0xb6, 0x59, 0xca, 0x6e, 0xd8, 0xb4, 0x99,
+		    0x2d, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x41, 0x2d, 0x96, 0xaf, 0xbe, 0x80, 0xec, 0x3e,
+		    0x79, 0xd4, 0x51, 0xb0, 0x0a, 0x2d, 0xb2, 0x9a,
+		    0xc9, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0x99, 0x7a, 0xeb, 0x0c, 0x27, 0x95, 0x62, 0x46,
+		    0x69, 0xc3, 0x87, 0xf9, 0x11, 0x6a, 0xc1, 0x8d },
+	.ilen	= 80,
+	.result	= { 0x74, 0x64, 0x49, 0x66, 0x70, 0xda, 0x0f, 0x3c,
+		    0x26, 0x99, 0xa7, 0x00, 0xd2, 0x3e, 0xcc, 0x3a,
+		    0xaa, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x21, 0xa8, 0x84, 0x65, 0x8a, 0x25, 0x3c, 0x0b,
+		    0x26, 0x1f, 0xc0, 0xb4, 0x66, 0xb7, 0x19, 0x01,
+		    0xaa, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x21, 0xa8, 0x84, 0x65, 0x8a, 0x25, 0x3c, 0x0b,
+		    0x26, 0x1f, 0xc0, 0xb4, 0x66, 0xb7, 0x19, 0x01,
+		    0x73, 0x6e, 0x18, 0x18, 0x16, 0x96, 0xa5, 0x88,
+		    0x9c, 0x31, 0x59, 0xfa, 0xab, 0xab, 0x20, 0xfd }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xad, 0xba, 0x5d, 0x10, 0x5b, 0xc8, 0xaa, 0x06,
+		    0x2c, 0x23, 0x36, 0xcb, 0x88, 0x9d, 0xdb, 0xd5,
+		    0x37, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x17, 0x7c, 0x5f, 0xfe, 0x28, 0x75, 0xf4, 0x68,
+		    0xf6, 0xc2, 0x96, 0x57, 0x48, 0xf3, 0x59, 0x9a,
+		    0xd3, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0xcf, 0x2b, 0x22, 0x5d, 0xb1, 0x60, 0x7a, 0x10,
+		    0xe6, 0xd5, 0x40, 0x1e, 0x53, 0xb4, 0x2a, 0x8d },
+	.ilen	= 80,
+	.result	= { 0xad, 0xd1, 0x8a, 0x3f, 0xdd, 0x02, 0x4a, 0x9f,
+		    0x8f, 0x0c, 0xc8, 0x01, 0x34, 0x7b, 0xa3, 0x76,
+		    0xb0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x77, 0xf9, 0x4d, 0x34, 0x1c, 0xd0, 0x24, 0x5d,
+		    0xa9, 0x09, 0x07, 0x53, 0x24, 0x69, 0xf2, 0x01,
+		    0xb0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x77, 0xf9, 0x4d, 0x34, 0x1c, 0xd0, 0x24, 0x5d,
+		    0xa9, 0x09, 0x07, 0x53, 0x24, 0x69, 0xf2, 0x01,
+		    0xba, 0xd5, 0x8f, 0x10, 0xa9, 0x1e, 0x6a, 0x88,
+		    0x9a, 0xba, 0x32, 0xfd, 0x17, 0xd8, 0x33, 0x1a }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xfe, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0xc0, 0x01, 0xed, 0xc5, 0xda, 0x44, 0x2e, 0x71,
+		    0x9b, 0xce, 0x9a, 0xbe, 0x27, 0x3a, 0xf1, 0x44,
+		    0xb4, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0x48, 0x02, 0x5f, 0x41, 0xfa, 0x4e, 0x33, 0x6c,
+		    0x78, 0x69, 0x57, 0xa2, 0xa7, 0xc4, 0x93, 0x0a,
+		    0x6c, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x00, 0x26, 0x6e, 0xa1, 0xe4, 0x36, 0x44, 0xa3,
+		    0x4d, 0x8d, 0xd1, 0xdc, 0x93, 0xf2, 0xfa, 0x13 },
+	.ilen	= 96,
+	.result	= { 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x47, 0xc3, 0x27, 0xcc, 0x36, 0x5d, 0x08, 0x87,
+		    0x59, 0x09, 0x8c, 0x34, 0x1b, 0x4a, 0xed, 0x03,
+		    0xd4, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x2b, 0x0b, 0x97, 0x3f, 0x74, 0x5b, 0x28, 0xaa,
+		    0xe9, 0x37, 0xf5, 0x9f, 0x18, 0xea, 0xc7, 0x01,
+		    0xd4, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x2b, 0x0b, 0x97, 0x3f, 0x74, 0x5b, 0x28, 0xaa,
+		    0xe9, 0x37, 0xf5, 0x9f, 0x18, 0xea, 0xc7, 0x01,
+		    0xd6, 0x8c, 0xe1, 0x74, 0x07, 0x9a, 0xdd, 0x02,
+		    0x8d, 0xd0, 0x5c, 0xf8, 0x14, 0x63, 0x04, 0x88 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xb5, 0x13, 0xb0, 0x6a, 0xb9, 0xac, 0x14, 0x43,
+		    0x5a, 0xcb, 0x8a, 0xa3, 0xa3, 0x7a, 0xfd, 0xb6,
+		    0x54, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x61, 0x95, 0x01, 0x93, 0xb1, 0xbf, 0x03, 0x11,
+		    0xff, 0x11, 0x79, 0x89, 0xae, 0xd9, 0xa9, 0x99,
+		    0xb0, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0xb9, 0xc2, 0x7c, 0x30, 0x28, 0xaa, 0x8d, 0x69,
+		    0xef, 0x06, 0xaf, 0xc0, 0xb5, 0x9e, 0xda, 0x8e },
+	.ilen	= 80,
+	.result	= { 0xb5, 0x78, 0x67, 0x45, 0x3f, 0x66, 0xf4, 0xda,
+		    0xf9, 0xe4, 0x74, 0x69, 0x1f, 0x9c, 0x85, 0x15,
+		    0xd3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x01, 0x10, 0x13, 0x59, 0x85, 0x1a, 0xd3, 0x24,
+		    0xa0, 0xda, 0xe8, 0x8d, 0xc2, 0x43, 0x02, 0x02,
+		    0xd3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x01, 0x10, 0x13, 0x59, 0x85, 0x1a, 0xd3, 0x24,
+		    0xa0, 0xda, 0xe8, 0x8d, 0xc2, 0x43, 0x02, 0x02,
+		    0xaa, 0x48, 0xa3, 0x88, 0x7d, 0x4b, 0x05, 0x96,
+		    0x99, 0xc2, 0xfd, 0xf9, 0xc6, 0x78, 0x7e, 0x0a }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0xd4, 0xf1, 0x09, 0xe8, 0x14, 0xce, 0xa8, 0x5a,
+		    0x08, 0xc0, 0x11, 0xd8, 0x50, 0xdd, 0x1d, 0xcb,
+		    0xcf, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0x53, 0x40, 0xb8, 0x5a, 0x9a, 0xa0, 0x82, 0x96,
+		    0xb7, 0x7a, 0x5f, 0xc3, 0x96, 0x1f, 0x66, 0x0f,
+		    0x17, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x1b, 0x64, 0x89, 0xba, 0x84, 0xd8, 0xf5, 0x59,
+		    0x82, 0x9e, 0xd9, 0xbd, 0xa2, 0x29, 0x0f, 0x16 },
+	.ilen	= 96,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x53, 0x33, 0xc3, 0xe1, 0xf8, 0xd7, 0x8e, 0xac,
+		    0xca, 0x07, 0x07, 0x52, 0x6c, 0xad, 0x01, 0x8c,
+		    0xaf, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x30, 0x49, 0x70, 0x24, 0x14, 0xb5, 0x99, 0x50,
+		    0x26, 0x24, 0xfd, 0xfe, 0x29, 0x31, 0x32, 0x04,
+		    0xaf, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x30, 0x49, 0x70, 0x24, 0x14, 0xb5, 0x99, 0x50,
+		    0x26, 0x24, 0xfd, 0xfe, 0x29, 0x31, 0x32, 0x04,
+		    0xb9, 0x36, 0xa8, 0x17, 0xf2, 0x21, 0x1a, 0xf1,
+		    0x29, 0xe2, 0xcf, 0x16, 0x0f, 0xd4, 0x2b, 0xcb }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0xdf, 0x4c, 0x62, 0x03, 0x2d, 0x41, 0x19, 0xb5,
+		    0x88, 0x47, 0x7e, 0x99, 0x92, 0x5a, 0x56, 0xd9,
+		    0xd6, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0xfa, 0x84, 0xf0, 0x64, 0x55, 0x36, 0x42, 0x1b,
+		    0x2b, 0xb9, 0x24, 0x6e, 0xc2, 0x19, 0xed, 0x0b,
+		    0x0e, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0xb2, 0xa0, 0xc1, 0x84, 0x4b, 0x4e, 0x35, 0xd4,
+		    0x1e, 0x5d, 0xa2, 0x10, 0xf6, 0x2f, 0x84, 0x12 },
+	.ilen	= 96,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x58, 0x8e, 0xa8, 0x0a, 0xc1, 0x58, 0x3f, 0x43,
+		    0x4a, 0x80, 0x68, 0x13, 0xae, 0x2a, 0x4a, 0x9e,
+		    0xb6, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x99, 0x8d, 0x38, 0x1a, 0xdb, 0x23, 0x59, 0xdd,
+		    0xba, 0xe7, 0x86, 0x53, 0x7d, 0x37, 0xb9, 0x00,
+		    0xb6, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x99, 0x8d, 0x38, 0x1a, 0xdb, 0x23, 0x59, 0xdd,
+		    0xba, 0xe7, 0x86, 0x53, 0x7d, 0x37, 0xb9, 0x00,
+		    0x9f, 0x7a, 0xc4, 0x35, 0x1f, 0x6b, 0x91, 0xe6,
+		    0x30, 0x97, 0xa7, 0x13, 0x11, 0x5d, 0x05, 0xbe }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x13, 0xf8, 0x0a, 0x00, 0x6d, 0xc1, 0xbb, 0xda,
+		    0xd6, 0x39, 0xa9, 0x2f, 0xc7, 0xec, 0xa6, 0x55,
+		    0xf7, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0x63, 0x48, 0xb8, 0xfd, 0x29, 0xbf, 0x96, 0xd5,
+		    0x63, 0xa5, 0x17, 0xe2, 0x7d, 0x7b, 0xfc, 0x0f,
+		    0x2f, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x2b, 0x6c, 0x89, 0x1d, 0x37, 0xc7, 0xe1, 0x1a,
+		    0x56, 0x41, 0x91, 0x9c, 0x49, 0x4d, 0x95, 0x16 },
+	.ilen	= 96,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x94, 0x3a, 0xc0, 0x09, 0x81, 0xd8, 0x9d, 0x2c,
+		    0x14, 0xfe, 0xbf, 0xa5, 0xfb, 0x9c, 0xba, 0x12,
+		    0x97, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x41, 0x70, 0x83, 0xa7, 0xaa, 0x8d, 0x13,
+		    0xf2, 0xfb, 0xb5, 0xdf, 0xc2, 0x55, 0xa8, 0x04,
+		    0x97, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x00, 0x41, 0x70, 0x83, 0xa7, 0xaa, 0x8d, 0x13,
+		    0xf2, 0xfb, 0xb5, 0xdf, 0xc2, 0x55, 0xa8, 0x04,
+		    0x9a, 0x18, 0xa8, 0x28, 0x07, 0x02, 0x69, 0xf4,
+		    0x47, 0x00, 0xd0, 0x09, 0xe7, 0x17, 0x1c, 0xc9 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x82, 0xe5, 0x9b, 0x45, 0x82, 0x91, 0x50, 0x38,
+		    0xf9, 0x33, 0x81, 0x1e, 0x65, 0x2d, 0xc6, 0x6a,
+		    0xfc, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0xb6, 0x71, 0xc8, 0xca, 0xc2, 0x70, 0xc2, 0x65,
+		    0xa0, 0xac, 0x2f, 0x53, 0x57, 0x99, 0x88, 0x0a,
+		    0x24, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0xfe, 0x55, 0xf9, 0x2a, 0xdc, 0x08, 0xb5, 0xaa,
+		    0x95, 0x48, 0xa9, 0x2d, 0x63, 0xaf, 0xe1, 0x13 },
+	.ilen	= 96,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x05, 0x27, 0x51, 0x4c, 0x6e, 0x88, 0x76, 0xce,
+		    0x3b, 0xf4, 0x97, 0x94, 0x59, 0x5d, 0xda, 0x2d,
+		    0x9c, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xd5, 0x78, 0x00, 0xb4, 0x4c, 0x65, 0xd9, 0xa3,
+		    0x31, 0xf2, 0x8d, 0x6e, 0xe8, 0xb7, 0xdc, 0x01,
+		    0x9c, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xd5, 0x78, 0x00, 0xb4, 0x4c, 0x65, 0xd9, 0xa3,
+		    0x31, 0xf2, 0x8d, 0x6e, 0xe8, 0xb7, 0xdc, 0x01,
+		    0xb4, 0x36, 0xa8, 0x2b, 0x93, 0xd5, 0x55, 0xf7,
+		    0x43, 0x00, 0xd0, 0x19, 0x9b, 0xa7, 0x18, 0xce }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xff, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0xf1, 0xd1, 0x28, 0x87, 0xb7, 0x21, 0x69, 0x86,
+		    0xa1, 0x2d, 0x79, 0x09, 0x8b, 0x6d, 0xe6, 0x0f,
+		    0xc0, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0xa7, 0xc7, 0x58, 0x99, 0xf3, 0xe6, 0x0a, 0xf1,
+		    0xfc, 0xb6, 0xc7, 0x30, 0x7d, 0x87, 0x59, 0x0f,
+		    0x18, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0xef, 0xe3, 0x69, 0x79, 0xed, 0x9e, 0x7d, 0x3e,
+		    0xc9, 0x52, 0x41, 0x4e, 0x49, 0xb1, 0x30, 0x16 },
+	.ilen	= 96,
+	.result	= { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x76, 0x13, 0xe2, 0x8e, 0x5b, 0x38, 0x4f, 0x70,
+		    0x63, 0xea, 0x6f, 0x83, 0xb7, 0x1d, 0xfa, 0x48,
+		    0xa0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xc4, 0xce, 0x90, 0xe7, 0x7d, 0xf3, 0x11, 0x37,
+		    0x6d, 0xe8, 0x65, 0x0d, 0xc2, 0xa9, 0x0d, 0x04,
+		    0xa0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xc4, 0xce, 0x90, 0xe7, 0x7d, 0xf3, 0x11, 0x37,
+		    0x6d, 0xe8, 0x65, 0x0d, 0xc2, 0xa9, 0x0d, 0x04,
+		    0xce, 0x54, 0xa8, 0x2e, 0x1f, 0xa9, 0x42, 0xfa,
+		    0x3f, 0x00, 0xd0, 0x29, 0x4f, 0x37, 0x15, 0xd3 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xcb, 0xf1, 0xda, 0x9e, 0x0b, 0xa9, 0x37, 0x73,
+		    0x74, 0xe6, 0x9e, 0x1c, 0x0e, 0x60, 0x0c, 0xfc,
+		    0x34, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0xbe, 0x3f, 0xa6, 0x6b, 0x6c, 0xe7, 0x80, 0x8a,
+		    0xa3, 0xe4, 0x59, 0x49, 0xf9, 0x44, 0x64, 0x9f,
+		    0xd0, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0x66, 0x68, 0xdb, 0xc8, 0xf5, 0xf2, 0x0e, 0xf2,
+		    0xb3, 0xf3, 0x8f, 0x00, 0xe2, 0x03, 0x17, 0x88 },
+	.ilen	= 80,
+	.result	= { 0xcb, 0x9a, 0x0d, 0xb1, 0x8d, 0x63, 0xd7, 0xea,
+		    0xd7, 0xc9, 0x60, 0xd6, 0xb2, 0x86, 0x74, 0x5f,
+		    0xb3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xde, 0xba, 0xb4, 0xa1, 0x58, 0x42, 0x50, 0xbf,
+		    0xfc, 0x2f, 0xc8, 0x4d, 0x95, 0xde, 0xcf, 0x04,
+		    0xb3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xde, 0xba, 0xb4, 0xa1, 0x58, 0x42, 0x50, 0xbf,
+		    0xfc, 0x2f, 0xc8, 0x4d, 0x95, 0xde, 0xcf, 0x04,
+		    0x23, 0x83, 0xab, 0x0b, 0x79, 0x92, 0x05, 0x69,
+		    0x9b, 0x51, 0x0a, 0xa7, 0x09, 0xbf, 0x31, 0xf1 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0x8f, 0x27, 0x86, 0x94, 0xc4, 0xe9, 0xda, 0xeb,
+		    0xd5, 0x8d, 0x3e, 0x5b, 0x96, 0x6e, 0x8b, 0x68,
+		    0x42, 0x3d, 0x35, 0xf6, 0x13, 0xe6, 0xd9, 0x09,
+		    0x3d, 0x38, 0xe9, 0x75, 0xc3, 0x8f, 0xe3, 0xb8,
+		    0x06, 0x53, 0xe7, 0xa3, 0x31, 0x71, 0x88, 0x33,
+		    0xac, 0xc3, 0xb9, 0xad, 0xff, 0x1c, 0x31, 0x98,
+		    0xa6, 0xf6, 0x37, 0x81, 0x71, 0xea, 0xe4, 0x39,
+		    0x6e, 0xa1, 0x5d, 0xc2, 0x40, 0xd1, 0xab, 0xf4,
+		    0xde, 0x04, 0x9a, 0x00, 0xa8, 0x64, 0x06, 0x4b,
+		    0xbc, 0xd4, 0x6f, 0xe4, 0xe4, 0x5b, 0x42, 0x8f },
+	.ilen	= 80,
+	.result	= { 0x8f, 0x4c, 0x51, 0xbb, 0x42, 0x23, 0x3a, 0x72,
+		    0x76, 0xa2, 0xc0, 0x91, 0x2a, 0x88, 0xf3, 0xcb,
+		    0xc5, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x66, 0xd6, 0xf5, 0x69, 0x05, 0xd4, 0x58, 0x06,
+		    0xf3, 0x08, 0x28, 0xa9, 0x93, 0x86, 0x9a, 0x03,
+		    0xc5, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x66, 0xd6, 0xf5, 0x69, 0x05, 0xd4, 0x58, 0x06,
+		    0xf3, 0x08, 0x28, 0xa9, 0x93, 0x86, 0x9a, 0x03,
+		    0x8b, 0xfb, 0xab, 0x17, 0xa9, 0xe0, 0xb8, 0x74,
+		    0x8b, 0x51, 0x0a, 0xe7, 0xd9, 0xfd, 0x23, 0x05 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xd5, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x9a, 0x22, 0xd7, 0x0a, 0x48, 0xe2, 0x4f, 0xdd,
+		    0xcd, 0xd4, 0x41, 0x9d, 0xe6, 0x4c, 0x8f, 0x44,
+		    0xfc, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0x77, 0xb5, 0xc9, 0x07, 0xd9, 0xc9, 0xe1, 0xea,
+		    0x51, 0x85, 0x1a, 0x20, 0x4a, 0xad, 0x9f, 0x0a,
+		    0x24, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x3f, 0x91, 0xf8, 0xe7, 0xc7, 0xb1, 0x96, 0x25,
+		    0x64, 0x61, 0x9c, 0x5e, 0x7e, 0x9b, 0xf6, 0x13 },
+	.ilen	= 96,
+	.result	= { 0xd5, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x1d, 0xe0, 0x1d, 0x03, 0xa4, 0xfb, 0x69, 0x2b,
+		    0x0f, 0x13, 0x57, 0x17, 0xda, 0x3c, 0x93, 0x03,
+		    0x9c, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x14, 0xbc, 0x01, 0x79, 0x57, 0xdc, 0xfa, 0x2c,
+		    0xc0, 0xdb, 0xb8, 0x1d, 0xf5, 0x83, 0xcb, 0x01,
+		    0x9c, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x14, 0xbc, 0x01, 0x79, 0x57, 0xdc, 0xfa, 0x2c,
+		    0xc0, 0xdb, 0xb8, 0x1d, 0xf5, 0x83, 0xcb, 0x01,
+		    0x49, 0xbc, 0x6e, 0x9f, 0xc5, 0x1c, 0x4d, 0x50,
+		    0x30, 0x36, 0x64, 0x4d, 0x84, 0x27, 0x73, 0xd2 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0xdb, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x75, 0xd5, 0x64, 0x3a, 0xa5, 0xaf, 0x93, 0x4d,
+		    0x8c, 0xce, 0x39, 0x2c, 0xc3, 0xee, 0xdb, 0x47,
+		    0xc0, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0x60, 0x1b, 0x5a, 0xd2, 0x06, 0x7f, 0x28, 0x06,
+		    0x6a, 0x8f, 0x32, 0x81, 0x71, 0x5b, 0xa8, 0x08,
+		    0x18, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x28, 0x3f, 0x6b, 0x32, 0x18, 0x07, 0x5f, 0xc9,
+		    0x5f, 0x6b, 0xb4, 0xff, 0x45, 0x6d, 0xc1, 0x11 },
+	.ilen	= 96,
+	.result	= { 0xdb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xf2, 0x17, 0xae, 0x33, 0x49, 0xb6, 0xb5, 0xbb,
+		    0x4e, 0x09, 0x2f, 0xa6, 0xff, 0x9e, 0xc7, 0x00,
+		    0xa0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x03, 0x12, 0x92, 0xac, 0x88, 0x6a, 0x33, 0xc0,
+		    0xfb, 0xd1, 0x90, 0xbc, 0xce, 0x75, 0xfc, 0x03,
+		    0xa0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0x03, 0x12, 0x92, 0xac, 0x88, 0x6a, 0x33, 0xc0,
+		    0xfb, 0xd1, 0x90, 0xbc, 0xce, 0x75, 0xfc, 0x03,
+		    0x63, 0xda, 0x6e, 0xa2, 0x51, 0xf0, 0x39, 0x53,
+		    0x2c, 0x36, 0x64, 0x5d, 0x38, 0xb7, 0x6f, 0xd7 }
+}, { /* wycheproof - edge case intermediate sums in poly1305 */
+	.key	= { 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+		    0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+		    0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
+		    0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f },
+	.nonce	= { 0x00, 0x00, 0x00, 0x00, 0x06, 0x4c, 0x2d, 0x52 },
+	.nlen	= 8,
+	.assoc	= { 0xff, 0xff, 0xff, 0xff },
+	.alen	= 4,
+	.input	= { 0x93, 0x94, 0x28, 0xd0, 0x79, 0x35, 0x1f, 0x66,
+		    0x5c, 0xd0, 0x01, 0x35, 0x43, 0x19, 0x87, 0x5c,
+		    0x62, 0x48, 0x39, 0x60, 0x42, 0x16, 0xe4, 0x03,
+		    0xeb, 0xcc, 0x6a, 0xf5, 0x59, 0xec, 0x8b, 0x43,
+		    0x97, 0x7a, 0xed, 0x35, 0xcb, 0x5a, 0x2f, 0xca,
+		    0xa0, 0x34, 0x6e, 0xfb, 0x93, 0x65, 0x54, 0x64,
+		    0xd8, 0xc8, 0xc3, 0xfa, 0x1a, 0x9e, 0x47, 0x4a,
+		    0xbe, 0x52, 0xd0, 0x2c, 0x81, 0x87, 0xe9, 0x0f,
+		    0x4f, 0x2d, 0x90, 0x96, 0x52, 0x4f, 0xa1, 0xb2,
+		    0xb0, 0x23, 0xb8, 0xb2, 0x88, 0x22, 0x27, 0x73,
+		    0x90, 0xec, 0xf2, 0x1a, 0x04, 0xe6, 0x30, 0x85,
+		    0x8b, 0xb6, 0x56, 0x52, 0xb5, 0xb1, 0x80, 0x16 },
+	.ilen	= 96,
+	.result	= { 0x93, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xe5, 0x8a, 0xf3, 0x69, 0xae, 0x0f, 0xc2, 0xf5,
+		    0x29, 0x0b, 0x7c, 0x7f, 0x65, 0x9c, 0x97, 0x04,
+		    0xf7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xbb, 0xc1, 0x0b, 0x84, 0x94, 0x8b, 0x5c, 0x8c,
+		    0x2f, 0x0c, 0x72, 0x11, 0x3e, 0xa9, 0xbd, 0x04,
+		    0xf7, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+		    0xbb, 0xc1, 0x0b, 0x84, 0x94, 0x8b, 0x5c, 0x8c,
+		    0x2f, 0x0c, 0x72, 0x11, 0x3e, 0xa9, 0xbd, 0x04,
+		    0x73, 0xeb, 0x27, 0x24, 0xb5, 0xc4, 0x05, 0xf0,
+		    0x4d, 0x00, 0xd0, 0xf1, 0x58, 0x40, 0xa1, 0xc1 }
+} };
+static const struct chacha20poly1305_testvec
+chacha20poly1305_dec_vectors[] __initconst = { {
+	.key	= { 0x1c, 0x92, 0x40, 0xa5, 0xeb, 0x55, 0xd3, 0x8a,
+		    0xf3, 0x33, 0x88, 0x86, 0x04, 0xf6, 0xb5, 0xf0,
+		    0x47, 0x39, 0x17, 0xc1, 0x40, 0x2b, 0x80, 0x09,
+		    0x9d, 0xca, 0x5c, 0xbc, 0x20, 0x70, 0x75, 0xc0 },
+	.nonce	= { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 },
+	.nlen	= 8,
+	.assoc	= { 0xf3, 0x33, 0x88, 0x86, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x4e, 0x91 },
+	.alen	= 12,
+	.input	= { 0x64, 0xa0, 0x86, 0x15, 0x75, 0x86, 0x1a, 0xf4,
+		    0x60, 0xf0, 0x62, 0xc7, 0x9b, 0xe6, 0x43, 0xbd,
+		    0x5e, 0x80, 0x5c, 0xfd, 0x34, 0x5c, 0xf3, 0x89,
+		    0xf1, 0x08, 0x67, 0x0a, 0xc7, 0x6c, 0x8c, 0xb2,
+		    0x4c, 0x6c, 0xfc, 0x18, 0x75, 0x5d, 0x43, 0xee,
+		    0xa0, 0x9e, 0xe9, 0x4e, 0x38, 0x2d, 0x26, 0xb0,
+		    0xbd, 0xb7, 0xb7, 0x3c, 0x32, 0x1b, 0x01, 0x00,
+		    0xd4, 0xf0, 0x3b, 0x7f, 0x35, 0x58, 0x94, 0xcf,
+		    0x33, 0x2f, 0x83, 0x0e, 0x71, 0x0b, 0x97, 0xce,
+		    0x98, 0xc8, 0xa8, 0x4a, 0xbd, 0x0b, 0x94, 0x81,
+		    0x14, 0xad, 0x17, 0x6e, 0x00, 0x8d, 0x33, 0xbd,
+		    0x60, 0xf9, 0x82, 0xb1, 0xff, 0x37, 0xc8, 0x55,
+		    0x97, 0x97, 0xa0, 0x6e, 0xf4, 0xf0, 0xef, 0x61,
+		    0xc1, 0x86, 0x32, 0x4e, 0x2b, 0x35, 0x06, 0x38,
+		    0x36, 0x06, 0x90, 0x7b, 0x6a, 0x7c, 0x02, 0xb0,
+		    0xf9, 0xf6, 0x15, 0x7b, 0x53, 0xc8, 0x67, 0xe4,
+		    0xb9, 0x16, 0x6c, 0x76, 0x7b, 0x80, 0x4d, 0x46,
+		    0xa5, 0x9b, 0x52, 0x16, 0xcd, 0xe7, 0xa4, 0xe9,
+		    0x90, 0x40, 0xc5, 0xa4, 0x04, 0x33, 0x22, 0x5e,
+		    0xe2, 0x82, 0xa1, 0xb0, 0xa0, 0x6c, 0x52, 0x3e,
+		    0xaf, 0x45, 0x34, 0xd7, 0xf8, 0x3f, 0xa1, 0x15,
+		    0x5b, 0x00, 0x47, 0x71, 0x8c, 0xbc, 0x54, 0x6a,
+		    0x0d, 0x07, 0x2b, 0x04, 0xb3, 0x56, 0x4e, 0xea,
+		    0x1b, 0x42, 0x22, 0x73, 0xf5, 0x48, 0x27, 0x1a,
+		    0x0b, 0xb2, 0x31, 0x60, 0x53, 0xfa, 0x76, 0x99,
+		    0x19, 0x55, 0xeb, 0xd6, 0x31, 0x59, 0x43, 0x4e,
+		    0xce, 0xbb, 0x4e, 0x46, 0x6d, 0xae, 0x5a, 0x10,
+		    0x73, 0xa6, 0x72, 0x76, 0x27, 0x09, 0x7a, 0x10,
+		    0x49, 0xe6, 0x17, 0xd9, 0x1d, 0x36, 0x10, 0x94,
+		    0xfa, 0x68, 0xf0, 0xff, 0x77, 0x98, 0x71, 0x30,
+		    0x30, 0x5b, 0xea, 0xba, 0x2e, 0xda, 0x04, 0xdf,
+		    0x99, 0x7b, 0x71, 0x4d, 0x6c, 0x6f, 0x2c, 0x29,
+		    0xa6, 0xad, 0x5c, 0xb4, 0x02, 0x2b, 0x02, 0x70,
+		    0x9b, 0xee, 0xad, 0x9d, 0x67, 0x89, 0x0c, 0xbb,
+		    0x22, 0x39, 0x23, 0x36, 0xfe, 0xa1, 0x85, 0x1f,
+		    0x38 },
+	.ilen	= 281,
+	.result	= { 0x49, 0x6e, 0x74, 0x65, 0x72, 0x6e, 0x65, 0x74,
+		    0x2d, 0x44, 0x72, 0x61, 0x66, 0x74, 0x73, 0x20,
+		    0x61, 0x72, 0x65, 0x20, 0x64, 0x72, 0x61, 0x66,
+		    0x74, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+		    0x6e, 0x74, 0x73, 0x20, 0x76, 0x61, 0x6c, 0x69,
+		    0x64, 0x20, 0x66, 0x6f, 0x72, 0x20, 0x61, 0x20,
+		    0x6d, 0x61, 0x78, 0x69, 0x6d, 0x75, 0x6d, 0x20,
+		    0x6f, 0x66, 0x20, 0x73, 0x69, 0x78, 0x20, 0x6d,
+		    0x6f, 0x6e, 0x74, 0x68, 0x73, 0x20, 0x61, 0x6e,
+		    0x64, 0x20, 0x6d, 0x61, 0x79, 0x20, 0x62, 0x65,
+		    0x20, 0x75, 0x70, 0x64, 0x61, 0x74, 0x65, 0x64,
+		    0x2c, 0x20, 0x72, 0x65, 0x70, 0x6c, 0x61, 0x63,
+		    0x65, 0x64, 0x2c, 0x20, 0x6f, 0x72, 0x20, 0x6f,
+		    0x62, 0x73, 0x6f, 0x6c, 0x65, 0x74, 0x65, 0x64,
+		    0x20, 0x62, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65,
+		    0x72, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+		    0x6e, 0x74, 0x73, 0x20, 0x61, 0x74, 0x20, 0x61,
+		    0x6e, 0x79, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x2e,
+		    0x20, 0x49, 0x74, 0x20, 0x69, 0x73, 0x20, 0x69,
+		    0x6e, 0x61, 0x70, 0x70, 0x72, 0x6f, 0x70, 0x72,
+		    0x69, 0x61, 0x74, 0x65, 0x20, 0x74, 0x6f, 0x20,
+		    0x75, 0x73, 0x65, 0x20, 0x49, 0x6e, 0x74, 0x65,
+		    0x72, 0x6e, 0x65, 0x74, 0x2d, 0x44, 0x72, 0x61,
+		    0x66, 0x74, 0x73, 0x20, 0x61, 0x73, 0x20, 0x72,
+		    0x65, 0x66, 0x65, 0x72, 0x65, 0x6e, 0x63, 0x65,
+		    0x20, 0x6d, 0x61, 0x74, 0x65, 0x72, 0x69, 0x61,
+		    0x6c, 0x20, 0x6f, 0x72, 0x20, 0x74, 0x6f, 0x20,
+		    0x63, 0x69, 0x74, 0x65, 0x20, 0x74, 0x68, 0x65,
+		    0x6d, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 0x20,
+		    0x74, 0x68, 0x61, 0x6e, 0x20, 0x61, 0x73, 0x20,
+		    0x2f, 0xe2, 0x80, 0x9c, 0x77, 0x6f, 0x72, 0x6b,
+		    0x20, 0x69, 0x6e, 0x20, 0x70, 0x72, 0x6f, 0x67,
+		    0x72, 0x65, 0x73, 0x73, 0x2e, 0x2f, 0xe2, 0x80,
+		    0x9d }
+}, {
+	.key	= { 0x4c, 0xf5, 0x96, 0x83, 0x38, 0xe6, 0xae, 0x7f,
+		    0x2d, 0x29, 0x25, 0x76, 0xd5, 0x75, 0x27, 0x86,
+		    0x91, 0x9a, 0x27, 0x7a, 0xfb, 0x46, 0xc5, 0xef,
+		    0x94, 0x81, 0x79, 0x57, 0x14, 0x59, 0x40, 0x68 },
+	.nonce	= { 0xca, 0xbf, 0x33, 0x71, 0x32, 0x45, 0x77, 0x8e },
+	.nlen	= 8,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xea, 0xe0, 0x1e, 0x9e, 0x2c, 0x91, 0xaa, 0xe1,
+		    0xdb, 0x5d, 0x99, 0x3f, 0x8a, 0xf7, 0x69, 0x92 },
+	.ilen	= 16,
+	.result	= ""
+}, {
+	.key	= { 0x2d, 0xb0, 0x5d, 0x40, 0xc8, 0xed, 0x44, 0x88,
+		    0x34, 0xd1, 0x13, 0xaf, 0x57, 0xa1, 0xeb, 0x3a,
+		    0x2a, 0x80, 0x51, 0x36, 0xec, 0x5b, 0xbc, 0x08,
+		    0x93, 0x84, 0x21, 0xb5, 0x13, 0x88, 0x3c, 0x0d },
+	.nonce	= { 0x3d, 0x86, 0xb5, 0x6b, 0xc8, 0xa3, 0x1f, 0x1d },
+	.nlen	= 8,
+	.assoc	= { 0x33, 0x10, 0x41, 0x12, 0x1f, 0xf3, 0xd2, 0x6b },
+	.alen	= 8,
+	.input	= { 0xdd, 0x6b, 0x3b, 0x82, 0xce, 0x5a, 0xbd, 0xd6,
+		    0xa9, 0x35, 0x83, 0xd8, 0x8c, 0x3d, 0x85, 0x77 },
+	.ilen	= 16,
+	.result	= ""
+}, {
+	.key	= { 0x4b, 0x28, 0x4b, 0xa3, 0x7b, 0xbe, 0xe9, 0xf8,
+		    0x31, 0x80, 0x82, 0xd7, 0xd8, 0xe8, 0xb5, 0xa1,
+		    0xe2, 0x18, 0x18, 0x8a, 0x9c, 0xfa, 0xa3, 0x3d,
+		    0x25, 0x71, 0x3e, 0x40, 0xbc, 0x54, 0x7a, 0x3e },
+	.nonce	= { 0xd2, 0x32, 0x1f, 0x29, 0x28, 0xc6, 0xc4, 0xc4 },
+	.nlen	= 8,
+	.assoc	= { 0x6a, 0xe2, 0xad, 0x3f, 0x88, 0x39, 0x5a, 0x40 },
+	.alen	= 8,
+	.input	= { 0xb7, 0x1b, 0xb0, 0x73, 0x59, 0xb0, 0x84, 0xb2,
+		    0x6d, 0x8e, 0xab, 0x94, 0x31, 0xa1, 0xae, 0xac,
+		    0x89 },
+	.ilen	= 17,
+	.result	= { 0xa4 }
+}, {
+	.key	= { 0x66, 0xca, 0x9c, 0x23, 0x2a, 0x4b, 0x4b, 0x31,
+		    0x0e, 0x92, 0x89, 0x8b, 0xf4, 0x93, 0xc7, 0x87,
+		    0x98, 0xa3, 0xd8, 0x39, 0xf8, 0xf4, 0xa7, 0x01,
+		    0xc0, 0x2e, 0x0a, 0xa6, 0x7e, 0x5a, 0x78, 0x87 },
+	.nonce	= { 0x20, 0x1c, 0xaa, 0x5f, 0x9c, 0xbf, 0x92, 0x30 },
+	.nlen	= 8,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0xbf, 0xe1, 0x5b, 0x0b, 0xdb, 0x6b, 0xf5, 0x5e,
+		    0x6c, 0x5d, 0x84, 0x44, 0x39, 0x81, 0xc1, 0x9c,
+		    0xac },
+	.ilen	= 17,
+	.result	= { 0x2d }
+}, {
+	.key	= { 0x68, 0x7b, 0x8d, 0x8e, 0xe3, 0xc4, 0xdd, 0xae,
+		    0xdf, 0x72, 0x7f, 0x53, 0x72, 0x25, 0x1e, 0x78,
+		    0x91, 0xcb, 0x69, 0x76, 0x1f, 0x49, 0x93, 0xf9,
+		    0x6f, 0x21, 0xcc, 0x39, 0x9c, 0xad, 0xb1, 0x01 },
+	.nonce	= { 0xdf, 0x51, 0x84, 0x82, 0x42, 0x0c, 0x75, 0x9c },
+	.nlen	= 8,
+	.assoc	= { 0x70, 0xd3, 0x33, 0xf3, 0x8b, 0x18, 0x0b },
+	.alen	= 7,
+	.input	= { 0x8b, 0x06, 0xd3, 0x31, 0xb0, 0x93, 0x45, 0xb1,
+		    0x75, 0x6e, 0x26, 0xf9, 0x67, 0xbc, 0x90, 0x15,
+		    0x81, 0x2c, 0xb5, 0xf0, 0xc6, 0x2b, 0xc7, 0x8c,
+		    0x56, 0xd1, 0xbf, 0x69, 0x6c, 0x07, 0xa0, 0xda,
+		    0x65, 0x27, 0xc9, 0x90, 0x3d, 0xef, 0x4b, 0x11,
+		    0x0f, 0x19, 0x07, 0xfd, 0x29, 0x92, 0xd9, 0xc8,
+		    0xf7, 0x99, 0x2e, 0x4a, 0xd0, 0xb8, 0x2c, 0xdc,
+		    0x93, 0xf5, 0x9e, 0x33, 0x78, 0xd1, 0x37, 0xc3,
+		    0x66, 0xd7, 0x5e, 0xbc, 0x44, 0xbf, 0x53, 0xa5,
+		    0xbc, 0xc4, 0xcb, 0x7b, 0x3a, 0x8e, 0x7f, 0x02,
+		    0xbd, 0xbb, 0xe7, 0xca, 0xa6, 0x6c, 0x6b, 0x93,
+		    0x21, 0x93, 0x10, 0x61, 0xe7, 0x69, 0xd0, 0x78,
+		    0xf3, 0x07, 0x5a, 0x1a, 0x8f, 0x73, 0xaa, 0xb1,
+		    0x4e, 0xd3, 0xda, 0x4f, 0xf3, 0x32, 0xe1, 0x66,
+		    0x3e, 0x6c, 0xc6, 0x13, 0xba, 0x06, 0x5b, 0xfc,
+		    0x6a, 0xe5, 0x6f, 0x60, 0xfb, 0x07, 0x40, 0xb0,
+		    0x8c, 0x9d, 0x84, 0x43, 0x6b, 0xc1, 0xf7, 0x8d,
+		    0x8d, 0x31, 0xf7, 0x7a, 0x39, 0x4d, 0x8f, 0x9a,
+		    0xeb },
+	.ilen	= 145,
+	.result	= { 0x33, 0x2f, 0x94, 0xc1, 0xa4, 0xef, 0xcc, 0x2a,
+		    0x5b, 0xa6, 0xe5, 0x8f, 0x1d, 0x40, 0xf0, 0x92,
+		    0x3c, 0xd9, 0x24, 0x11, 0xa9, 0x71, 0xf9, 0x37,
+		    0x14, 0x99, 0xfa, 0xbe, 0xe6, 0x80, 0xde, 0x50,
+		    0xc9, 0x96, 0xd4, 0xb0, 0xec, 0x9e, 0x17, 0xec,
+		    0xd2, 0x5e, 0x72, 0x99, 0xfc, 0x0a, 0xe1, 0xcb,
+		    0x48, 0xd2, 0x85, 0xdd, 0x2f, 0x90, 0xe0, 0x66,
+		    0x3b, 0xe6, 0x20, 0x74, 0xbe, 0x23, 0x8f, 0xcb,
+		    0xb4, 0xe4, 0xda, 0x48, 0x40, 0xa6, 0xd1, 0x1b,
+		    0xc7, 0x42, 0xce, 0x2f, 0x0c, 0xa6, 0x85, 0x6e,
+		    0x87, 0x37, 0x03, 0xb1, 0x7c, 0x25, 0x96, 0xa3,
+		    0x05, 0xd8, 0xb0, 0xf4, 0xed, 0xea, 0xc2, 0xf0,
+		    0x31, 0x98, 0x6c, 0xd1, 0x14, 0x25, 0xc0, 0xcb,
+		    0x01, 0x74, 0xd0, 0x82, 0xf4, 0x36, 0xf5, 0x41,
+		    0xd5, 0xdc, 0xca, 0xc5, 0xbb, 0x98, 0xfe, 0xfc,
+		    0x69, 0x21, 0x70, 0xd8, 0xa4, 0x4b, 0xc8, 0xde,
+		    0x8f }
+}, {
+	.key	= { 0x8d, 0xb8, 0x91, 0x48, 0xf0, 0xe7, 0x0a, 0xbd,
+		    0xf9, 0x3f, 0xcd, 0xd9, 0xa0, 0x1e, 0x42, 0x4c,
+		    0xe7, 0xde, 0x25, 0x3d, 0xa3, 0xd7, 0x05, 0x80,
+		    0x8d, 0xf2, 0x82, 0xac, 0x44, 0x16, 0x51, 0x01 },
+	.nonce	= { 0xde, 0x7b, 0xef, 0xc3, 0x65, 0x1b, 0x68, 0xb0 },
+	.nlen	= 8,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x85, 0x04, 0xc2, 0xed, 0x8d, 0xfd, 0x97, 0x5c,
+		    0xd2, 0xb7, 0xe2, 0xc1, 0x6b, 0xa3, 0xba, 0xf8,
+		    0xc9, 0x50, 0xc3, 0xc6, 0xa5, 0xe3, 0xa4, 0x7c,
+		    0xc3, 0x23, 0x49, 0x5e, 0xa9, 0xb9, 0x32, 0xeb,
+		    0x8a, 0x7c, 0xca, 0xe5, 0xec, 0xfb, 0x7c, 0xc0,
+		    0xcb, 0x7d, 0xdc, 0x2c, 0x9d, 0x92, 0x55, 0x21,
+		    0x0a, 0xc8, 0x43, 0x63, 0x59, 0x0a, 0x31, 0x70,
+		    0x82, 0x67, 0x41, 0x03, 0xf8, 0xdf, 0xf2, 0xac,
+		    0xa7, 0x02, 0xd4, 0xd5, 0x8a, 0x2d, 0xc8, 0x99,
+		    0x19, 0x66, 0xd0, 0xf6, 0x88, 0x2c, 0x77, 0xd9,
+		    0xd4, 0x0d, 0x6c, 0xbd, 0x98, 0xde, 0xe7, 0x7f,
+		    0xad, 0x7e, 0x8a, 0xfb, 0xe9, 0x4b, 0xe5, 0xf7,
+		    0xe5, 0x50, 0xa0, 0x90, 0x3f, 0xd6, 0x22, 0x53,
+		    0xe3, 0xfe, 0x1b, 0xcc, 0x79, 0x3b, 0xec, 0x12,
+		    0x47, 0x52, 0xa7, 0xd6, 0x04, 0xe3, 0x52, 0xe6,
+		    0x93, 0x90, 0x91, 0x32, 0x73, 0x79, 0xb8, 0xd0,
+		    0x31, 0xde, 0x1f, 0x9f, 0x2f, 0x05, 0x38, 0x54,
+		    0x2f, 0x35, 0x04, 0x39, 0xe0, 0xa7, 0xba, 0xc6,
+		    0x52, 0xf6, 0x37, 0x65, 0x4c, 0x07, 0xa9, 0x7e,
+		    0xb3, 0x21, 0x6f, 0x74, 0x8c, 0xc9, 0xde, 0xdb,
+		    0x65, 0x1b, 0x9b, 0xaa, 0x60, 0xb1, 0x03, 0x30,
+		    0x6b, 0xb2, 0x03, 0xc4, 0x1c, 0x04, 0xf8, 0x0f,
+		    0x64, 0xaf, 0x46, 0xe4, 0x65, 0x99, 0x49, 0xe2,
+		    0xea, 0xce, 0x78, 0x00, 0xd8, 0x8b, 0xd5, 0x2e,
+		    0xcf, 0xfc, 0x40, 0x49, 0xe8, 0x58, 0xdc, 0x34,
+		    0x9c, 0x8c, 0x61, 0xbf, 0x0a, 0x8e, 0xec, 0x39,
+		    0xa9, 0x30, 0x05, 0x5a, 0xd2, 0x56, 0x01, 0xc7,
+		    0xda, 0x8f, 0x4e, 0xbb, 0x43, 0xa3, 0x3a, 0xf9,
+		    0x15, 0x2a, 0xd0, 0xa0, 0x7a, 0x87, 0x34, 0x82,
+		    0xfe, 0x8a, 0xd1, 0x2d, 0x5e, 0xc7, 0xbf, 0x04,
+		    0x53, 0x5f, 0x3b, 0x36, 0xd4, 0x25, 0x5c, 0x34,
+		    0x7a, 0x8d, 0xd5, 0x05, 0xce, 0x72, 0xca, 0xef,
+		    0x7a, 0x4b, 0xbc, 0xb0, 0x10, 0x5c, 0x96, 0x42,
+		    0x3a, 0x00, 0x98, 0xcd, 0x15, 0xe8, 0xb7, 0x53 },
+	.ilen	= 272,
+	.result	= { 0x9b, 0x18, 0xdb, 0xdd, 0x9a, 0x0f, 0x3e, 0xa5,
+		    0x15, 0x17, 0xde, 0xdf, 0x08, 0x9d, 0x65, 0x0a,
+		    0x67, 0x30, 0x12, 0xe2, 0x34, 0x77, 0x4b, 0xc1,
+		    0xd9, 0xc6, 0x1f, 0xab, 0xc6, 0x18, 0x50, 0x17,
+		    0xa7, 0x9d, 0x3c, 0xa6, 0xc5, 0x35, 0x8c, 0x1c,
+		    0xc0, 0xa1, 0x7c, 0x9f, 0x03, 0x89, 0xca, 0xe1,
+		    0xe6, 0xe9, 0xd4, 0xd3, 0x88, 0xdb, 0xb4, 0x51,
+		    0x9d, 0xec, 0xb4, 0xfc, 0x52, 0xee, 0x6d, 0xf1,
+		    0x75, 0x42, 0xc6, 0xfd, 0xbd, 0x7a, 0x8e, 0x86,
+		    0xfc, 0x44, 0xb3, 0x4f, 0xf3, 0xea, 0x67, 0x5a,
+		    0x41, 0x13, 0xba, 0xb0, 0xdc, 0xe1, 0xd3, 0x2a,
+		    0x7c, 0x22, 0xb3, 0xca, 0xac, 0x6a, 0x37, 0x98,
+		    0x3e, 0x1d, 0x40, 0x97, 0xf7, 0x9b, 0x1d, 0x36,
+		    0x6b, 0xb3, 0x28, 0xbd, 0x60, 0x82, 0x47, 0x34,
+		    0xaa, 0x2f, 0x7d, 0xe9, 0xa8, 0x70, 0x81, 0x57,
+		    0xd4, 0xb9, 0x77, 0x0a, 0x9d, 0x29, 0xa7, 0x84,
+		    0x52, 0x4f, 0xc2, 0x4a, 0x40, 0x3b, 0x3c, 0xd4,
+		    0xc9, 0x2a, 0xdb, 0x4a, 0x53, 0xc4, 0xbe, 0x80,
+		    0xe9, 0x51, 0x7f, 0x8f, 0xc7, 0xa2, 0xce, 0x82,
+		    0x5c, 0x91, 0x1e, 0x74, 0xd9, 0xd0, 0xbd, 0xd5,
+		    0xf3, 0xfd, 0xda, 0x4d, 0x25, 0xb4, 0xbb, 0x2d,
+		    0xac, 0x2f, 0x3d, 0x71, 0x85, 0x7b, 0xcf, 0x3c,
+		    0x7b, 0x3e, 0x0e, 0x22, 0x78, 0x0c, 0x29, 0xbf,
+		    0xe4, 0xf4, 0x57, 0xb3, 0xcb, 0x49, 0xa0, 0xfc,
+		    0x1e, 0x05, 0x4e, 0x16, 0xbc, 0xd5, 0xa8, 0xa3,
+		    0xee, 0x05, 0x35, 0xc6, 0x7c, 0xab, 0x60, 0x14,
+		    0x55, 0x1a, 0x8e, 0xc5, 0x88, 0x5d, 0xd5, 0x81,
+		    0xc2, 0x81, 0xa5, 0xc4, 0x60, 0xdb, 0xaf, 0x77,
+		    0x91, 0xe1, 0xce, 0xa2, 0x7e, 0x7f, 0x42, 0xe3,
+		    0xb0, 0x13, 0x1c, 0x1f, 0x25, 0x60, 0x21, 0xe2,
+		    0x40, 0x5f, 0x99, 0xb7, 0x73, 0xec, 0x9b, 0x2b,
+		    0xf0, 0x65, 0x11, 0xc8, 0xd0, 0x0a, 0x9f, 0xd3 }
+}, {
+	.key	= { 0xf2, 0xaa, 0x4f, 0x99, 0xfd, 0x3e, 0xa8, 0x53,
+		    0xc1, 0x44, 0xe9, 0x81, 0x18, 0xdc, 0xf5, 0xf0,
+		    0x3e, 0x44, 0x15, 0x59, 0xe0, 0xc5, 0x44, 0x86,
+		    0xc3, 0x91, 0xa8, 0x75, 0xc0, 0x12, 0x46, 0xba },
+	.nonce	= { 0x0e, 0x0d, 0x57, 0xbb, 0x7b, 0x40, 0x54, 0x02 },
+	.nlen	= 8,
+	.assoc	= "",
+	.alen	= 0,
+	.input	= { 0x14, 0xf6, 0x41, 0x37, 0xa6, 0xd4, 0x27, 0xcd,
+		    0xdb, 0x06, 0x3e, 0x9a, 0x4e, 0xab, 0xd5, 0xb1,
+		    0x1e, 0x6b, 0xd2, 0xbc, 0x11, 0xf4, 0x28, 0x93,
+		    0x63, 0x54, 0xef, 0xbb, 0x5e, 0x1d, 0x3a, 0x1d,
+		    0x37, 0x3c, 0x0a, 0x6c, 0x1e, 0xc2, 0xd1, 0x2c,
+		    0xb5, 0xa3, 0xb5, 0x7b, 0xb8, 0x8f, 0x25, 0xa6,
+		    0x1b, 0x61, 0x1c, 0xec, 0x28, 0x58, 0x26, 0xa4,
+		    0xa8, 0x33, 0x28, 0x25, 0x5c, 0x45, 0x05, 0xe5,
+		    0x6c, 0x99, 0xe5, 0x45, 0xc4, 0xa2, 0x03, 0x84,
+		    0x03, 0x73, 0x1e, 0x8c, 0x49, 0xac, 0x20, 0xdd,
+		    0x8d, 0xb3, 0xc4, 0xf5, 0xe7, 0x4f, 0xf1, 0xed,
+		    0xa1, 0x98, 0xde, 0xa4, 0x96, 0xdd, 0x2f, 0xab,
+		    0xab, 0x97, 0xcf, 0x3e, 0xd2, 0x9e, 0xb8, 0x13,
+		    0x07, 0x28, 0x29, 0x19, 0xaf, 0xfd, 0xf2, 0x49,
+		    0x43, 0xea, 0x49, 0x26, 0x91, 0xc1, 0x07, 0xd6,
+		    0xbb, 0x81, 0x75, 0x35, 0x0d, 0x24, 0x7f, 0xc8,
+		    0xda, 0xd4, 0xb7, 0xeb, 0xe8, 0x5c, 0x09, 0xa2,
+		    0x2f, 0xdc, 0x28, 0x7d, 0x3a, 0x03, 0xfa, 0x94,
+		    0xb5, 0x1d, 0x17, 0x99, 0x36, 0xc3, 0x1c, 0x18,
+		    0x34, 0xe3, 0x9f, 0xf5, 0x55, 0x7c, 0xb0, 0x60,
+		    0x9d, 0xff, 0xac, 0xd4, 0x61, 0xf2, 0xad, 0xf8,
+		    0xce, 0xc7, 0xbe, 0x5c, 0xd2, 0x95, 0xa8, 0x4b,
+		    0x77, 0x13, 0x19, 0x59, 0x26, 0xc9, 0xb7, 0x8f,
+		    0x6a, 0xcb, 0x2d, 0x37, 0x91, 0xea, 0x92, 0x9c,
+		    0x94, 0x5b, 0xda, 0x0b, 0xce, 0xfe, 0x30, 0x20,
+		    0xf8, 0x51, 0xad, 0xf2, 0xbe, 0xe7, 0xc7, 0xff,
+		    0xb3, 0x33, 0x91, 0x6a, 0xc9, 0x1a, 0x41, 0xc9,
+		    0x0f, 0xf3, 0x10, 0x0e, 0xfd, 0x53, 0xff, 0x6c,
+		    0x16, 0x52, 0xd9, 0xf3, 0xf7, 0x98, 0x2e, 0xc9,
+		    0x07, 0x31, 0x2c, 0x0c, 0x72, 0xd7, 0xc5, 0xc6,
+		    0x08, 0x2a, 0x7b, 0xda, 0xbd, 0x7e, 0x02, 0xea,
+		    0x1a, 0xbb, 0xf2, 0x04, 0x27, 0x61, 0x28, 0x8e,
+		    0xf5, 0x04, 0x03, 0x1f, 0x4c, 0x07, 0x55, 0x82,
+		    0xec, 0x1e, 0xd7, 0x8b, 0x2f, 0x65, 0x56, 0xd1,
+		    0xd9, 0x1e, 0x3c, 0xe9, 0x1f, 0x5e, 0x98, 0x70,
+		    0x38, 0x4a, 0x8c, 0x49, 0xc5, 0x43, 0xa0, 0xa1,
+		    0x8b, 0x74, 0x9d, 0x4c, 0x62, 0x0d, 0x10, 0x0c,
+		    0xf4, 0x6c, 0x8f, 0xe0, 0xaa, 0x9a, 0x8d, 0xb7,
+		    0xe0, 0xbe, 0x4c, 0x87, 0xf1, 0x98, 0x2f, 0xcc,
+		    0xed, 0xc0, 0x52, 0x29, 0xdc, 0x83, 0xf8, 0xfc,
+		    0x2c, 0x0e, 0xa8, 0x51, 0x4d, 0x80, 0x0d, 0xa3,
+		    0xfe, 0xd8, 0x37, 0xe7, 0x41, 0x24, 0xfc, 0xfb,
+		    0x75, 0xe3, 0x71, 0x7b, 0x57, 0x45, 0xf5, 0x97,
+		    0x73, 0x65, 0x63, 0x14, 0x74, 0xb8, 0x82, 0x9f,
+		    0xf8, 0x60, 0x2f, 0x8a, 0xf2, 0x4e, 0xf1, 0x39,
+		    0xda, 0x33, 0x91, 0xf8, 0x36, 0xe0, 0x8d, 0x3f,
+		    0x1f, 0x3b, 0x56, 0xdc, 0xa0, 0x8f, 0x3c, 0x9d,
+		    0x71, 0x52, 0xa7, 0xb8, 0xc0, 0xa5, 0xc6, 0xa2,
+		    0x73, 0xda, 0xf4, 0x4b, 0x74, 0x5b, 0x00, 0x3d,
+		    0x99, 0xd7, 0x96, 0xba, 0xe6, 0xe1, 0xa6, 0x96,
+		    0x38, 0xad, 0xb3, 0xc0, 0xd2, 0xba, 0x91, 0x6b,
+		    0xf9, 0x19, 0xdd, 0x3b, 0xbe, 0xbe, 0x9c, 0x20,
+		    0x50, 0xba, 0xa1, 0xd0, 0xce, 0x11, 0xbd, 0x95,
+		    0xd8, 0xd1, 0xdd, 0x33, 0x85, 0x74, 0xdc, 0xdb,
+		    0x66, 0x76, 0x44, 0xdc, 0x03, 0x74, 0x48, 0x35,
+		    0x98, 0xb1, 0x18, 0x47, 0x94, 0x7d, 0xff, 0x62,
+		    0xe4, 0x58, 0x78, 0xab, 0xed, 0x95, 0x36, 0xd9,
+		    0x84, 0x91, 0x82, 0x64, 0x41, 0xbb, 0x58, 0xe6,
+		    0x1c, 0x20, 0x6d, 0x15, 0x6b, 0x13, 0x96, 0xe8,
+		    0x35, 0x7f, 0xdc, 0x40, 0x2c, 0xe9, 0xbc, 0x8a,
+		    0x4f, 0x92, 0xec, 0x06, 0x2d, 0x50, 0xdf, 0x93,
+		    0x5d, 0x65, 0x5a, 0xa8, 0xfc, 0x20, 0x50, 0x14,
+		    0xa9, 0x8a, 0x7e, 0x1d, 0x08, 0x1f, 0xe2, 0x99,
+		    0xd0, 0xbe, 0xfb, 0x3a, 0x21, 0x9d, 0xad, 0x86,
+		    0x54, 0xfd, 0x0d, 0x98, 0x1c, 0x5a, 0x6f, 0x1f,
+		    0x9a, 0x40, 0xcd, 0xa2, 0xff, 0x6a, 0xf1, 0x54 },
+	.ilen	= 528,
+	.result	= { 0xc3, 0x09, 0x94, 0x62, 0xe6, 0x46, 0x2e, 0x10,
+		    0xbe, 0x00, 0xe4, 0xfc, 0xf3, 0x40, 0xa3, 0xe2,
+		    0x0f, 0xc2, 0x8b, 0x28, 0xdc, 0xba, 0xb4, 0x3c,
+		    0xe4, 0x21, 0x58, 0x61, 0xcd, 0x8b, 0xcd, 0xfb,
+		    0xac, 0x94, 0xa1, 0x45, 0xf5, 0x1c, 0xe1, 0x12,
+		    0xe0, 0x3b, 0x67, 0x21, 0x54, 0x5e, 0x8c, 0xaa,
+		    0xcf, 0xdb, 0xb4, 0x51, 0xd4, 0x13, 0xda, 0xe6,
+		    0x83, 0x89, 0xb6, 0x92, 0xe9, 0x21, 0x76, 0xa4,
+		    0x93, 0x7d, 0x0e, 0xfd, 0x96, 0x36, 0x03, 0x91,
+		    0x43, 0x5c, 0x92, 0x49, 0x62, 0x61, 0x7b, 0xeb,
+		    0x43, 0x89, 0xb8, 0x12, 0x20, 0x43, 0xd4, 0x47,
+		    0x06, 0x84, 0xee, 0x47, 0xe9, 0x8a, 0x73, 0x15,
+		    0x0f, 0x72, 0xcf, 0xed, 0xce, 0x96, 0xb2, 0x7f,
+		    0x21, 0x45, 0x76, 0xeb, 0x26, 0x28, 0x83, 0x6a,
+		    0xad, 0xaa, 0xa6, 0x81, 0xd8, 0x55, 0xb1, 0xa3,
+		    0x85, 0xb3, 0x0c, 0xdf, 0xf1, 0x69, 0x2d, 0x97,
+		    0x05, 0x2a, 0xbc, 0x7c, 0x7b, 0x25, 0xf8, 0x80,
+		    0x9d, 0x39, 0x25, 0xf3, 0x62, 0xf0, 0x66, 0x5e,
+		    0xf4, 0xa0, 0xcf, 0xd8, 0xfd, 0x4f, 0xb1, 0x1f,
+		    0x60, 0x3a, 0x08, 0x47, 0xaf, 0xe1, 0xf6, 0x10,
+		    0x77, 0x09, 0xa7, 0x27, 0x8f, 0x9a, 0x97, 0x5a,
+		    0x26, 0xfa, 0xfe, 0x41, 0x32, 0x83, 0x10, 0xe0,
+		    0x1d, 0xbf, 0x64, 0x0d, 0xf4, 0x1c, 0x32, 0x35,
+		    0xe5, 0x1b, 0x36, 0xef, 0xd4, 0x4a, 0x93, 0x4d,
+		    0x00, 0x7c, 0xec, 0x02, 0x07, 0x8b, 0x5d, 0x7d,
+		    0x1b, 0x0e, 0xd1, 0xa6, 0xa5, 0x5d, 0x7d, 0x57,
+		    0x88, 0xa8, 0xcc, 0x81, 0xb4, 0x86, 0x4e, 0xb4,
+		    0x40, 0xe9, 0x1d, 0xc3, 0xb1, 0x24, 0x3e, 0x7f,
+		    0xcc, 0x8a, 0x24, 0x9b, 0xdf, 0x6d, 0xf0, 0x39,
+		    0x69, 0x3e, 0x4c, 0xc0, 0x96, 0xe4, 0x13, 0xda,
+		    0x90, 0xda, 0xf4, 0x95, 0x66, 0x8b, 0x17, 0x17,
+		    0xfe, 0x39, 0x43, 0x25, 0xaa, 0xda, 0xa0, 0x43,
+		    0x3c, 0xb1, 0x41, 0x02, 0xa3, 0xf0, 0xa7, 0x19,
+		    0x59, 0xbc, 0x1d, 0x7d, 0x6c, 0x6d, 0x91, 0x09,
+		    0x5c, 0xb7, 0x5b, 0x01, 0xd1, 0x6f, 0x17, 0x21,
+		    0x97, 0xbf, 0x89, 0x71, 0xa5, 0xb0, 0x6e, 0x07,
+		    0x45, 0xfd, 0x9d, 0xea, 0x07, 0xf6, 0x7a, 0x9f,
+		    0x10, 0x18, 0x22, 0x30, 0x73, 0xac, 0xd4, 0x6b,
+		    0x72, 0x44, 0xed, 0xd9, 0x19, 0x9b, 0x2d, 0x4a,
+		    0x41, 0xdd, 0xd1, 0x85, 0x5e, 0x37, 0x19, 0xed,
+		    0xd2, 0x15, 0x8f, 0x5e, 0x91, 0xdb, 0x33, 0xf2,
+		    0xe4, 0xdb, 0xff, 0x98, 0xfb, 0xa3, 0xb5, 0xca,
+		    0x21, 0x69, 0x08, 0xe7, 0x8a, 0xdf, 0x90, 0xff,
+		    0x3e, 0xe9, 0x20, 0x86, 0x3c, 0xe9, 0xfc, 0x0b,
+		    0xfe, 0x5c, 0x61, 0xaa, 0x13, 0x92, 0x7f, 0x7b,
+		    0xec, 0xe0, 0x6d, 0xa8, 0x23, 0x22, 0xf6, 0x6b,
+		    0x77, 0xc4, 0xfe, 0x40, 0x07, 0x3b, 0xb6, 0xf6,
+		    0x8e, 0x5f, 0xd4, 0xb9, 0xb7, 0x0f, 0x21, 0x04,
+		    0xef, 0x83, 0x63, 0x91, 0x69, 0x40, 0xa3, 0x48,
+		    0x5c, 0xd2, 0x60, 0xf9, 0x4f, 0x6c, 0x47, 0x8b,
+		    0x3b, 0xb1, 0x9f, 0x8e, 0xee, 0x16, 0x8a, 0x13,
+		    0xfc, 0x46, 0x17, 0xc3, 0xc3, 0x32, 0x56, 0xf8,
+		    0x3c, 0x85, 0x3a, 0xb6, 0x3e, 0xaa, 0x89, 0x4f,
+		    0xb3, 0xdf, 0x38, 0xfd, 0xf1, 0xe4, 0x3a, 0xc0,
+		    0xe6, 0x58, 0xb5, 0x8f, 0xc5, 0x29, 0xa2, 0x92,
+		    0x4a, 0xb6, 0xa0, 0x34, 0x7f, 0xab, 0xb5, 0x8a,
+		    0x90, 0xa1, 0xdb, 0x4d, 0xca, 0xb6, 0x2c, 0x41,
+		    0x3c, 0xf7, 0x2b, 0x21, 0xc3, 0xfd, 0xf4, 0x17,
+		    0x5c, 0xb5, 0x33, 0x17, 0x68, 0x2b, 0x08, 0x30,
+		    0xf3, 0xf7, 0x30, 0x3c, 0x96, 0xe6, 0x6a, 0x20,
+		    0x97, 0xe7, 0x4d, 0x10, 0x5f, 0x47, 0x5f, 0x49,
+		    0x96, 0x09, 0xf0, 0x27, 0x91, 0xc8, 0xf8, 0x5a,
+		    0x2e, 0x79, 0xb5, 0xe2, 0xb8, 0xe8, 0xb9, 0x7b,
+		    0xd5, 0x10, 0xcb, 0xff, 0x5d, 0x14, 0x73, 0xf3 }
+}, {
+	.key	= { 0xea, 0xbc, 0x56, 0x99, 0xe3, 0x50, 0xff, 0xc5,
+		    0xcc, 0x1a, 0xd7, 0xc1, 0x57, 0x72, 0xea, 0x86,
+		    0x5b, 0x89, 0x88, 0x61, 0x3d, 0x2f, 0x9b, 0xb2,
+		    0xe7, 0x9c, 0xec, 0x74, 0x6e, 0x3e, 0xf4, 0x3b },
+	.nonce	= { 0xef, 0x2d, 0x63, 0xee, 0x6b, 0x80, 0x8b, 0x78 },
+	.nlen	= 8,
+	.assoc	= { 0x5a, 0x27, 0xff, 0xeb, 0xdf, 0x84, 0xb2, 0x9e,
+		    0xef },
+	.alen	= 9,
+	.input	= { 0xfd, 0x81, 0x8d, 0xd0, 0x3d, 0xb4, 0xd5, 0xdf,
+		    0xd3, 0x42, 0x47, 0x5a, 0x6d, 0x19, 0x27, 0x66,
+		    0x4b, 0x2e, 0x0c, 0x27, 0x9c, 0x96, 0x4c, 0x72,
+		    0x02, 0xa3, 0x65, 0xc3, 0xb3, 0x6f, 0x2e, 0xbd,
+		    0x63, 0x8a, 0x4a, 0x5d, 0x29, 0xa2, 0xd0, 0x28,
+		    0x48, 0xc5, 0x3d, 0x98, 0xa3, 0xbc, 0xe0, 0xbe,
+		    0x3b, 0x3f, 0xe6, 0x8a, 0xa4, 0x7f, 0x53, 0x06,
+		    0xfa, 0x7f, 0x27, 0x76, 0x72, 0x31, 0xa1, 0xf5,
+		    0xd6, 0x0c, 0x52, 0x47, 0xba, 0xcd, 0x4f, 0xd7,
+		    0xeb, 0x05, 0x48, 0x0d, 0x7c, 0x35, 0x4a, 0x09,
+		    0xc9, 0x76, 0x71, 0x02, 0xa3, 0xfb, 0xb7, 0x1a,
+		    0x65, 0xb7, 0xed, 0x98, 0xc6, 0x30, 0x8a, 0x00,
+		    0xae, 0xa1, 0x31, 0xe5, 0xb5, 0x9e, 0x6d, 0x62,
+		    0xda, 0xda, 0x07, 0x0f, 0x38, 0x38, 0xd3, 0xcb,
+		    0xc1, 0xb0, 0xad, 0xec, 0x72, 0xec, 0xb1, 0xa2,
+		    0x7b, 0x59, 0xf3, 0x3d, 0x2b, 0xef, 0xcd, 0x28,
+		    0x5b, 0x83, 0xcc, 0x18, 0x91, 0x88, 0xb0, 0x2e,
+		    0xf9, 0x29, 0x31, 0x18, 0xf9, 0x4e, 0xe9, 0x0a,
+		    0x91, 0x92, 0x9f, 0xae, 0x2d, 0xad, 0xf4, 0xe6,
+		    0x1a, 0xe2, 0xa4, 0xee, 0x47, 0x15, 0xbf, 0x83,
+		    0x6e, 0xd7, 0x72, 0x12, 0x3b, 0x2d, 0x24, 0xe9,
+		    0xb2, 0x55, 0xcb, 0x3c, 0x10, 0xf0, 0x24, 0x8a,
+		    0x4a, 0x02, 0xea, 0x90, 0x25, 0xf0, 0xb4, 0x79,
+		    0x3a, 0xef, 0x6e, 0xf5, 0x52, 0xdf, 0xb0, 0x0a,
+		    0xcd, 0x24, 0x1c, 0xd3, 0x2e, 0x22, 0x74, 0xea,
+		    0x21, 0x6f, 0xe9, 0xbd, 0xc8, 0x3e, 0x36, 0x5b,
+		    0x19, 0xf1, 0xca, 0x99, 0x0a, 0xb4, 0xa7, 0x52,
+		    0x1a, 0x4e, 0xf2, 0xad, 0x8d, 0x56, 0x85, 0xbb,
+		    0x64, 0x89, 0xba, 0x26, 0xf9, 0xc7, 0xe1, 0x89,
+		    0x19, 0x22, 0x77, 0xc3, 0xa8, 0xfc, 0xff, 0xad,
+		    0xfe, 0xb9, 0x48, 0xae, 0x12, 0x30, 0x9f, 0x19,
+		    0xfb, 0x1b, 0xef, 0x14, 0x87, 0x8a, 0x78, 0x71,
+		    0xf3, 0xf4, 0xb7, 0x00, 0x9c, 0x1d, 0xb5, 0x3d,
+		    0x49, 0x00, 0x0c, 0x06, 0xd4, 0x50, 0xf9, 0x54,
+		    0x45, 0xb2, 0x5b, 0x43, 0xdb, 0x6d, 0xcf, 0x1a,
+		    0xe9, 0x7a, 0x7a, 0xcf, 0xfc, 0x8a, 0x4e, 0x4d,
+		    0x0b, 0x07, 0x63, 0x28, 0xd8, 0xe7, 0x08, 0x95,
+		    0xdf, 0xa6, 0x72, 0x93, 0x2e, 0xbb, 0xa0, 0x42,
+		    0x89, 0x16, 0xf1, 0xd9, 0x0c, 0xf9, 0xa1, 0x16,
+		    0xfd, 0xd9, 0x03, 0xb4, 0x3b, 0x8a, 0xf5, 0xf6,
+		    0xe7, 0x6b, 0x2e, 0x8e, 0x4c, 0x3d, 0xe2, 0xaf,
+		    0x08, 0x45, 0x03, 0xff, 0x09, 0xb6, 0xeb, 0x2d,
+		    0xc6, 0x1b, 0x88, 0x94, 0xac, 0x3e, 0xf1, 0x9f,
+		    0x0e, 0x0e, 0x2b, 0xd5, 0x00, 0x4d, 0x3f, 0x3b,
+		    0x53, 0xae, 0xaf, 0x1c, 0x33, 0x5f, 0x55, 0x6e,
+		    0x8d, 0xaf, 0x05, 0x7a, 0x10, 0x34, 0xc9, 0xf4,
+		    0x66, 0xcb, 0x62, 0x12, 0xa6, 0xee, 0xe8, 0x1c,
+		    0x5d, 0x12, 0x86, 0xdb, 0x6f, 0x1c, 0x33, 0xc4,
+		    0x1c, 0xda, 0x82, 0x2d, 0x3b, 0x59, 0xfe, 0xb1,
+		    0xa4, 0x59, 0x41, 0x86, 0xd0, 0xef, 0xae, 0xfb,
+		    0xda, 0x6d, 0x11, 0xb8, 0xca, 0xe9, 0x6e, 0xff,
+		    0xf7, 0xa9, 0xd9, 0x70, 0x30, 0xfc, 0x53, 0xe2,
+		    0xd7, 0xa2, 0x4e, 0xc7, 0x91, 0xd9, 0x07, 0x06,
+		    0xaa, 0xdd, 0xb0, 0x59, 0x28, 0x1d, 0x00, 0x66,
+		    0xc5, 0x54, 0xc2, 0xfc, 0x06, 0xda, 0x05, 0x90,
+		    0x52, 0x1d, 0x37, 0x66, 0xee, 0xf0, 0xb2, 0x55,
+		    0x8a, 0x5d, 0xd2, 0x38, 0x86, 0x94, 0x9b, 0xfc,
+		    0x10, 0x4c, 0xa1, 0xb9, 0x64, 0x3e, 0x44, 0xb8,
+		    0x5f, 0xb0, 0x0c, 0xec, 0xe0, 0xc9, 0xe5, 0x62,
+		    0x75, 0x3f, 0x09, 0xd5, 0xf5, 0xd9, 0x26, 0xba,
+		    0x9e, 0xd2, 0xf4, 0xb9, 0x48, 0x0a, 0xbc, 0xa2,
+		    0xd6, 0x7c, 0x36, 0x11, 0x7d, 0x26, 0x81, 0x89,
+		    0xcf, 0xa4, 0xad, 0x73, 0x0e, 0xee, 0xcc, 0x06,
+		    0xa9, 0xdb, 0xb1, 0xfd, 0xfb, 0x09, 0x7f, 0x90,
+		    0x42, 0x37, 0x2f, 0xe1, 0x9c, 0x0f, 0x6f, 0xcf,
+		    0x43, 0xb5, 0xd9, 0x90, 0xe1, 0x85, 0xf5, 0xa8,
+		    0xae },
+	.ilen	= 529,
+	.result	= { 0xe6, 0xc3, 0xdb, 0x63, 0x55, 0x15, 0xe3, 0x5b,
+		    0xb7, 0x4b, 0x27, 0x8b, 0x5a, 0xdd, 0xc2, 0xe8,
+		    0x3a, 0x6b, 0xd7, 0x81, 0x96, 0x35, 0x97, 0xca,
+		    0xd7, 0x68, 0xe8, 0xef, 0xce, 0xab, 0xda, 0x09,
+		    0x6e, 0xd6, 0x8e, 0xcb, 0x55, 0xb5, 0xe1, 0xe5,
+		    0x57, 0xfd, 0xc4, 0xe3, 0xe0, 0x18, 0x4f, 0x85,
+		    0xf5, 0x3f, 0x7e, 0x4b, 0x88, 0xc9, 0x52, 0x44,
+		    0x0f, 0xea, 0xaf, 0x1f, 0x71, 0x48, 0x9f, 0x97,
+		    0x6d, 0xb9, 0x6f, 0x00, 0xa6, 0xde, 0x2b, 0x77,
+		    0x8b, 0x15, 0xad, 0x10, 0xa0, 0x2b, 0x7b, 0x41,
+		    0x90, 0x03, 0x2d, 0x69, 0xae, 0xcc, 0x77, 0x7c,
+		    0xa5, 0x9d, 0x29, 0x22, 0xc2, 0xea, 0xb4, 0x00,
+		    0x1a, 0xd2, 0x7a, 0x98, 0x8a, 0xf9, 0xf7, 0x82,
+		    0xb0, 0xab, 0xd8, 0xa6, 0x94, 0x8d, 0x58, 0x2f,
+		    0x01, 0x9e, 0x00, 0x20, 0xfc, 0x49, 0xdc, 0x0e,
+		    0x03, 0xe8, 0x45, 0x10, 0xd6, 0xa8, 0xda, 0x55,
+		    0x10, 0x9a, 0xdf, 0x67, 0x22, 0x8b, 0x43, 0xab,
+		    0x00, 0xbb, 0x02, 0xc8, 0xdd, 0x7b, 0x97, 0x17,
+		    0xd7, 0x1d, 0x9e, 0x02, 0x5e, 0x48, 0xde, 0x8e,
+		    0xcf, 0x99, 0x07, 0x95, 0x92, 0x3c, 0x5f, 0x9f,
+		    0xc5, 0x8a, 0xc0, 0x23, 0xaa, 0xd5, 0x8c, 0x82,
+		    0x6e, 0x16, 0x92, 0xb1, 0x12, 0x17, 0x07, 0xc3,
+		    0xfb, 0x36, 0xf5, 0x6c, 0x35, 0xd6, 0x06, 0x1f,
+		    0x9f, 0xa7, 0x94, 0xa2, 0x38, 0x63, 0x9c, 0xb0,
+		    0x71, 0xb3, 0xa5, 0xd2, 0xd8, 0xba, 0x9f, 0x08,
+		    0x01, 0xb3, 0xff, 0x04, 0x97, 0x73, 0x45, 0x1b,
+		    0xd5, 0xa9, 0x9c, 0x80, 0xaf, 0x04, 0x9a, 0x85,
+		    0xdb, 0x32, 0x5b, 0x5d, 0x1a, 0xc1, 0x36, 0x28,
+		    0x10, 0x79, 0xf1, 0x3c, 0xbf, 0x1a, 0x41, 0x5c,
+		    0x4e, 0xdf, 0xb2, 0x7c, 0x79, 0x3b, 0x7a, 0x62,
+		    0x3d, 0x4b, 0xc9, 0x9b, 0x2a, 0x2e, 0x7c, 0xa2,
+		    0xb1, 0x11, 0x98, 0xa7, 0x34, 0x1a, 0x00, 0xf3,
+		    0xd1, 0xbc, 0x18, 0x22, 0xba, 0x02, 0x56, 0x62,
+		    0x31, 0x10, 0x11, 0x6d, 0xe0, 0x54, 0x9d, 0x40,
+		    0x1f, 0x26, 0x80, 0x41, 0xca, 0x3f, 0x68, 0x0f,
+		    0x32, 0x1d, 0x0a, 0x8e, 0x79, 0xd8, 0xa4, 0x1b,
+		    0x29, 0x1c, 0x90, 0x8e, 0xc5, 0xe3, 0xb4, 0x91,
+		    0x37, 0x9a, 0x97, 0x86, 0x99, 0xd5, 0x09, 0xc5,
+		    0xbb, 0xa3, 0x3f, 0x21, 0x29, 0x82, 0x14, 0x5c,
+		    0xab, 0x25, 0xfb, 0xf2, 0x4f, 0x58, 0x26, 0xd4,
+		    0x83, 0xaa, 0x66, 0x89, 0x67, 0x7e, 0xc0, 0x49,
+		    0xe1, 0x11, 0x10, 0x7f, 0x7a, 0xda, 0x29, 0x04,
+		    0xff, 0xf0, 0xcb, 0x09, 0x7c, 0x9d, 0xfa, 0x03,
+		    0x6f, 0x81, 0x09, 0x31, 0x60, 0xfb, 0x08, 0xfa,
+		    0x74, 0xd3, 0x64, 0x44, 0x7c, 0x55, 0x85, 0xec,
+		    0x9c, 0x6e, 0x25, 0xb7, 0x6c, 0xc5, 0x37, 0xb6,
+		    0x83, 0x87, 0x72, 0x95, 0x8b, 0x9d, 0xe1, 0x69,
+		    0x5c, 0x31, 0x95, 0x42, 0xa6, 0x2c, 0xd1, 0x36,
+		    0x47, 0x1f, 0xec, 0x54, 0xab, 0xa2, 0x1c, 0xd8,
+		    0x00, 0xcc, 0xbc, 0x0d, 0x65, 0xe2, 0x67, 0xbf,
+		    0xbc, 0xea, 0xee, 0x9e, 0xe4, 0x36, 0x95, 0xbe,
+		    0x73, 0xd9, 0xa6, 0xd9, 0x0f, 0xa0, 0xcc, 0x82,
+		    0x76, 0x26, 0xad, 0x5b, 0x58, 0x6c, 0x4e, 0xab,
+		    0x29, 0x64, 0xd3, 0xd9, 0xa9, 0x08, 0x8c, 0x1d,
+		    0xa1, 0x4f, 0x80, 0xd8, 0x3f, 0x94, 0xfb, 0xd3,
+		    0x7b, 0xfc, 0xd1, 0x2b, 0xc3, 0x21, 0xeb, 0xe5,
+		    0x1c, 0x84, 0x23, 0x7f, 0x4b, 0xfa, 0xdb, 0x34,
+		    0x18, 0xa2, 0xc2, 0xe5, 0x13, 0xfe, 0x6c, 0x49,
+		    0x81, 0xd2, 0x73, 0xe7, 0xe2, 0xd7, 0xe4, 0x4f,
+		    0x4b, 0x08, 0x6e, 0xb1, 0x12, 0x22, 0x10, 0x9d,
+		    0xac, 0x51, 0x1e, 0x17, 0xd9, 0x8a, 0x0b, 0x42,
+		    0x88, 0x16, 0x81, 0x37, 0x7c, 0x6a, 0xf7, 0xef,
+		    0x2d, 0xe3, 0xd9, 0xf8, 0x5f, 0xe0, 0x53, 0x27,
+		    0x74, 0xb9, 0xe2, 0xd6, 0x1c, 0x80, 0x2c, 0x52,
+		    0x65 }
+}, {
+	.key	= { 0x47, 0x11, 0xeb, 0x86, 0x2b, 0x2c, 0xab, 0x44,
+		    0x34, 0xda, 0x7f, 0x57, 0x03, 0x39, 0x0c, 0xaf,
+		    0x2c, 0x14, 0xfd, 0x65, 0x23, 0xe9, 0x8e, 0x74,
+		    0xd5, 0x08, 0x68, 0x08, 0xe7, 0xb4, 0x72, 0xd7 },
+	.nonce	= { 0xdb, 0x92, 0x0f, 0x7f, 0x17, 0x54, 0x0c, 0x30 },
+	.nlen	= 8,
+	.assoc	= { 0xd2, 0xa1, 0x70, 0xdb, 0x7a, 0xf8, 0xfa, 0x27,
+		    0xba, 0x73, 0x0f, 0xbf, 0x3d, 0x1e, 0x82, 0xb2 },
+	.alen	= 16,
+	.input	= { 0xe5, 0x26, 0xa4, 0x3d, 0xbd, 0x33, 0xd0, 0x4b,
+		    0x6f, 0x05, 0xa7, 0x6e, 0x12, 0x7a, 0xd2, 0x74,
+		    0xa6, 0xdd, 0xbd, 0x95, 0xeb, 0xf9, 0xa4, 0xf1,
+		    0x59, 0x93, 0x91, 0x70, 0xd9, 0xfe, 0x9a, 0xcd,
+		    0x53, 0x1f, 0x3a, 0xab, 0xa6, 0x7c, 0x9f, 0xa6,
+		    0x9e, 0xbd, 0x99, 0xd9, 0xb5, 0x97, 0x44, 0xd5,
+		    0x14, 0x48, 0x4d, 0x9d, 0xc0, 0xd0, 0x05, 0x96,
+		    0xeb, 0x4c, 0x78, 0x55, 0x09, 0x08, 0x01, 0x02,
+		    0x30, 0x90, 0x7b, 0x96, 0x7a, 0x7b, 0x5f, 0x30,
+		    0x41, 0x24, 0xce, 0x68, 0x61, 0x49, 0x86, 0x57,
+		    0x82, 0xdd, 0x53, 0x1c, 0x51, 0x28, 0x2b, 0x53,
+		    0x6e, 0x2d, 0xc2, 0x20, 0x4c, 0xdd, 0x8f, 0x65,
+		    0x10, 0x20, 0x50, 0xdd, 0x9d, 0x50, 0xe5, 0x71,
+		    0x40, 0x53, 0x69, 0xfc, 0x77, 0x48, 0x11, 0xb9,
+		    0xde, 0xa4, 0x8d, 0x58, 0xe4, 0xa6, 0x1a, 0x18,
+		    0x47, 0x81, 0x7e, 0xfc, 0xdd, 0xf6, 0xef, 0xce,
+		    0x2f, 0x43, 0x68, 0xd6, 0x06, 0xe2, 0x74, 0x6a,
+		    0xad, 0x90, 0xf5, 0x37, 0xf3, 0x3d, 0x82, 0x69,
+		    0x40, 0xe9, 0x6b, 0xa7, 0x3d, 0xa8, 0x1e, 0xd2,
+		    0x02, 0x7c, 0xb7, 0x9b, 0xe4, 0xda, 0x8f, 0x95,
+		    0x06, 0xc5, 0xdf, 0x73, 0xa3, 0x20, 0x9a, 0x49,
+		    0xde, 0x9c, 0xbc, 0xee, 0x14, 0x3f, 0x81, 0x5e,
+		    0xf8, 0x3b, 0x59, 0x3c, 0xe1, 0x68, 0x12, 0x5a,
+		    0x3a, 0x76, 0x3a, 0x3f, 0xf7, 0x87, 0x33, 0x0a,
+		    0x01, 0xb8, 0xd4, 0xed, 0xb6, 0xbe, 0x94, 0x5e,
+		    0x70, 0x40, 0x56, 0x67, 0x1f, 0x50, 0x44, 0x19,
+		    0xce, 0x82, 0x70, 0x10, 0x87, 0x13, 0x20, 0x0b,
+		    0x4c, 0x5a, 0xb6, 0xf6, 0xa7, 0xae, 0x81, 0x75,
+		    0x01, 0x81, 0xe6, 0x4b, 0x57, 0x7c, 0xdd, 0x6d,
+		    0xf8, 0x1c, 0x29, 0x32, 0xf7, 0xda, 0x3c, 0x2d,
+		    0xf8, 0x9b, 0x25, 0x6e, 0x00, 0xb4, 0xf7, 0x2f,
+		    0xf7, 0x04, 0xf7, 0xa1, 0x56, 0xac, 0x4f, 0x1a,
+		    0x64, 0xb8, 0x47, 0x55, 0x18, 0x7b, 0x07, 0x4d,
+		    0xbd, 0x47, 0x24, 0x80, 0x5d, 0xa2, 0x70, 0xc5,
+		    0xdd, 0x8e, 0x82, 0xd4, 0xeb, 0xec, 0xb2, 0x0c,
+		    0x39, 0xd2, 0x97, 0xc1, 0xcb, 0xeb, 0xf4, 0x77,
+		    0x59, 0xb4, 0x87, 0xef, 0xcb, 0x43, 0x2d, 0x46,
+		    0x54, 0xd1, 0xa7, 0xd7, 0x15, 0x99, 0x0a, 0x43,
+		    0xa1, 0xe0, 0x99, 0x33, 0x71, 0xc1, 0xed, 0xfe,
+		    0x72, 0x46, 0x33, 0x8e, 0x91, 0x08, 0x9f, 0xc8,
+		    0x2e, 0xca, 0xfa, 0xdc, 0x59, 0xd5, 0xc3, 0x76,
+		    0x84, 0x9f, 0xa3, 0x37, 0x68, 0xc3, 0xf0, 0x47,
+		    0x2c, 0x68, 0xdb, 0x5e, 0xc3, 0x49, 0x4c, 0xe8,
+		    0x92, 0x85, 0xe2, 0x23, 0xd3, 0x3f, 0xad, 0x32,
+		    0xe5, 0x2b, 0x82, 0xd7, 0x8f, 0x99, 0x0a, 0x59,
+		    0x5c, 0x45, 0xd9, 0xb4, 0x51, 0x52, 0xc2, 0xae,
+		    0xbf, 0x80, 0xcf, 0xc9, 0xc9, 0x51, 0x24, 0x2a,
+		    0x3b, 0x3a, 0x4d, 0xae, 0xeb, 0xbd, 0x22, 0xc3,
+		    0x0e, 0x0f, 0x59, 0x25, 0x92, 0x17, 0xe9, 0x74,
+		    0xc7, 0x8b, 0x70, 0x70, 0x36, 0x55, 0x95, 0x75,
+		    0x4b, 0xad, 0x61, 0x2b, 0x09, 0xbc, 0x82, 0xf2,
+		    0x6e, 0x94, 0x43, 0xae, 0xc3, 0xd5, 0xcd, 0x8e,
+		    0xfe, 0x5b, 0x9a, 0x88, 0x43, 0x01, 0x75, 0xb2,
+		    0x23, 0x09, 0xf7, 0x89, 0x83, 0xe7, 0xfa, 0xf9,
+		    0xb4, 0x9b, 0xf8, 0xef, 0xbd, 0x1c, 0x92, 0xc1,
+		    0xda, 0x7e, 0xfe, 0x05, 0xba, 0x5a, 0xcd, 0x07,
+		    0x6a, 0x78, 0x9e, 0x5d, 0xfb, 0x11, 0x2f, 0x79,
+		    0x38, 0xb6, 0xc2, 0x5b, 0x6b, 0x51, 0xb4, 0x71,
+		    0xdd, 0xf7, 0x2a, 0xe4, 0xf4, 0x72, 0x76, 0xad,
+		    0xc2, 0xdd, 0x64, 0x5d, 0x79, 0xb6, 0xf5, 0x7a,
+		    0x77, 0x20, 0x05, 0x3d, 0x30, 0x06, 0xd4, 0x4c,
+		    0x0a, 0x2c, 0x98, 0x5a, 0xb9, 0xd4, 0x98, 0xa9,
+		    0x3f, 0xc6, 0x12, 0xea, 0x3b, 0x4b, 0xc5, 0x79,
+		    0x64, 0x63, 0x6b, 0x09, 0x54, 0x3b, 0x14, 0x27,
+		    0xba, 0x99, 0x80, 0xc8, 0x72, 0xa8, 0x12, 0x90,
+		    0x29, 0xba, 0x40, 0x54, 0x97, 0x2b, 0x7b, 0xfe,
+		    0xeb, 0xcd, 0x01, 0x05, 0x44, 0x72, 0xdb, 0x99,
+		    0xe4, 0x61, 0xc9, 0x69, 0xd6, 0xb9, 0x28, 0xd1,
+		    0x05, 0x3e, 0xf9, 0x0b, 0x49, 0x0a, 0x49, 0xe9,
+		    0x8d, 0x0e, 0xa7, 0x4a, 0x0f, 0xaf, 0x32, 0xd0,
+		    0xe0, 0xb2, 0x3a, 0x55, 0x58, 0xfe, 0x5c, 0x28,
+		    0x70, 0x51, 0x23, 0xb0, 0x7b, 0x6a, 0x5f, 0x1e,
+		    0xb8, 0x17, 0xd7, 0x94, 0x15, 0x8f, 0xee, 0x20,
+		    0xc7, 0x42, 0x25, 0x3e, 0x9a, 0x14, 0xd7, 0x60,
+		    0x72, 0x39, 0x47, 0x48, 0xa9, 0xfe, 0xdd, 0x47,
+		    0x0a, 0xb1, 0xe6, 0x60, 0x28, 0x8c, 0x11, 0x68,
+		    0xe1, 0xff, 0xd7, 0xce, 0xc8, 0xbe, 0xb3, 0xfe,
+		    0x27, 0x30, 0x09, 0x70, 0xd7, 0xfa, 0x02, 0x33,
+		    0x3a, 0x61, 0x2e, 0xc7, 0xff, 0xa4, 0x2a, 0xa8,
+		    0x6e, 0xb4, 0x79, 0x35, 0x6d, 0x4c, 0x1e, 0x38,
+		    0xf8, 0xee, 0xd4, 0x84, 0x4e, 0x6e, 0x28, 0xa7,
+		    0xce, 0xc8, 0xc1, 0xcf, 0x80, 0x05, 0xf3, 0x04,
+		    0xef, 0xc8, 0x18, 0x28, 0x2e, 0x8d, 0x5e, 0x0c,
+		    0xdf, 0xb8, 0x5f, 0x96, 0xe8, 0xc6, 0x9c, 0x2f,
+		    0xe5, 0xa6, 0x44, 0xd7, 0xe7, 0x99, 0x44, 0x0c,
+		    0xec, 0xd7, 0x05, 0x60, 0x97, 0xbb, 0x74, 0x77,
+		    0x58, 0xd5, 0xbb, 0x48, 0xde, 0x5a, 0xb2, 0x54,
+		    0x7f, 0x0e, 0x46, 0x70, 0x6a, 0x6f, 0x78, 0xa5,
+		    0x08, 0x89, 0x05, 0x4e, 0x7e, 0xa0, 0x69, 0xb4,
+		    0x40, 0x60, 0x55, 0x77, 0x75, 0x9b, 0x19, 0xf2,
+		    0xd5, 0x13, 0x80, 0x77, 0xf9, 0x4b, 0x3f, 0x1e,
+		    0xee, 0xe6, 0x76, 0x84, 0x7b, 0x8c, 0xe5, 0x27,
+		    0xa8, 0x0a, 0x91, 0x01, 0x68, 0x71, 0x8a, 0x3f,
+		    0x06, 0xab, 0xf6, 0xa9, 0xa5, 0xe6, 0x72, 0x92,
+		    0xe4, 0x67, 0xe2, 0xa2, 0x46, 0x35, 0x84, 0x55,
+		    0x7d, 0xca, 0xa8, 0x85, 0xd0, 0xf1, 0x3f, 0xbe,
+		    0xd7, 0x34, 0x64, 0xfc, 0xae, 0xe3, 0xe4, 0x04,
+		    0x9f, 0x66, 0x02, 0xb9, 0x88, 0x10, 0xd9, 0xc4,
+		    0x4c, 0x31, 0x43, 0x7a, 0x93, 0xe2, 0x9b, 0x56,
+		    0x43, 0x84, 0xdc, 0xdc, 0xde, 0x1d, 0xa4, 0x02,
+		    0x0e, 0xc2, 0xef, 0xc3, 0xf8, 0x78, 0xd1, 0xb2,
+		    0x6b, 0x63, 0x18, 0xc9, 0xa9, 0xe5, 0x72, 0xd8,
+		    0xf3, 0xb9, 0xd1, 0x8a, 0xc7, 0x1a, 0x02, 0x27,
+		    0x20, 0x77, 0x10, 0xe5, 0xc8, 0xd4, 0x4a, 0x47,
+		    0xe5, 0xdf, 0x5f, 0x01, 0xaa, 0xb0, 0xd4, 0x10,
+		    0xbb, 0x69, 0xe3, 0x36, 0xc8, 0xe1, 0x3d, 0x43,
+		    0xfb, 0x86, 0xcd, 0xcc, 0xbf, 0xf4, 0x88, 0xe0,
+		    0x20, 0xca, 0xb7, 0x1b, 0xf1, 0x2f, 0x5c, 0xee,
+		    0xd4, 0xd3, 0xa3, 0xcc, 0xa4, 0x1e, 0x1c, 0x47,
+		    0xfb, 0xbf, 0xfc, 0xa2, 0x41, 0x55, 0x9d, 0xf6,
+		    0x5a, 0x5e, 0x65, 0x32, 0x34, 0x7b, 0x52, 0x8d,
+		    0xd5, 0xd0, 0x20, 0x60, 0x03, 0xab, 0x3f, 0x8c,
+		    0xd4, 0x21, 0xea, 0x2a, 0xd9, 0xc4, 0xd0, 0xd3,
+		    0x65, 0xd8, 0x7a, 0x13, 0x28, 0x62, 0x32, 0x4b,
+		    0x2c, 0x87, 0x93, 0xa8, 0xb4, 0x52, 0x45, 0x09,
+		    0x44, 0xec, 0xec, 0xc3, 0x17, 0xdb, 0x9a, 0x4d,
+		    0x5c, 0xa9, 0x11, 0xd4, 0x7d, 0xaf, 0x9e, 0xf1,
+		    0x2d, 0xb2, 0x66, 0xc5, 0x1d, 0xed, 0xb7, 0xcd,
+		    0x0b, 0x25, 0x5e, 0x30, 0x47, 0x3f, 0x40, 0xf4,
+		    0xa1, 0xa0, 0x00, 0x94, 0x10, 0xc5, 0x6a, 0x63,
+		    0x1a, 0xd5, 0x88, 0x92, 0x8e, 0x82, 0x39, 0x87,
+		    0x3c, 0x78, 0x65, 0x58, 0x42, 0x75, 0x5b, 0xdd,
+		    0x77, 0x3e, 0x09, 0x4e, 0x76, 0x5b, 0xe6, 0x0e,
+		    0x4d, 0x38, 0xb2, 0xc0, 0xb8, 0x95, 0x01, 0x7a,
+		    0x10, 0xe0, 0xfb, 0x07, 0xf2, 0xab, 0x2d, 0x8c,
+		    0x32, 0xed, 0x2b, 0xc0, 0x46, 0xc2, 0xf5, 0x38,
+		    0x83, 0xf0, 0x17, 0xec, 0xc1, 0x20, 0x6a, 0x9a,
+		    0x0b, 0x00, 0xa0, 0x98, 0x22, 0x50, 0x23, 0xd5,
+		    0x80, 0x6b, 0xf6, 0x1f, 0xc3, 0xcc, 0x97, 0xc9,
+		    0x24, 0x9f, 0xf3, 0xaf, 0x43, 0x14, 0xd5, 0xa0 },
+	.ilen	= 1040,
+	.result	= { 0x42, 0x93, 0xe4, 0xeb, 0x97, 0xb0, 0x57, 0xbf,
+		    0x1a, 0x8b, 0x1f, 0xe4, 0x5f, 0x36, 0x20, 0x3c,
+		    0xef, 0x0a, 0xa9, 0x48, 0x5f, 0x5f, 0x37, 0x22,
+		    0x3a, 0xde, 0xe3, 0xae, 0xbe, 0xad, 0x07, 0xcc,
+		    0xb1, 0xf6, 0xf5, 0xf9, 0x56, 0xdd, 0xe7, 0x16,
+		    0x1e, 0x7f, 0xdf, 0x7a, 0x9e, 0x75, 0xb7, 0xc7,
+		    0xbe, 0xbe, 0x8a, 0x36, 0x04, 0xc0, 0x10, 0xf4,
+		    0x95, 0x20, 0x03, 0xec, 0xdc, 0x05, 0xa1, 0x7d,
+		    0xc4, 0xa9, 0x2c, 0x82, 0xd0, 0xbc, 0x8b, 0xc5,
+		    0xc7, 0x45, 0x50, 0xf6, 0xa2, 0x1a, 0xb5, 0x46,
+		    0x3b, 0x73, 0x02, 0xa6, 0x83, 0x4b, 0x73, 0x82,
+		    0x58, 0x5e, 0x3b, 0x65, 0x2f, 0x0e, 0xfd, 0x2b,
+		    0x59, 0x16, 0xce, 0xa1, 0x60, 0x9c, 0xe8, 0x3a,
+		    0x99, 0xed, 0x8d, 0x5a, 0xcf, 0xf6, 0x83, 0xaf,
+		    0xba, 0xd7, 0x73, 0x73, 0x40, 0x97, 0x3d, 0xca,
+		    0xef, 0x07, 0x57, 0xe6, 0xd9, 0x70, 0x0e, 0x95,
+		    0xae, 0xa6, 0x8d, 0x04, 0xcc, 0xee, 0xf7, 0x09,
+		    0x31, 0x77, 0x12, 0xa3, 0x23, 0x97, 0x62, 0xb3,
+		    0x7b, 0x32, 0xfb, 0x80, 0x14, 0x48, 0x81, 0xc3,
+		    0xe5, 0xea, 0x91, 0x39, 0x52, 0x81, 0xa2, 0x4f,
+		    0xe4, 0xb3, 0x09, 0xff, 0xde, 0x5e, 0xe9, 0x58,
+		    0x84, 0x6e, 0xf9, 0x3d, 0xdf, 0x25, 0xea, 0xad,
+		    0xae, 0xe6, 0x9a, 0xd1, 0x89, 0x55, 0xd3, 0xde,
+		    0x6c, 0x52, 0xdb, 0x70, 0xfe, 0x37, 0xce, 0x44,
+		    0x0a, 0xa8, 0x25, 0x5f, 0x92, 0xc1, 0x33, 0x4a,
+		    0x4f, 0x9b, 0x62, 0x35, 0xff, 0xce, 0xc0, 0xa9,
+		    0x60, 0xce, 0x52, 0x00, 0x97, 0x51, 0x35, 0x26,
+		    0x2e, 0xb9, 0x36, 0xa9, 0x87, 0x6e, 0x1e, 0xcc,
+		    0x91, 0x78, 0x53, 0x98, 0x86, 0x5b, 0x9c, 0x74,
+		    0x7d, 0x88, 0x33, 0xe1, 0xdf, 0x37, 0x69, 0x2b,
+		    0xbb, 0xf1, 0x4d, 0xf4, 0xd1, 0xf1, 0x39, 0x93,
+		    0x17, 0x51, 0x19, 0xe3, 0x19, 0x1e, 0x76, 0x37,
+		    0x25, 0xfb, 0x09, 0x27, 0x6a, 0xab, 0x67, 0x6f,
+		    0x14, 0x12, 0x64, 0xe7, 0xc4, 0x07, 0xdf, 0x4d,
+		    0x17, 0xbb, 0x6d, 0xe0, 0xe9, 0xb9, 0xab, 0xca,
+		    0x10, 0x68, 0xaf, 0x7e, 0xb7, 0x33, 0x54, 0x73,
+		    0x07, 0x6e, 0xf7, 0x81, 0x97, 0x9c, 0x05, 0x6f,
+		    0x84, 0x5f, 0xd2, 0x42, 0xfb, 0x38, 0xcf, 0xd1,
+		    0x2f, 0x14, 0x30, 0x88, 0x98, 0x4d, 0x5a, 0xa9,
+		    0x76, 0xd5, 0x4f, 0x3e, 0x70, 0x6c, 0x85, 0x76,
+		    0xd7, 0x01, 0xa0, 0x1a, 0xc8, 0x4e, 0xaa, 0xac,
+		    0x78, 0xfe, 0x46, 0xde, 0x6a, 0x05, 0x46, 0xa7,
+		    0x43, 0x0c, 0xb9, 0xde, 0xb9, 0x68, 0xfb, 0xce,
+		    0x42, 0x99, 0x07, 0x4d, 0x0b, 0x3b, 0x5a, 0x30,
+		    0x35, 0xa8, 0xf9, 0x3a, 0x73, 0xef, 0x0f, 0xdb,
+		    0x1e, 0x16, 0x42, 0xc4, 0xba, 0xae, 0x58, 0xaa,
+		    0xf8, 0xe5, 0x75, 0x2f, 0x1b, 0x15, 0x5c, 0xfd,
+		    0x0a, 0x97, 0xd0, 0xe4, 0x37, 0x83, 0x61, 0x5f,
+		    0x43, 0xa6, 0xc7, 0x3f, 0x38, 0x59, 0xe6, 0xeb,
+		    0xa3, 0x90, 0xc3, 0xaa, 0xaa, 0x5a, 0xd3, 0x34,
+		    0xd4, 0x17, 0xc8, 0x65, 0x3e, 0x57, 0xbc, 0x5e,
+		    0xdd, 0x9e, 0xb7, 0xf0, 0x2e, 0x5b, 0xb2, 0x1f,
+		    0x8a, 0x08, 0x0d, 0x45, 0x91, 0x0b, 0x29, 0x53,
+		    0x4f, 0x4c, 0x5a, 0x73, 0x56, 0xfe, 0xaf, 0x41,
+		    0x01, 0x39, 0x0a, 0x24, 0x3c, 0x7e, 0xbe, 0x4e,
+		    0x53, 0xf3, 0xeb, 0x06, 0x66, 0x51, 0x28, 0x1d,
+		    0xbd, 0x41, 0x0a, 0x01, 0xab, 0x16, 0x47, 0x27,
+		    0x47, 0x47, 0xf7, 0xcb, 0x46, 0x0a, 0x70, 0x9e,
+		    0x01, 0x9c, 0x09, 0xe1, 0x2a, 0x00, 0x1a, 0xd8,
+		    0xd4, 0x79, 0x9d, 0x80, 0x15, 0x8e, 0x53, 0x2a,
+		    0x65, 0x83, 0x78, 0x3e, 0x03, 0x00, 0x07, 0x12,
+		    0x1f, 0x33, 0x3e, 0x7b, 0x13, 0x37, 0xf1, 0xc3,
+		    0xef, 0xb7, 0xc1, 0x20, 0x3c, 0x3e, 0x67, 0x66,
+		    0x5d, 0x88, 0xa7, 0x7d, 0x33, 0x50, 0x77, 0xb0,
+		    0x28, 0x8e, 0xe7, 0x2c, 0x2e, 0x7a, 0xf4, 0x3c,
+		    0x8d, 0x74, 0x83, 0xaf, 0x8e, 0x87, 0x0f, 0xe4,
+		    0x50, 0xff, 0x84, 0x5c, 0x47, 0x0c, 0x6a, 0x49,
+		    0xbf, 0x42, 0x86, 0x77, 0x15, 0x48, 0xa5, 0x90,
+		    0x5d, 0x93, 0xd6, 0x2a, 0x11, 0xd5, 0xd5, 0x11,
+		    0xaa, 0xce, 0xe7, 0x6f, 0xa5, 0xb0, 0x09, 0x2c,
+		    0x8d, 0xd3, 0x92, 0xf0, 0x5a, 0x2a, 0xda, 0x5b,
+		    0x1e, 0xd5, 0x9a, 0xc4, 0xc4, 0xf3, 0x49, 0x74,
+		    0x41, 0xca, 0xe8, 0xc1, 0xf8, 0x44, 0xd6, 0x3c,
+		    0xae, 0x6c, 0x1d, 0x9a, 0x30, 0x04, 0x4d, 0x27,
+		    0x0e, 0xb1, 0x5f, 0x59, 0xa2, 0x24, 0xe8, 0xe1,
+		    0x98, 0xc5, 0x6a, 0x4c, 0xfe, 0x41, 0xd2, 0x27,
+		    0x42, 0x52, 0xe1, 0xe9, 0x7d, 0x62, 0xe4, 0x88,
+		    0x0f, 0xad, 0xb2, 0x70, 0xcb, 0x9d, 0x4c, 0x27,
+		    0x2e, 0x76, 0x1e, 0x1a, 0x63, 0x65, 0xf5, 0x3b,
+		    0xf8, 0x57, 0x69, 0xeb, 0x5b, 0x38, 0x26, 0x39,
+		    0x33, 0x25, 0x45, 0x3e, 0x91, 0xb8, 0xd8, 0xc7,
+		    0xd5, 0x42, 0xc0, 0x22, 0x31, 0x74, 0xf4, 0xbc,
+		    0x0c, 0x23, 0xf1, 0xca, 0xc1, 0x8d, 0xd7, 0xbe,
+		    0xc9, 0x62, 0xe4, 0x08, 0x1a, 0xcf, 0x36, 0xd5,
+		    0xfe, 0x55, 0x21, 0x59, 0x91, 0x87, 0x87, 0xdf,
+		    0x06, 0xdb, 0xdf, 0x96, 0x45, 0x58, 0xda, 0x05,
+		    0xcd, 0x50, 0x4d, 0xd2, 0x7d, 0x05, 0x18, 0x73,
+		    0x6a, 0x8d, 0x11, 0x85, 0xa6, 0x88, 0xe8, 0xda,
+		    0xe6, 0x30, 0x33, 0xa4, 0x89, 0x31, 0x75, 0xbe,
+		    0x69, 0x43, 0x84, 0x43, 0x50, 0x87, 0xdd, 0x71,
+		    0x36, 0x83, 0xc3, 0x78, 0x74, 0x24, 0x0a, 0xed,
+		    0x7b, 0xdb, 0xa4, 0x24, 0x0b, 0xb9, 0x7e, 0x5d,
+		    0xff, 0xde, 0xb1, 0xef, 0x61, 0x5a, 0x45, 0x33,
+		    0xf6, 0x17, 0x07, 0x08, 0x98, 0x83, 0x92, 0x0f,
+		    0x23, 0x6d, 0xe6, 0xaa, 0x17, 0x54, 0xad, 0x6a,
+		    0xc8, 0xdb, 0x26, 0xbe, 0xb8, 0xb6, 0x08, 0xfa,
+		    0x68, 0xf1, 0xd7, 0x79, 0x6f, 0x18, 0xb4, 0x9e,
+		    0x2d, 0x3f, 0x1b, 0x64, 0xaf, 0x8d, 0x06, 0x0e,
+		    0x49, 0x28, 0xe0, 0x5d, 0x45, 0x68, 0x13, 0x87,
+		    0xfa, 0xde, 0x40, 0x7b, 0xd2, 0xc3, 0x94, 0xd5,
+		    0xe1, 0xd9, 0xc2, 0xaf, 0x55, 0x89, 0xeb, 0xb4,
+		    0x12, 0x59, 0xa8, 0xd4, 0xc5, 0x29, 0x66, 0x38,
+		    0xe6, 0xac, 0x22, 0x22, 0xd9, 0x64, 0x9b, 0x34,
+		    0x0a, 0x32, 0x9f, 0xc2, 0xbf, 0x17, 0x6c, 0x3f,
+		    0x71, 0x7a, 0x38, 0x6b, 0x98, 0xfb, 0x49, 0x36,
+		    0x89, 0xc9, 0xe2, 0xd6, 0xc7, 0x5d, 0xd0, 0x69,
+		    0x5f, 0x23, 0x35, 0xc9, 0x30, 0xe2, 0xfd, 0x44,
+		    0x58, 0x39, 0xd7, 0x97, 0xfb, 0x5c, 0x00, 0xd5,
+		    0x4f, 0x7a, 0x1a, 0x95, 0x8b, 0x62, 0x4b, 0xce,
+		    0xe5, 0x91, 0x21, 0x7b, 0x30, 0x00, 0xd6, 0xdd,
+		    0x6d, 0x02, 0x86, 0x49, 0x0f, 0x3c, 0x1a, 0x27,
+		    0x3c, 0xd3, 0x0e, 0x71, 0xf2, 0xff, 0xf5, 0x2f,
+		    0x87, 0xac, 0x67, 0x59, 0x81, 0xa3, 0xf7, 0xf8,
+		    0xd6, 0x11, 0x0c, 0x84, 0xa9, 0x03, 0xee, 0x2a,
+		    0xc4, 0xf3, 0x22, 0xab, 0x7c, 0xe2, 0x25, 0xf5,
+		    0x67, 0xa3, 0xe4, 0x11, 0xe0, 0x59, 0xb3, 0xca,
+		    0x87, 0xa0, 0xae, 0xc9, 0xa6, 0x62, 0x1b, 0x6e,
+		    0x4d, 0x02, 0x6b, 0x07, 0x9d, 0xfd, 0xd0, 0x92,
+		    0x06, 0xe1, 0xb2, 0x9a, 0x4a, 0x1f, 0x1f, 0x13,
+		    0x49, 0x99, 0x97, 0x08, 0xde, 0x7f, 0x98, 0xaf,
+		    0x51, 0x98, 0xee, 0x2c, 0xcb, 0xf0, 0x0b, 0xc6,
+		    0xb6, 0xb7, 0x2d, 0x9a, 0xb1, 0xac, 0xa6, 0xe3,
+		    0x15, 0x77, 0x9d, 0x6b, 0x1a, 0xe4, 0xfc, 0x8b,
+		    0xf2, 0x17, 0x59, 0x08, 0x04, 0x58, 0x81, 0x9d,
+		    0x1b, 0x1b, 0x69, 0x55, 0xc2, 0xb4, 0x3c, 0x1f,
+		    0x50, 0xf1, 0x7f, 0x77, 0x90, 0x4c, 0x66, 0x40,
+		    0x5a, 0xc0, 0x33, 0x1f, 0xcb, 0x05, 0x6d, 0x5c,
+		    0x06, 0x87, 0x52, 0xa2, 0x8f, 0x26, 0xd5, 0x4f }
+}, {
+	.key	= { 0x35, 0x4e, 0xb5, 0x70, 0x50, 0x42, 0x8a, 0x85,
+		    0xf2, 0xfb, 0xed, 0x7b, 0xd0, 0x9e, 0x97, 0xca,
+		    0xfa, 0x98, 0x66, 0x63, 0xee, 0x37, 0xcc, 0x52,
+		    0xfe, 0xd1, 0xdf, 0x95, 0x15, 0x34, 0x29, 0x38 },
+	.nonce	= { 0xfd, 0x87, 0xd4, 0xd8, 0x62, 0xfd, 0xec, 0xaa },
+	.nlen	= 8,
+	.assoc	= { 0xd6, 0x31, 0xda, 0x5d, 0x42, 0x5e, 0xd7 },
+	.alen	= 7,
+	.input	= { 0x6a, 0xfc, 0x4b, 0x25, 0xdf, 0xc0, 0xe4, 0xe8,
+		    0x17, 0x4d, 0x4c, 0xc9, 0x7e, 0xde, 0x3a, 0xcc,
+		    0x3c, 0xba, 0x6a, 0x77, 0x47, 0xdb, 0xe3, 0x74,
+		    0x7a, 0x4d, 0x5f, 0x8d, 0x37, 0x55, 0x80, 0x73,
+		    0x90, 0x66, 0x5d, 0x3a, 0x7d, 0x5d, 0x86, 0x5e,
+		    0x8d, 0xfd, 0x83, 0xff, 0x4e, 0x74, 0x6f, 0xf9,
+		    0xe6, 0x70, 0x17, 0x70, 0x3e, 0x96, 0xa7, 0x7e,
+		    0xcb, 0xab, 0x8f, 0x58, 0x24, 0x9b, 0x01, 0xfd,
+		    0xcb, 0xe6, 0x4d, 0x9b, 0xf0, 0x88, 0x94, 0x57,
+		    0x66, 0xef, 0x72, 0x4c, 0x42, 0x6e, 0x16, 0x19,
+		    0x15, 0xea, 0x70, 0x5b, 0xac, 0x13, 0xdb, 0x9f,
+		    0x18, 0xe2, 0x3c, 0x26, 0x97, 0xbc, 0xdc, 0x45,
+		    0x8c, 0x6c, 0x24, 0x69, 0x9c, 0xf7, 0x65, 0x1e,
+		    0x18, 0x59, 0x31, 0x7c, 0xe4, 0x73, 0xbc, 0x39,
+		    0x62, 0xc6, 0x5c, 0x9f, 0xbf, 0xfa, 0x90, 0x03,
+		    0xc9, 0x72, 0x26, 0xb6, 0x1b, 0xc2, 0xb7, 0x3f,
+		    0xf2, 0x13, 0x77, 0xf2, 0x8d, 0xb9, 0x47, 0xd0,
+		    0x53, 0xdd, 0xc8, 0x91, 0x83, 0x8b, 0xb1, 0xce,
+		    0xa3, 0xfe, 0xcd, 0xd9, 0xdd, 0x92, 0x7b, 0xdb,
+		    0xb8, 0xfb, 0xc9, 0x2d, 0x01, 0x59, 0x39, 0x52,
+		    0xad, 0x1b, 0xec, 0xcf, 0xd7, 0x70, 0x13, 0x21,
+		    0xf5, 0x47, 0xaa, 0x18, 0x21, 0x5c, 0xc9, 0x9a,
+		    0xd2, 0x6b, 0x05, 0x9c, 0x01, 0xa1, 0xda, 0x35,
+		    0x5d, 0xb3, 0x70, 0xe6, 0xa9, 0x80, 0x8b, 0x91,
+		    0xb7, 0xb3, 0x5f, 0x24, 0x9a, 0xb7, 0xd1, 0x6b,
+		    0xa1, 0x1c, 0x50, 0xba, 0x49, 0xe0, 0xee, 0x2e,
+		    0x75, 0xac, 0x69, 0xc0, 0xeb, 0x03, 0xdd, 0x19,
+		    0xe5, 0xf6, 0x06, 0xdd, 0xc3, 0xd7, 0x2b, 0x07,
+		    0x07, 0x30, 0xa7, 0x19, 0x0c, 0xbf, 0xe6, 0x18,
+		    0xcc, 0xb1, 0x01, 0x11, 0x85, 0x77, 0x1d, 0x96,
+		    0xa7, 0xa3, 0x00, 0x84, 0x02, 0xa2, 0x83, 0x68,
+		    0xda, 0x17, 0x27, 0xc8, 0x7f, 0x23, 0xb7, 0xf4,
+		    0x13, 0x85, 0xcf, 0xdd, 0x7a, 0x7d, 0x24, 0x57,
+		    0xfe, 0x05, 0x93, 0xf5, 0x74, 0xce, 0xed, 0x0c,
+		    0x20, 0x98, 0x8d, 0x92, 0x30, 0xa1, 0x29, 0x23,
+		    0x1a, 0xa0, 0x4f, 0x69, 0x56, 0x4c, 0xe1, 0xc8,
+		    0xce, 0xf6, 0x9a, 0x0c, 0xa4, 0xfa, 0x04, 0xf6,
+		    0x62, 0x95, 0xf2, 0xfa, 0xc7, 0x40, 0x68, 0x40,
+		    0x8f, 0x41, 0xda, 0xb4, 0x26, 0x6f, 0x70, 0xab,
+		    0x40, 0x61, 0xa4, 0x0e, 0x75, 0xfb, 0x86, 0xeb,
+		    0x9d, 0x9a, 0x1f, 0xec, 0x76, 0x99, 0xe7, 0xea,
+		    0xaa, 0x1e, 0x2d, 0xb5, 0xd4, 0xa6, 0x1a, 0xb8,
+		    0x61, 0x0a, 0x1d, 0x16, 0x5b, 0x98, 0xc2, 0x31,
+		    0x40, 0xe7, 0x23, 0x1d, 0x66, 0x99, 0xc8, 0xc0,
+		    0xd7, 0xce, 0xf3, 0x57, 0x40, 0x04, 0x3f, 0xfc,
+		    0xea, 0xb3, 0xfc, 0xd2, 0xd3, 0x99, 0xa4, 0x94,
+		    0x69, 0xa0, 0xef, 0xd1, 0x85, 0xb3, 0xa6, 0xb1,
+		    0x28, 0xbf, 0x94, 0x67, 0x22, 0xc3, 0x36, 0x46,
+		    0xf8, 0xd2, 0x0f, 0x5f, 0xf4, 0x59, 0x80, 0xe6,
+		    0x2d, 0x43, 0x08, 0x7d, 0x19, 0x09, 0x97, 0xa7,
+		    0x4c, 0x3d, 0x8d, 0xba, 0x65, 0x62, 0xa3, 0x71,
+		    0x33, 0x29, 0x62, 0xdb, 0xc1, 0x33, 0x34, 0x1a,
+		    0x63, 0x33, 0x16, 0xb6, 0x64, 0x7e, 0xab, 0x33,
+		    0xf0, 0xe6, 0x26, 0x68, 0xba, 0x1d, 0x2e, 0x38,
+		    0x08, 0xe6, 0x02, 0xd3, 0x25, 0x2c, 0x47, 0x23,
+		    0x58, 0x34, 0x0f, 0x9d, 0x63, 0x4f, 0x63, 0xbb,
+		    0x7f, 0x3b, 0x34, 0x38, 0xa7, 0xb5, 0x8d, 0x65,
+		    0xd9, 0x9f, 0x79, 0x55, 0x3e, 0x4d, 0xe7, 0x73,
+		    0xd8, 0xf6, 0x98, 0x97, 0x84, 0x60, 0x9c, 0xc8,
+		    0xa9, 0x3c, 0xf6, 0xdc, 0x12, 0x5c, 0xe1, 0xbb,
+		    0x0b, 0x8b, 0x98, 0x9c, 0x9d, 0x26, 0x7c, 0x4a,
+		    0xe6, 0x46, 0x36, 0x58, 0x21, 0x4a, 0xee, 0xca,
+		    0xd7, 0x3b, 0xc2, 0x6c, 0x49, 0x2f, 0xe5, 0xd5,
+		    0x03, 0x59, 0x84, 0x53, 0xcb, 0xfe, 0x92, 0x71,
+		    0x2e, 0x7c, 0x21, 0xcc, 0x99, 0x85, 0x7f, 0xb8,
+		    0x74, 0x90, 0x13, 0x42, 0x3f, 0xe0, 0x6b, 0x1d,
+		    0xf2, 0x4d, 0x54, 0xd4, 0xfc, 0x3a, 0x05, 0xe6,
+		    0x74, 0xaf, 0xa6, 0xa0, 0x2a, 0x20, 0x23, 0x5d,
+		    0x34, 0x5c, 0xd9, 0x3e, 0x4e, 0xfa, 0x93, 0xe7,
+		    0xaa, 0xe9, 0x6f, 0x08, 0x43, 0x67, 0x41, 0xc5,
+		    0xad, 0xfb, 0x31, 0x95, 0x82, 0x73, 0x32, 0xd8,
+		    0xa6, 0xa3, 0xed, 0x0e, 0x2d, 0xf6, 0x5f, 0xfd,
+		    0x80, 0xa6, 0x7a, 0xe0, 0xdf, 0x78, 0x15, 0x29,
+		    0x74, 0x33, 0xd0, 0x9e, 0x83, 0x86, 0x72, 0x22,
+		    0x57, 0x29, 0xb9, 0x9e, 0x5d, 0xd3, 0x1a, 0xb5,
+		    0x96, 0x72, 0x41, 0x3d, 0xf1, 0x64, 0x43, 0x67,
+		    0xee, 0xaa, 0x5c, 0xd3, 0x9a, 0x96, 0x13, 0x11,
+		    0x5d, 0xf3, 0x0c, 0x87, 0x82, 0x1e, 0x41, 0x9e,
+		    0xd0, 0x27, 0xd7, 0x54, 0x3b, 0x67, 0x73, 0x09,
+		    0x91, 0xe9, 0xd5, 0x36, 0xa7, 0xb5, 0x55, 0xe4,
+		    0xf3, 0x21, 0x51, 0x49, 0x22, 0x07, 0x55, 0x4f,
+		    0x44, 0x4b, 0xd2, 0x15, 0x93, 0x17, 0x2a, 0xfa,
+		    0x4d, 0x4a, 0x57, 0xdb, 0x4c, 0xa6, 0xeb, 0xec,
+		    0x53, 0x25, 0x6c, 0x21, 0xed, 0x00, 0x4c, 0x3b,
+		    0xca, 0x14, 0x57, 0xa9, 0xd6, 0x6a, 0xcd, 0x8d,
+		    0x5e, 0x74, 0xac, 0x72, 0xc1, 0x97, 0xe5, 0x1b,
+		    0x45, 0x4e, 0xda, 0xfc, 0xcc, 0x40, 0xe8, 0x48,
+		    0x88, 0x0b, 0xa3, 0xe3, 0x8d, 0x83, 0x42, 0xc3,
+		    0x23, 0xfd, 0x68, 0xb5, 0x8e, 0xf1, 0x9d, 0x63,
+		    0x77, 0xe9, 0xa3, 0x8e, 0x8c, 0x26, 0x6b, 0xbd,
+		    0x72, 0x73, 0x35, 0x0c, 0x03, 0xf8, 0x43, 0x78,
+		    0x52, 0x71, 0x15, 0x1f, 0x71, 0x5d, 0x6e, 0xed,
+		    0xb9, 0xcc, 0x86, 0x30, 0xdb, 0x2b, 0xd3, 0x82,
+		    0x88, 0x23, 0x71, 0x90, 0x53, 0x5c, 0xa9, 0x2f,
+		    0x76, 0x01, 0xb7, 0x9a, 0xfe, 0x43, 0x55, 0xa3,
+		    0x04, 0x9b, 0x0e, 0xe4, 0x59, 0xdf, 0xc9, 0xe9,
+		    0xb1, 0xea, 0x29, 0x28, 0x3c, 0x5c, 0xae, 0x72,
+		    0x84, 0xb6, 0xc6, 0xeb, 0x0c, 0x27, 0x07, 0x74,
+		    0x90, 0x0d, 0x31, 0xb0, 0x00, 0x77, 0xe9, 0x40,
+		    0x70, 0x6f, 0x68, 0xa7, 0xfd, 0x06, 0xec, 0x4b,
+		    0xc0, 0xb7, 0xac, 0xbc, 0x33, 0xb7, 0x6d, 0x0a,
+		    0xbd, 0x12, 0x1b, 0x59, 0xcb, 0xdd, 0x32, 0xf5,
+		    0x1d, 0x94, 0x57, 0x76, 0x9e, 0x0c, 0x18, 0x98,
+		    0x71, 0xd7, 0x2a, 0xdb, 0x0b, 0x7b, 0xa7, 0x71,
+		    0xb7, 0x67, 0x81, 0x23, 0x96, 0xae, 0xb9, 0x7e,
+		    0x32, 0x43, 0x92, 0x8a, 0x19, 0xa0, 0xc4, 0xd4,
+		    0x3b, 0x57, 0xf9, 0x4a, 0x2c, 0xfb, 0x51, 0x46,
+		    0xbb, 0xcb, 0x5d, 0xb3, 0xef, 0x13, 0x93, 0x6e,
+		    0x68, 0x42, 0x54, 0x57, 0xd3, 0x6a, 0x3a, 0x8f,
+		    0x9d, 0x66, 0xbf, 0xbd, 0x36, 0x23, 0xf5, 0x93,
+		    0x83, 0x7b, 0x9c, 0xc0, 0xdd, 0xc5, 0x49, 0xc0,
+		    0x64, 0xed, 0x07, 0x12, 0xb3, 0xe6, 0xe4, 0xe5,
+		    0x38, 0x95, 0x23, 0xb1, 0xa0, 0x3b, 0x1a, 0x61,
+		    0xda, 0x17, 0xac, 0xc3, 0x58, 0xdd, 0x74, 0x64,
+		    0x22, 0x11, 0xe8, 0x32, 0x1d, 0x16, 0x93, 0x85,
+		    0x99, 0xa5, 0x9c, 0x34, 0x55, 0xb1, 0xe9, 0x20,
+		    0x72, 0xc9, 0x28, 0x7b, 0x79, 0x00, 0xa1, 0xa6,
+		    0xa3, 0x27, 0x40, 0x18, 0x8a, 0x54, 0xe0, 0xcc,
+		    0xe8, 0x4e, 0x8e, 0x43, 0x96, 0xe7, 0x3f, 0xc8,
+		    0xe9, 0xb2, 0xf9, 0xc9, 0xda, 0x04, 0x71, 0x50,
+		    0x47, 0xe4, 0xaa, 0xce, 0xa2, 0x30, 0xc8, 0xe4,
+		    0xac, 0xc7, 0x0d, 0x06, 0x2e, 0xe6, 0xe8, 0x80,
+		    0x36, 0x29, 0x9e, 0x01, 0xb8, 0xc3, 0xf0, 0xa0,
+		    0x5d, 0x7a, 0xca, 0x4d, 0xa0, 0x57, 0xbd, 0x2a,
+		    0x45, 0xa7, 0x7f, 0x9c, 0x93, 0x07, 0x8f, 0x35,
+		    0x67, 0x92, 0xe3, 0xe9, 0x7f, 0xa8, 0x61, 0x43,
+		    0x9e, 0x25, 0x4f, 0x33, 0x76, 0x13, 0x6e, 0x12,
+		    0xb9, 0xdd, 0xa4, 0x7c, 0x08, 0x9f, 0x7c, 0xe7,
+		    0x0a, 0x8d, 0x84, 0x06, 0xa4, 0x33, 0x17, 0x34,
+		    0x5e, 0x10, 0x7c, 0xc0, 0xa8, 0x3d, 0x1f, 0x42,
+		    0x20, 0x51, 0x65, 0x5d, 0x09, 0xc3, 0xaa, 0xc0,
+		    0xc8, 0x0d, 0xf0, 0x79, 0xbc, 0x20, 0x1b, 0x95,
+		    0xe7, 0x06, 0x7d, 0x47, 0x20, 0x03, 0x1a, 0x74,
+		    0xdd, 0xe2, 0xd4, 0xae, 0x38, 0x71, 0x9b, 0xf5,
+		    0x80, 0xec, 0x08, 0x4e, 0x56, 0xba, 0x76, 0x12,
+		    0x1a, 0xdf, 0x48, 0xf3, 0xae, 0xb3, 0xe6, 0xe6,
+		    0xbe, 0xc0, 0x91, 0x2e, 0x01, 0xb3, 0x01, 0x86,
+		    0xa2, 0xb9, 0x52, 0xd1, 0x21, 0xae, 0xd4, 0x97,
+		    0x1d, 0xef, 0x41, 0x12, 0x95, 0x3d, 0x48, 0x45,
+		    0x1c, 0x56, 0x32, 0x8f, 0xb8, 0x43, 0xbb, 0x19,
+		    0xf3, 0xca, 0xe9, 0xeb, 0x6d, 0x84, 0xbe, 0x86,
+		    0x06, 0xe2, 0x36, 0xb2, 0x62, 0x9d, 0xd3, 0x4c,
+		    0x48, 0x18, 0x54, 0x13, 0x4e, 0xcf, 0xfd, 0xba,
+		    0x84, 0xb9, 0x30, 0x53, 0xcf, 0xfb, 0xb9, 0x29,
+		    0x8f, 0xdc, 0x9f, 0xef, 0x60, 0x0b, 0x64, 0xf6,
+		    0x8b, 0xee, 0xa6, 0x91, 0xc2, 0x41, 0x6c, 0xf6,
+		    0xfa, 0x79, 0x67, 0x4b, 0xc1, 0x3f, 0xaf, 0x09,
+		    0x81, 0xd4, 0x5d, 0xcb, 0x09, 0xdf, 0x36, 0x31,
+		    0xc0, 0x14, 0x3c, 0x7c, 0x0e, 0x65, 0x95, 0x99,
+		    0x6d, 0xa3, 0xf4, 0xd7, 0x38, 0xee, 0x1a, 0x2b,
+		    0x37, 0xe2, 0xa4, 0x3b, 0x4b, 0xd0, 0x65, 0xca,
+		    0xf8, 0xc3, 0xe8, 0x15, 0x20, 0xef, 0xf2, 0x00,
+		    0xfd, 0x01, 0x09, 0xc5, 0xc8, 0x17, 0x04, 0x93,
+		    0xd0, 0x93, 0x03, 0x55, 0xc5, 0xfe, 0x32, 0xa3,
+		    0x3e, 0x28, 0x2d, 0x3b, 0x93, 0x8a, 0xcc, 0x07,
+		    0x72, 0x80, 0x8b, 0x74, 0x16, 0x24, 0xbb, 0xda,
+		    0x94, 0x39, 0x30, 0x8f, 0xb1, 0xcd, 0x4a, 0x90,
+		    0x92, 0x7c, 0x14, 0x8f, 0x95, 0x4e, 0xac, 0x9b,
+		    0xd8, 0x8f, 0x1a, 0x87, 0xa4, 0x32, 0x27, 0x8a,
+		    0xba, 0xf7, 0x41, 0xcf, 0x84, 0x37, 0x19, 0xe6,
+		    0x06, 0xf5, 0x0e, 0xcf, 0x36, 0xf5, 0x9e, 0x6c,
+		    0xde, 0xbc, 0xff, 0x64, 0x7e, 0x4e, 0x59, 0x57,
+		    0x48, 0xfe, 0x14, 0xf7, 0x9c, 0x93, 0x5d, 0x15,
+		    0xad, 0xcc, 0x11, 0xb1, 0x17, 0x18, 0xb2, 0x7e,
+		    0xcc, 0xab, 0xe9, 0xce, 0x7d, 0x77, 0x5b, 0x51,
+		    0x1b, 0x1e, 0x20, 0xa8, 0x32, 0x06, 0x0e, 0x75,
+		    0x93, 0xac, 0xdb, 0x35, 0x37, 0x1f, 0xe9, 0x19,
+		    0x1d, 0xb4, 0x71, 0x97, 0xd6, 0x4e, 0x2c, 0x08,
+		    0xa5, 0x13, 0xf9, 0x0e, 0x7e, 0x78, 0x6e, 0x14,
+		    0xe0, 0xa9, 0xb9, 0x96, 0x4c, 0x80, 0x82, 0xba,
+		    0x17, 0xb3, 0x9d, 0x69, 0xb0, 0x84, 0x46, 0xff,
+		    0xf9, 0x52, 0x79, 0x94, 0x58, 0x3a, 0x62, 0x90,
+		    0x15, 0x35, 0x71, 0x10, 0x37, 0xed, 0xa1, 0x8e,
+		    0x53, 0x6e, 0xf4, 0x26, 0x57, 0x93, 0x15, 0x93,
+		    0xf6, 0x81, 0x2c, 0x5a, 0x10, 0xda, 0x92, 0xad,
+		    0x2f, 0xdb, 0x28, 0x31, 0x2d, 0x55, 0x04, 0xd2,
+		    0x06, 0x28, 0x8c, 0x1e, 0xdc, 0xea, 0x54, 0xac,
+		    0xff, 0xb7, 0x6c, 0x30, 0x15, 0xd4, 0xb4, 0x0d,
+		    0x00, 0x93, 0x57, 0xdd, 0xd2, 0x07, 0x07, 0x06,
+		    0xd9, 0x43, 0x9b, 0xcd, 0x3a, 0xf4, 0x7d, 0x4c,
+		    0x36, 0x5d, 0x23, 0xa2, 0xcc, 0x57, 0x40, 0x91,
+		    0xe9, 0x2c, 0x2f, 0x2c, 0xd5, 0x30, 0x9b, 0x17,
+		    0xb0, 0xc9, 0xf7, 0xa7, 0x2f, 0xd1, 0x93, 0x20,
+		    0x6b, 0xc6, 0xc1, 0xe4, 0x6f, 0xcb, 0xd1, 0xe7,
+		    0x09, 0x0f, 0x9e, 0xdc, 0xaa, 0x9f, 0x2f, 0xdf,
+		    0x56, 0x9f, 0xd4, 0x33, 0x04, 0xaf, 0xd3, 0x6c,
+		    0x58, 0x61, 0xf0, 0x30, 0xec, 0xf2, 0x7f, 0xf2,
+		    0x9c, 0xdf, 0x39, 0xbb, 0x6f, 0xa2, 0x8c, 0x7e,
+		    0xc4, 0x22, 0x51, 0x71, 0xc0, 0x4d, 0x14, 0x1a,
+		    0xc4, 0xcd, 0x04, 0xd9, 0x87, 0x08, 0x50, 0x05,
+		    0xcc, 0xaf, 0xf6, 0xf0, 0x8f, 0x92, 0x54, 0x58,
+		    0xc2, 0xc7, 0x09, 0x7a, 0x59, 0x02, 0x05, 0xe8,
+		    0xb0, 0x86, 0xd9, 0xbf, 0x7b, 0x35, 0x51, 0x4d,
+		    0xaf, 0x08, 0x97, 0x2c, 0x65, 0xda, 0x2a, 0x71,
+		    0x3a, 0xa8, 0x51, 0xcc, 0xf2, 0x73, 0x27, 0xc3,
+		    0xfd, 0x62, 0xcf, 0xe3, 0xb2, 0xca, 0xcb, 0xbe,
+		    0x1a, 0x0a, 0xa1, 0x34, 0x7b, 0x77, 0xc4, 0x62,
+		    0x68, 0x78, 0x5f, 0x94, 0x07, 0x04, 0x65, 0x16,
+		    0x4b, 0x61, 0xcb, 0xff, 0x75, 0x26, 0x50, 0x66,
+		    0x1f, 0x6e, 0x93, 0xf8, 0xc5, 0x51, 0xeb, 0xa4,
+		    0x4a, 0x48, 0x68, 0x6b, 0xe2, 0x5e, 0x44, 0xb2,
+		    0x50, 0x2c, 0x6c, 0xae, 0x79, 0x4e, 0x66, 0x35,
+		    0x81, 0x50, 0xac, 0xbc, 0x3f, 0xb1, 0x0c, 0xf3,
+		    0x05, 0x3c, 0x4a, 0xa3, 0x6c, 0x2a, 0x79, 0xb4,
+		    0xb7, 0xab, 0xca, 0xc7, 0x9b, 0x8e, 0xcd, 0x5f,
+		    0x11, 0x03, 0xcb, 0x30, 0xa3, 0xab, 0xda, 0xfe,
+		    0x64, 0xb9, 0xbb, 0xd8, 0x5e, 0x3a, 0x1a, 0x56,
+		    0xe5, 0x05, 0x48, 0x90, 0x1e, 0x61, 0x69, 0x1b,
+		    0x22, 0xe6, 0x1a, 0x3c, 0x75, 0xad, 0x1f, 0x37,
+		    0x28, 0xdc, 0xe4, 0x6d, 0xbd, 0x42, 0xdc, 0xd3,
+		    0xc8, 0xb6, 0x1c, 0x48, 0xfe, 0x94, 0x77, 0x7f,
+		    0xbd, 0x62, 0xac, 0xa3, 0x47, 0x27, 0xcf, 0x5f,
+		    0xd9, 0xdb, 0xaf, 0xec, 0xf7, 0x5e, 0xc1, 0xb0,
+		    0x9d, 0x01, 0x26, 0x99, 0x7e, 0x8f, 0x03, 0x70,
+		    0xb5, 0x42, 0xbe, 0x67, 0x28, 0x1b, 0x7c, 0xbd,
+		    0x61, 0x21, 0x97, 0xcc, 0x5c, 0xe1, 0x97, 0x8f,
+		    0x8d, 0xde, 0x2b, 0xaa, 0xa7, 0x71, 0x1d, 0x1e,
+		    0x02, 0x73, 0x70, 0x58, 0x32, 0x5b, 0x1d, 0x67,
+		    0x3d, 0xe0, 0x74, 0x4f, 0x03, 0xf2, 0x70, 0x51,
+		    0x79, 0xf1, 0x61, 0x70, 0x15, 0x74, 0x9d, 0x23,
+		    0x89, 0xde, 0xac, 0xfd, 0xde, 0xd0, 0x1f, 0xc3,
+		    0x87, 0x44, 0x35, 0x4b, 0xe5, 0xb0, 0x60, 0xc5,
+		    0x22, 0xe4, 0x9e, 0xca, 0xeb, 0xd5, 0x3a, 0x09,
+		    0x45, 0xa4, 0xdb, 0xfa, 0x3f, 0xeb, 0x1b, 0xc7,
+		    0xc8, 0x14, 0x99, 0x51, 0x92, 0x10, 0xed, 0xed,
+		    0x28, 0xe0, 0xa1, 0xf8, 0x26, 0xcf, 0xcd, 0xcb,
+		    0x63, 0xa1, 0x3b, 0xe3, 0xdf, 0x7e, 0xfe, 0xa6,
+		    0xf0, 0x81, 0x9a, 0xbf, 0x55, 0xde, 0x54, 0xd5,
+		    0x56, 0x60, 0x98, 0x10, 0x68, 0xf4, 0x38, 0x96,
+		    0x8e, 0x6f, 0x1d, 0x44, 0x7f, 0xd6, 0x2f, 0xfe,
+		    0x55, 0xfb, 0x0c, 0x7e, 0x67, 0xe2, 0x61, 0x44,
+		    0xed, 0xf2, 0x35, 0x30, 0x5d, 0xe9, 0xc7, 0xd6,
+		    0x6d, 0xe0, 0xa0, 0xed, 0xf3, 0xfc, 0xd8, 0x3e,
+		    0x0a, 0x7b, 0xcd, 0xaf, 0x65, 0x68, 0x18, 0xc0,
+		    0xec, 0x04, 0x1c, 0x74, 0x6d, 0xe2, 0x6e, 0x79,
+		    0xd4, 0x11, 0x2b, 0x62, 0xd5, 0x27, 0xad, 0x4f,
+		    0x01, 0x59, 0x73, 0xcc, 0x6a, 0x53, 0xfb, 0x2d,
+		    0xd5, 0x4e, 0x99, 0x21, 0x65, 0x4d, 0xf5, 0x82,
+		    0xf7, 0xd8, 0x42, 0xce, 0x6f, 0x3d, 0x36, 0x47,
+		    0xf1, 0x05, 0x16, 0xe8, 0x1b, 0x6a, 0x8f, 0x93,
+		    0xf2, 0x8f, 0x37, 0x40, 0x12, 0x28, 0xa3, 0xe6,
+		    0xb9, 0x17, 0x4a, 0x1f, 0xb1, 0xd1, 0x66, 0x69,
+		    0x86, 0xc4, 0xfc, 0x97, 0xae, 0x3f, 0x8f, 0x1e,
+		    0x2b, 0xdf, 0xcd, 0xf9, 0x3c },
+	.ilen	= 1949,
+	.result	= { 0x7a, 0x57, 0xf2, 0xc7, 0x06, 0x3f, 0x50, 0x7b,
+		    0x36, 0x1a, 0x66, 0x5c, 0xb9, 0x0e, 0x5e, 0x3b,
+		    0x45, 0x60, 0xbe, 0x9a, 0x31, 0x9f, 0xff, 0x5d,
+		    0x66, 0x34, 0xb4, 0xdc, 0xfb, 0x9d, 0x8e, 0xee,
+		    0x6a, 0x33, 0xa4, 0x07, 0x3c, 0xf9, 0x4c, 0x30,
+		    0xa1, 0x24, 0x52, 0xf9, 0x50, 0x46, 0x88, 0x20,
+		    0x02, 0x32, 0x3a, 0x0e, 0x99, 0x63, 0xaf, 0x1f,
+		    0x15, 0x28, 0x2a, 0x05, 0xff, 0x57, 0x59, 0x5e,
+		    0x18, 0xa1, 0x1f, 0xd0, 0x92, 0x5c, 0x88, 0x66,
+		    0x1b, 0x00, 0x64, 0xa5, 0x93, 0x8d, 0x06, 0x46,
+		    0xb0, 0x64, 0x8b, 0x8b, 0xef, 0x99, 0x05, 0x35,
+		    0x85, 0xb3, 0xf3, 0x33, 0xbb, 0xec, 0x66, 0xb6,
+		    0x3d, 0x57, 0x42, 0xe3, 0xb4, 0xc6, 0xaa, 0xb0,
+		    0x41, 0x2a, 0xb9, 0x59, 0xa9, 0xf6, 0x3e, 0x15,
+		    0x26, 0x12, 0x03, 0x21, 0x4c, 0x74, 0x43, 0x13,
+		    0x2a, 0x03, 0x27, 0x09, 0xb4, 0xfb, 0xe7, 0xb7,
+		    0x40, 0xff, 0x5e, 0xce, 0x48, 0x9a, 0x60, 0xe3,
+		    0x8b, 0x80, 0x8c, 0x38, 0x2d, 0xcb, 0x93, 0x37,
+		    0x74, 0x05, 0x52, 0x6f, 0x73, 0x3e, 0xc3, 0xbc,
+		    0xca, 0x72, 0x0a, 0xeb, 0xf1, 0x3b, 0xa0, 0x95,
+		    0xdc, 0x8a, 0xc4, 0xa9, 0xdc, 0xca, 0x44, 0xd8,
+		    0x08, 0x63, 0x6a, 0x36, 0xd3, 0x3c, 0xb8, 0xac,
+		    0x46, 0x7d, 0xfd, 0xaa, 0xeb, 0x3e, 0x0f, 0x45,
+		    0x8f, 0x49, 0xda, 0x2b, 0xf2, 0x12, 0xbd, 0xaf,
+		    0x67, 0x8a, 0x63, 0x48, 0x4b, 0x55, 0x5f, 0x6d,
+		    0x8c, 0xb9, 0x76, 0x34, 0x84, 0xae, 0xc2, 0xfc,
+		    0x52, 0x64, 0x82, 0xf7, 0xb0, 0x06, 0xf0, 0x45,
+		    0x73, 0x12, 0x50, 0x30, 0x72, 0xea, 0x78, 0x9a,
+		    0xa8, 0xaf, 0xb5, 0xe3, 0xbb, 0x77, 0x52, 0xec,
+		    0x59, 0x84, 0xbf, 0x6b, 0x8f, 0xce, 0x86, 0x5e,
+		    0x1f, 0x23, 0xe9, 0xfb, 0x08, 0x86, 0xf7, 0x10,
+		    0xb9, 0xf2, 0x44, 0x96, 0x44, 0x63, 0xa9, 0xa8,
+		    0x78, 0x00, 0x23, 0xd6, 0xc7, 0xe7, 0x6e, 0x66,
+		    0x4f, 0xcc, 0xee, 0x15, 0xb3, 0xbd, 0x1d, 0xa0,
+		    0xe5, 0x9c, 0x1b, 0x24, 0x2c, 0x4d, 0x3c, 0x62,
+		    0x35, 0x9c, 0x88, 0x59, 0x09, 0xdd, 0x82, 0x1b,
+		    0xcf, 0x0a, 0x83, 0x6b, 0x3f, 0xae, 0x03, 0xc4,
+		    0xb4, 0xdd, 0x7e, 0x5b, 0x28, 0x76, 0x25, 0x96,
+		    0xd9, 0xc9, 0x9d, 0x5f, 0x86, 0xfa, 0xf6, 0xd7,
+		    0xd2, 0xe6, 0x76, 0x1d, 0x0f, 0xa1, 0xdc, 0x74,
+		    0x05, 0x1b, 0x1d, 0xe0, 0xcd, 0x16, 0xb0, 0xa8,
+		    0x8a, 0x34, 0x7b, 0x15, 0x11, 0x77, 0xe5, 0x7b,
+		    0x7e, 0x20, 0xf7, 0xda, 0x38, 0xda, 0xce, 0x70,
+		    0xe9, 0xf5, 0x6c, 0xd9, 0xbe, 0x0c, 0x4c, 0x95,
+		    0x4c, 0xc2, 0x9b, 0x34, 0x55, 0x55, 0xe1, 0xf3,
+		    0x46, 0x8e, 0x48, 0x74, 0x14, 0x4f, 0x9d, 0xc9,
+		    0xf5, 0xe8, 0x1a, 0xf0, 0x11, 0x4a, 0xc1, 0x8d,
+		    0xe0, 0x93, 0xa0, 0xbe, 0x09, 0x1c, 0x2b, 0x4e,
+		    0x0f, 0xb2, 0x87, 0x8b, 0x84, 0xfe, 0x92, 0x32,
+		    0x14, 0xd7, 0x93, 0xdf, 0xe7, 0x44, 0xbc, 0xc5,
+		    0xae, 0x53, 0x69, 0xd8, 0xb3, 0x79, 0x37, 0x80,
+		    0xe3, 0x17, 0x5c, 0xec, 0x53, 0x00, 0x9a, 0xe3,
+		    0x8e, 0xdc, 0x38, 0xb8, 0x66, 0xf0, 0xd3, 0xad,
+		    0x1d, 0x02, 0x96, 0x86, 0x3e, 0x9d, 0x3b, 0x5d,
+		    0xa5, 0x7f, 0x21, 0x10, 0xf1, 0x1f, 0x13, 0x20,
+		    0xf9, 0x57, 0x87, 0x20, 0xf5, 0x5f, 0xf1, 0x17,
+		    0x48, 0x0a, 0x51, 0x5a, 0xcd, 0x19, 0x03, 0xa6,
+		    0x5a, 0xd1, 0x12, 0x97, 0xe9, 0x48, 0xe2, 0x1d,
+		    0x83, 0x75, 0x50, 0xd9, 0x75, 0x7d, 0x6a, 0x82,
+		    0xa1, 0xf9, 0x4e, 0x54, 0x87, 0x89, 0xc9, 0x0c,
+		    0xb7, 0x5b, 0x6a, 0x91, 0xc1, 0x9c, 0xb2, 0xa9,
+		    0xdc, 0x9a, 0xa4, 0x49, 0x0a, 0x6d, 0x0d, 0xbb,
+		    0xde, 0x86, 0x44, 0xdd, 0x5d, 0x89, 0x2b, 0x96,
+		    0x0f, 0x23, 0x95, 0xad, 0xcc, 0xa2, 0xb3, 0xb9,
+		    0x7e, 0x74, 0x38, 0xba, 0x9f, 0x73, 0xae, 0x5f,
+		    0xf8, 0x68, 0xa2, 0xe0, 0xa9, 0xce, 0xbd, 0x40,
+		    0xd4, 0x4c, 0x6b, 0xd2, 0x56, 0x62, 0xb0, 0xcc,
+		    0x63, 0x7e, 0x5b, 0xd3, 0xae, 0xd1, 0x75, 0xce,
+		    0xbb, 0xb4, 0x5b, 0xa8, 0xf8, 0xb4, 0xac, 0x71,
+		    0x75, 0xaa, 0xc9, 0x9f, 0xbb, 0x6c, 0xad, 0x0f,
+		    0x55, 0x5d, 0xe8, 0x85, 0x7d, 0xf9, 0x21, 0x35,
+		    0xea, 0x92, 0x85, 0x2b, 0x00, 0xec, 0x84, 0x90,
+		    0x0a, 0x63, 0x96, 0xe4, 0x6b, 0xa9, 0x77, 0xb8,
+		    0x91, 0xf8, 0x46, 0x15, 0x72, 0x63, 0x70, 0x01,
+		    0x40, 0xa3, 0xa5, 0x76, 0x62, 0x2b, 0xbf, 0xf1,
+		    0xe5, 0x8d, 0x9f, 0xa3, 0xfa, 0x9b, 0x03, 0xbe,
+		    0xfe, 0x65, 0x6f, 0xa2, 0x29, 0x0d, 0x54, 0xb4,
+		    0x71, 0xce, 0xa9, 0xd6, 0x3d, 0x88, 0xf9, 0xaf,
+		    0x6b, 0xa8, 0x9e, 0xf4, 0x16, 0x96, 0x36, 0xb9,
+		    0x00, 0xdc, 0x10, 0xab, 0xb5, 0x08, 0x31, 0x1f,
+		    0x00, 0xb1, 0x3c, 0xd9, 0x38, 0x3e, 0xc6, 0x04,
+		    0xa7, 0x4e, 0xe8, 0xae, 0xed, 0x98, 0xc2, 0xf7,
+		    0xb9, 0x00, 0x5f, 0x8c, 0x60, 0xd1, 0xe5, 0x15,
+		    0xf7, 0xae, 0x1e, 0x84, 0x88, 0xd1, 0xf6, 0xbc,
+		    0x3a, 0x89, 0x35, 0x22, 0x83, 0x7c, 0xca, 0xf0,
+		    0x33, 0x82, 0x4c, 0x79, 0x3c, 0xfd, 0xb1, 0xae,
+		    0x52, 0x62, 0x55, 0xd2, 0x41, 0x60, 0xc6, 0xbb,
+		    0xfa, 0x0e, 0x59, 0xd6, 0xa8, 0xfe, 0x5d, 0xed,
+		    0x47, 0x3d, 0xe0, 0xea, 0x1f, 0x6e, 0x43, 0x51,
+		    0xec, 0x10, 0x52, 0x56, 0x77, 0x42, 0x6b, 0x52,
+		    0x87, 0xd8, 0xec, 0xe0, 0xaa, 0x76, 0xa5, 0x84,
+		    0x2a, 0x22, 0x24, 0xfd, 0x92, 0x40, 0x88, 0xd5,
+		    0x85, 0x1c, 0x1f, 0x6b, 0x47, 0xa0, 0xc4, 0xe4,
+		    0xef, 0xf4, 0xea, 0xd7, 0x59, 0xac, 0x2a, 0x9e,
+		    0x8c, 0xfa, 0x1f, 0x42, 0x08, 0xfe, 0x4f, 0x74,
+		    0xa0, 0x26, 0xf5, 0xb3, 0x84, 0xf6, 0x58, 0x5f,
+		    0x26, 0x66, 0x3e, 0xd7, 0xe4, 0x22, 0x91, 0x13,
+		    0xc8, 0xac, 0x25, 0x96, 0x23, 0xd8, 0x09, 0xea,
+		    0x45, 0x75, 0x23, 0xb8, 0x5f, 0xc2, 0x90, 0x8b,
+		    0x09, 0xc4, 0xfc, 0x47, 0x6c, 0x6d, 0x0a, 0xef,
+		    0x69, 0xa4, 0x38, 0x19, 0xcf, 0x7d, 0xf9, 0x09,
+		    0x73, 0x9b, 0x60, 0x5a, 0xf7, 0x37, 0xb5, 0xfe,
+		    0x9f, 0xe3, 0x2b, 0x4c, 0x0d, 0x6e, 0x19, 0xf1,
+		    0xd6, 0xc0, 0x70, 0xf3, 0x9d, 0x22, 0x3c, 0xf9,
+		    0x49, 0xce, 0x30, 0x8e, 0x44, 0xb5, 0x76, 0x15,
+		    0x8f, 0x52, 0xfd, 0xa5, 0x04, 0xb8, 0x55, 0x6a,
+		    0x36, 0x59, 0x7c, 0xc4, 0x48, 0xb8, 0xd7, 0xab,
+		    0x05, 0x66, 0xe9, 0x5e, 0x21, 0x6f, 0x6b, 0x36,
+		    0x29, 0xbb, 0xe9, 0xe3, 0xa2, 0x9a, 0xa8, 0xcd,
+		    0x55, 0x25, 0x11, 0xba, 0x5a, 0x58, 0xa0, 0xde,
+		    0xae, 0x19, 0x2a, 0x48, 0x5a, 0xff, 0x36, 0xcd,
+		    0x6d, 0x16, 0x7a, 0x73, 0x38, 0x46, 0xe5, 0x47,
+		    0x59, 0xc8, 0xa2, 0xf6, 0xe2, 0x6c, 0x83, 0xc5,
+		    0x36, 0x2c, 0x83, 0x7d, 0xb4, 0x01, 0x05, 0x69,
+		    0xe7, 0xaf, 0x5c, 0xc4, 0x64, 0x82, 0x12, 0x21,
+		    0xef, 0xf7, 0xd1, 0x7d, 0xb8, 0x8d, 0x8c, 0x98,
+		    0x7c, 0x5f, 0x7d, 0x92, 0x88, 0xb9, 0x94, 0x07,
+		    0x9c, 0xd8, 0xe9, 0x9c, 0x17, 0x38, 0xe3, 0x57,
+		    0x6c, 0xe0, 0xdc, 0xa5, 0x92, 0x42, 0xb3, 0xbd,
+		    0x50, 0xa2, 0x7e, 0xb5, 0xb1, 0x52, 0x72, 0x03,
+		    0x97, 0xd8, 0xaa, 0x9a, 0x1e, 0x75, 0x41, 0x11,
+		    0xa3, 0x4f, 0xcc, 0xd4, 0xe3, 0x73, 0xad, 0x96,
+		    0xdc, 0x47, 0x41, 0x9f, 0xb0, 0xbe, 0x79, 0x91,
+		    0xf5, 0xb6, 0x18, 0xfe, 0xc2, 0x83, 0x18, 0x7d,
+		    0x73, 0xd9, 0x4f, 0x83, 0x84, 0x03, 0xb3, 0xf0,
+		    0x77, 0x66, 0x3d, 0x83, 0x63, 0x2e, 0x2c, 0xf9,
+		    0xdd, 0xa6, 0x1f, 0x89, 0x82, 0xb8, 0x23, 0x42,
+		    0xeb, 0xe2, 0xca, 0x70, 0x82, 0x61, 0x41, 0x0a,
+		    0x6d, 0x5f, 0x75, 0xc5, 0xe2, 0xc4, 0x91, 0x18,
+		    0x44, 0x22, 0xfa, 0x34, 0x10, 0xf5, 0x20, 0xdc,
+		    0xb7, 0xdd, 0x2a, 0x20, 0x77, 0xf5, 0xf9, 0xce,
+		    0xdb, 0xa0, 0x0a, 0x52, 0x2a, 0x4e, 0xdd, 0xcc,
+		    0x97, 0xdf, 0x05, 0xe4, 0x5e, 0xb7, 0xaa, 0xf0,
+		    0xe2, 0x80, 0xff, 0xba, 0x1a, 0x0f, 0xac, 0xdf,
+		    0x02, 0x32, 0xe6, 0xf7, 0xc7, 0x17, 0x13, 0xb7,
+		    0xfc, 0x98, 0x48, 0x8c, 0x0d, 0x82, 0xc9, 0x80,
+		    0x7a, 0xe2, 0x0a, 0xc5, 0xb4, 0xde, 0x7c, 0x3c,
+		    0x79, 0x81, 0x0e, 0x28, 0x65, 0x79, 0x67, 0x82,
+		    0x69, 0x44, 0x66, 0x09, 0xf7, 0x16, 0x1a, 0xf9,
+		    0x7d, 0x80, 0xa1, 0x79, 0x14, 0xa9, 0xc8, 0x20,
+		    0xfb, 0xa2, 0x46, 0xbe, 0x08, 0x35, 0x17, 0x58,
+		    0xc1, 0x1a, 0xda, 0x2a, 0x6b, 0x2e, 0x1e, 0xe6,
+		    0x27, 0x55, 0x7b, 0x19, 0xe2, 0xfb, 0x64, 0xfc,
+		    0x5e, 0x15, 0x54, 0x3c, 0xe7, 0xc2, 0x11, 0x50,
+		    0x30, 0xb8, 0x72, 0x03, 0x0b, 0x1a, 0x9f, 0x86,
+		    0x27, 0x11, 0x5c, 0x06, 0x2b, 0xbd, 0x75, 0x1a,
+		    0x0a, 0xda, 0x01, 0xfa, 0x5c, 0x4a, 0xc1, 0x80,
+		    0x3a, 0x6e, 0x30, 0xc8, 0x2c, 0xeb, 0x56, 0xec,
+		    0x89, 0xfa, 0x35, 0x7b, 0xb2, 0xf0, 0x97, 0x08,
+		    0x86, 0x53, 0xbe, 0xbd, 0x40, 0x41, 0x38, 0x1c,
+		    0xb4, 0x8b, 0x79, 0x2e, 0x18, 0x96, 0x94, 0xde,
+		    0xe8, 0xca, 0xe5, 0x9f, 0x92, 0x9f, 0x15, 0x5d,
+		    0x56, 0x60, 0x5c, 0x09, 0xf9, 0x16, 0xf4, 0x17,
+		    0x0f, 0xf6, 0x4c, 0xda, 0xe6, 0x67, 0x89, 0x9f,
+		    0xca, 0x6c, 0xe7, 0x9b, 0x04, 0x62, 0x0e, 0x26,
+		    0xa6, 0x52, 0xbd, 0x29, 0xff, 0xc7, 0xa4, 0x96,
+		    0xe6, 0x6a, 0x02, 0xa5, 0x2e, 0x7b, 0xfe, 0x97,
+		    0x68, 0x3e, 0x2e, 0x5f, 0x3b, 0x0f, 0x36, 0xd6,
+		    0x98, 0x19, 0x59, 0x48, 0xd2, 0xc6, 0xe1, 0x55,
+		    0x1a, 0x6e, 0xd6, 0xed, 0x2c, 0xba, 0xc3, 0x9e,
+		    0x64, 0xc9, 0x95, 0x86, 0x35, 0x5e, 0x3e, 0x88,
+		    0x69, 0x99, 0x4b, 0xee, 0xbe, 0x9a, 0x99, 0xb5,
+		    0x6e, 0x58, 0xae, 0xdd, 0x22, 0xdb, 0xdd, 0x6b,
+		    0xfc, 0xaf, 0x90, 0xa3, 0x3d, 0xa4, 0xc1, 0x15,
+		    0x92, 0x18, 0x8d, 0xd2, 0x4b, 0x7b, 0x06, 0xd1,
+		    0x37, 0xb5, 0xe2, 0x7c, 0x2c, 0xf0, 0x25, 0xe4,
+		    0x94, 0x2a, 0xbd, 0xe3, 0x82, 0x70, 0x78, 0xa3,
+		    0x82, 0x10, 0x5a, 0x90, 0xd7, 0xa4, 0xfa, 0xaf,
+		    0x1a, 0x88, 0x59, 0xdc, 0x74, 0x12, 0xb4, 0x8e,
+		    0xd7, 0x19, 0x46, 0xf4, 0x84, 0x69, 0x9f, 0xbb,
+		    0x70, 0xa8, 0x4c, 0x52, 0x81, 0xa9, 0xff, 0x76,
+		    0x1c, 0xae, 0xd8, 0x11, 0x3d, 0x7f, 0x7d, 0xc5,
+		    0x12, 0x59, 0x28, 0x18, 0xc2, 0xa2, 0xb7, 0x1c,
+		    0x88, 0xf8, 0xd6, 0x1b, 0xa6, 0x7d, 0x9e, 0xde,
+		    0x29, 0xf8, 0xed, 0xff, 0xeb, 0x92, 0x24, 0x4f,
+		    0x05, 0xaa, 0xd9, 0x49, 0xba, 0x87, 0x59, 0x51,
+		    0xc9, 0x20, 0x5c, 0x9b, 0x74, 0xcf, 0x03, 0xd9,
+		    0x2d, 0x34, 0xc7, 0x5b, 0xa5, 0x40, 0xb2, 0x99,
+		    0xf5, 0xcb, 0xb4, 0xf6, 0xb7, 0x72, 0x4a, 0xd6,
+		    0xbd, 0xb0, 0xf3, 0x93, 0xe0, 0x1b, 0xa8, 0x04,
+		    0x1e, 0x35, 0xd4, 0x80, 0x20, 0xf4, 0x9c, 0x31,
+		    0x6b, 0x45, 0xb9, 0x15, 0xb0, 0x5e, 0xdd, 0x0a,
+		    0x33, 0x9c, 0x83, 0xcd, 0x58, 0x89, 0x50, 0x56,
+		    0xbb, 0x81, 0x00, 0x91, 0x32, 0xf3, 0x1b, 0x3e,
+		    0xcf, 0x45, 0xe1, 0xf9, 0xe1, 0x2c, 0x26, 0x78,
+		    0x93, 0x9a, 0x60, 0x46, 0xc9, 0xb5, 0x5e, 0x6a,
+		    0x28, 0x92, 0x87, 0x3f, 0x63, 0x7b, 0xdb, 0xf7,
+		    0xd0, 0x13, 0x9d, 0x32, 0x40, 0x5e, 0xcf, 0xfb,
+		    0x79, 0x68, 0x47, 0x4c, 0xfd, 0x01, 0x17, 0xe6,
+		    0x97, 0x93, 0x78, 0xbb, 0xa6, 0x27, 0xa3, 0xe8,
+		    0x1a, 0xe8, 0x94, 0x55, 0x7d, 0x08, 0xe5, 0xdc,
+		    0x66, 0xa3, 0x69, 0xc8, 0xca, 0xc5, 0xa1, 0x84,
+		    0x55, 0xde, 0x08, 0x91, 0x16, 0x3a, 0x0c, 0x86,
+		    0xab, 0x27, 0x2b, 0x64, 0x34, 0x02, 0x6c, 0x76,
+		    0x8b, 0xc6, 0xaf, 0xcc, 0xe1, 0xd6, 0x8c, 0x2a,
+		    0x18, 0x3d, 0xa6, 0x1b, 0x37, 0x75, 0x45, 0x73,
+		    0xc2, 0x75, 0xd7, 0x53, 0x78, 0x3a, 0xd6, 0xe8,
+		    0x29, 0xd2, 0x4a, 0xa8, 0x1e, 0x82, 0xf6, 0xb6,
+		    0x81, 0xde, 0x21, 0xed, 0x2b, 0x56, 0xbb, 0xf2,
+		    0xd0, 0x57, 0xc1, 0x7c, 0xd2, 0x6a, 0xd2, 0x56,
+		    0xf5, 0x13, 0x5f, 0x1c, 0x6a, 0x0b, 0x74, 0xfb,
+		    0xe9, 0xfe, 0x9e, 0xea, 0x95, 0xb2, 0x46, 0xab,
+		    0x0a, 0xfc, 0xfd, 0xf3, 0xbb, 0x04, 0x2b, 0x76,
+		    0x1b, 0xa4, 0x74, 0xb0, 0xc1, 0x78, 0xc3, 0x69,
+		    0xe2, 0xb0, 0x01, 0xe1, 0xde, 0x32, 0x4c, 0x8d,
+		    0x1a, 0xb3, 0x38, 0x08, 0xd5, 0xfc, 0x1f, 0xdc,
+		    0x0e, 0x2c, 0x9c, 0xb1, 0xa1, 0x63, 0x17, 0x22,
+		    0xf5, 0x6c, 0x93, 0x70, 0x74, 0x00, 0xf8, 0x39,
+		    0x01, 0x94, 0xd1, 0x32, 0x23, 0x56, 0x5d, 0xa6,
+		    0x02, 0x76, 0x76, 0x93, 0xce, 0x2f, 0x19, 0xe9,
+		    0x17, 0x52, 0xae, 0x6e, 0x2c, 0x6d, 0x61, 0x7f,
+		    0x3b, 0xaa, 0xe0, 0x52, 0x85, 0xc5, 0x65, 0xc1,
+		    0xbb, 0x8e, 0x5b, 0x21, 0xd5, 0xc9, 0x78, 0x83,
+		    0x07, 0x97, 0x4c, 0x62, 0x61, 0x41, 0xd4, 0xfc,
+		    0xc9, 0x39, 0xe3, 0x9b, 0xd0, 0xcc, 0x75, 0xc4,
+		    0x97, 0xe6, 0xdd, 0x2a, 0x5f, 0xa6, 0xe8, 0x59,
+		    0x6c, 0x98, 0xb9, 0x02, 0xe2, 0xa2, 0xd6, 0x68,
+		    0xee, 0x3b, 0x1d, 0xe3, 0x4d, 0x5b, 0x30, 0xef,
+		    0x03, 0xf2, 0xeb, 0x18, 0x57, 0x36, 0xe8, 0xa1,
+		    0xf4, 0x47, 0xfb, 0xcb, 0x8f, 0xcb, 0xc8, 0xf3,
+		    0x4f, 0x74, 0x9d, 0x9d, 0xb1, 0x8d, 0x14, 0x44,
+		    0xd9, 0x19, 0xb4, 0x54, 0x4f, 0x75, 0x19, 0x09,
+		    0xa0, 0x75, 0xbc, 0x3b, 0x82, 0xc6, 0x3f, 0xb8,
+		    0x83, 0x19, 0x6e, 0xd6, 0x37, 0xfe, 0x6e, 0x8a,
+		    0x4e, 0xe0, 0x4a, 0xab, 0x7b, 0xc8, 0xb4, 0x1d,
+		    0xf4, 0xed, 0x27, 0x03, 0x65, 0xa2, 0xa1, 0xae,
+		    0x11, 0xe7, 0x98, 0x78, 0x48, 0x91, 0xd2, 0xd2,
+		    0xd4, 0x23, 0x78, 0x50, 0xb1, 0x5b, 0x85, 0x10,
+		    0x8d, 0xca, 0x5f, 0x0f, 0x71, 0xae, 0x72, 0x9a,
+		    0xf6, 0x25, 0x19, 0x60, 0x06, 0xf7, 0x10, 0x34,
+		    0x18, 0x0d, 0xc9, 0x9f, 0x7b, 0x0c, 0x9b, 0x8f,
+		    0x91, 0x1b, 0x9f, 0xcd, 0x10, 0xee, 0x75, 0xf9,
+		    0x97, 0x66, 0xfc, 0x4d, 0x33, 0x6e, 0x28, 0x2b,
+		    0x92, 0x85, 0x4f, 0xab, 0x43, 0x8d, 0x8f, 0x7d,
+		    0x86, 0xa7, 0xc7, 0xd8, 0xd3, 0x0b, 0x8b, 0x57,
+		    0xb6, 0x1d, 0x95, 0x0d, 0xe9, 0xbc, 0xd9, 0x03,
+		    0xd9, 0x10, 0x19, 0xc3, 0x46, 0x63, 0x55, 0x87,
+		    0x61, 0x79, 0x6c, 0x95, 0x0e, 0x9c, 0xdd, 0xca,
+		    0xc3, 0xf3, 0x64, 0xf0, 0x7d, 0x76, 0xb7, 0x53,
+		    0x67, 0x2b, 0x1e, 0x44, 0x56, 0x81, 0xea, 0x8f,
+		    0x5c, 0x42, 0x16, 0xb8, 0x28, 0xeb, 0x1b, 0x61,
+		    0x10, 0x1e, 0xbf, 0xec, 0xa8 }
+}, {
+	.key	= { 0xb3, 0x35, 0x50, 0x03, 0x54, 0x2e, 0x40, 0x5e,
+		    0x8f, 0x59, 0x8e, 0xc5, 0x90, 0xd5, 0x27, 0x2d,
+		    0xba, 0x29, 0x2e, 0xcb, 0x1b, 0x70, 0x44, 0x1e,
+		    0x65, 0x91, 0x6e, 0x2a, 0x79, 0x22, 0xda, 0x64 },
+	.nonce	= { 0x05, 0xa3, 0x93, 0xed, 0x30, 0xc5, 0xa2, 0x06 },
+	.nlen	= 8,
+	.assoc	= { 0xb1, 0x69, 0x83, 0x87, 0x30, 0xaa, 0x5d, 0xb8,
+		    0x77, 0xe8, 0x21, 0xff, 0x06, 0x59, 0x35, 0xce,
+		    0x75, 0xfe, 0x38, 0xef, 0xb8, 0x91, 0x43, 0x8c,
+		    0xcf, 0x70, 0xdd, 0x0a, 0x68, 0xbf, 0xd4, 0xbc,
+		    0x16, 0x76, 0x99, 0x36, 0x1e, 0x58, 0x79, 0x5e,
+		    0xd4, 0x29, 0xf7, 0x33, 0x93, 0x48, 0xdb, 0x5f,
+		    0x01, 0xae, 0x9c, 0xb6, 0xe4, 0x88, 0x6d, 0x2b,
+		    0x76, 0x75, 0xe0, 0xf3, 0x74, 0xe2, 0xc9 },
+	.alen	= 63,
+	.input	= { 0x52, 0x34, 0xb3, 0x65, 0x3b, 0xb7, 0xe5, 0xd3,
+		    0xab, 0x49, 0x17, 0x60, 0xd2, 0x52, 0x56, 0xdf,
+		    0xdf, 0x34, 0x56, 0x82, 0xe2, 0xbe, 0xe5, 0xe1,
+		    0x28, 0xd1, 0x4e, 0x5f, 0x4f, 0x01, 0x7d, 0x3f,
+		    0x99, 0x6b, 0x30, 0x6e, 0x1a, 0x7c, 0x4c, 0x8e,
+		    0x62, 0x81, 0xae, 0x86, 0x3f, 0x6b, 0xd0, 0xb5,
+		    0xa9, 0xcf, 0x50, 0xf1, 0x02, 0x12, 0xa0, 0x0b,
+		    0x24, 0xe9, 0xe6, 0x72, 0x89, 0x2c, 0x52, 0x1b,
+		    0x34, 0x38, 0xf8, 0x75, 0x5f, 0xa0, 0x74, 0xe2,
+		    0x99, 0xdd, 0xa6, 0x4b, 0x14, 0x50, 0x4e, 0xf1,
+		    0xbe, 0xd6, 0x9e, 0xdb, 0xb2, 0x24, 0x27, 0x74,
+		    0x12, 0x4a, 0x78, 0x78, 0x17, 0xa5, 0x58, 0x8e,
+		    0x2f, 0xf9, 0xf4, 0x8d, 0xee, 0x03, 0x88, 0xae,
+		    0xb8, 0x29, 0xa1, 0x2f, 0x4b, 0xee, 0x92, 0xbd,
+		    0x87, 0xb3, 0xce, 0x34, 0x21, 0x57, 0x46, 0x04,
+		    0x49, 0x0c, 0x80, 0xf2, 0x01, 0x13, 0xa1, 0x55,
+		    0xb3, 0xff, 0x44, 0x30, 0x3c, 0x1c, 0xd0, 0xef,
+		    0xbc, 0x18, 0x74, 0x26, 0xad, 0x41, 0x5b, 0x5b,
+		    0x3e, 0x9a, 0x7a, 0x46, 0x4f, 0x16, 0xd6, 0x74,
+		    0x5a, 0xb7, 0x3a, 0x28, 0x31, 0xd8, 0xae, 0x26,
+		    0xac, 0x50, 0x53, 0x86, 0xf2, 0x56, 0xd7, 0x3f,
+		    0x29, 0xbc, 0x45, 0x68, 0x8e, 0xcb, 0x98, 0x64,
+		    0xdd, 0xc9, 0xba, 0xb8, 0x4b, 0x7b, 0x82, 0xdd,
+		    0x14, 0xa7, 0xcb, 0x71, 0x72, 0x00, 0x5c, 0xad,
+		    0x7b, 0x6a, 0x89, 0xa4, 0x3d, 0xbf, 0xb5, 0x4b,
+		    0x3e, 0x7c, 0x5a, 0xcf, 0xb8, 0xa1, 0xc5, 0x6e,
+		    0xc8, 0xb6, 0x31, 0x57, 0x7b, 0xdf, 0xa5, 0x7e,
+		    0xb1, 0xd6, 0x42, 0x2a, 0x31, 0x36, 0xd1, 0xd0,
+		    0x3f, 0x7a, 0xe5, 0x94, 0xd6, 0x36, 0xa0, 0x6f,
+		    0xb7, 0x40, 0x7d, 0x37, 0xc6, 0x55, 0x7c, 0x50,
+		    0x40, 0x6d, 0x29, 0x89, 0xe3, 0x5a, 0xae, 0x97,
+		    0xe7, 0x44, 0x49, 0x6e, 0xbd, 0x81, 0x3d, 0x03,
+		    0x93, 0x06, 0x12, 0x06, 0xe2, 0x41, 0x12, 0x4a,
+		    0xf1, 0x6a, 0xa4, 0x58, 0xa2, 0xfb, 0xd2, 0x15,
+		    0xba, 0xc9, 0x79, 0xc9, 0xce, 0x5e, 0x13, 0xbb,
+		    0xf1, 0x09, 0x04, 0xcc, 0xfd, 0xe8, 0x51, 0x34,
+		    0x6a, 0xe8, 0x61, 0x88, 0xda, 0xed, 0x01, 0x47,
+		    0x84, 0xf5, 0x73, 0x25, 0xf9, 0x1c, 0x42, 0x86,
+		    0x07, 0xf3, 0x5b, 0x1a, 0x01, 0xb3, 0xeb, 0x24,
+		    0x32, 0x8d, 0xf6, 0xed, 0x7c, 0x4b, 0xeb, 0x3c,
+		    0x36, 0x42, 0x28, 0xdf, 0xdf, 0xb6, 0xbe, 0xd9,
+		    0x8c, 0x52, 0xd3, 0x2b, 0x08, 0x90, 0x8c, 0xe7,
+		    0x98, 0x31, 0xe2, 0x32, 0x8e, 0xfc, 0x11, 0x48,
+		    0x00, 0xa8, 0x6a, 0x42, 0x4a, 0x02, 0xc6, 0x4b,
+		    0x09, 0xf1, 0xe3, 0x49, 0xf3, 0x45, 0x1f, 0x0e,
+		    0xbc, 0x56, 0xe2, 0xe4, 0xdf, 0xfb, 0xeb, 0x61,
+		    0xfa, 0x24, 0xc1, 0x63, 0x75, 0xbb, 0x47, 0x75,
+		    0xaf, 0xe1, 0x53, 0x16, 0x96, 0x21, 0x85, 0x26,
+		    0x11, 0xb3, 0x76, 0xe3, 0x23, 0xa1, 0x6b, 0x74,
+		    0x37, 0xd0, 0xde, 0x06, 0x90, 0x71, 0x5d, 0x43,
+		    0x88, 0x9b, 0x00, 0x54, 0xa6, 0x75, 0x2f, 0xa1,
+		    0xc2, 0x0b, 0x73, 0x20, 0x1d, 0xb6, 0x21, 0x79,
+		    0x57, 0x3f, 0xfa, 0x09, 0xbe, 0x8a, 0x33, 0xc3,
+		    0x52, 0xf0, 0x1d, 0x82, 0x31, 0xd1, 0x55, 0xb5,
+		    0x6c, 0x99, 0x25, 0xcf, 0x5c, 0x32, 0xce, 0xe9,
+		    0x0d, 0xfa, 0x69, 0x2c, 0xd5, 0x0d, 0xc5, 0x6d,
+		    0x86, 0xd0, 0x0c, 0x3b, 0x06, 0x50, 0x79, 0xe8,
+		    0xc3, 0xae, 0x04, 0xe6, 0xcd, 0x51, 0xe4, 0x26,
+		    0x9b, 0x4f, 0x7e, 0xa6, 0x0f, 0xab, 0xd8, 0xe5,
+		    0xde, 0xa9, 0x00, 0x95, 0xbe, 0xa3, 0x9d, 0x5d,
+		    0xb2, 0x09, 0x70, 0x18, 0x1c, 0xf0, 0xac, 0x29,
+		    0x23, 0x02, 0x29, 0x28, 0xd2, 0x74, 0x35, 0x57,
+		    0x62, 0x0f, 0x24, 0xea, 0x5e, 0x33, 0xc2, 0x92,
+		    0xf3, 0x78, 0x4d, 0x30, 0x1e, 0xa1, 0x99, 0xa9,
+		    0x82, 0xb0, 0x42, 0x31, 0x8d, 0xad, 0x8a, 0xbc,
+		    0xfc, 0xd4, 0x57, 0x47, 0x3e, 0xb4, 0x50, 0xdd,
+		    0x6e, 0x2c, 0x80, 0x4d, 0x22, 0xf1, 0xfb, 0x57,
+		    0xc4, 0xdd, 0x17, 0xe1, 0x8a, 0x36, 0x4a, 0xb3,
+		    0x37, 0xca, 0xc9, 0x4e, 0xab, 0xd5, 0x69, 0xc4,
+		    0xf4, 0xbc, 0x0b, 0x3b, 0x44, 0x4b, 0x29, 0x9c,
+		    0xee, 0xd4, 0x35, 0x22, 0x21, 0xb0, 0x1f, 0x27,
+		    0x64, 0xa8, 0x51, 0x1b, 0xf0, 0x9f, 0x19, 0x5c,
+		    0xfb, 0x5a, 0x64, 0x74, 0x70, 0x45, 0x09, 0xf5,
+		    0x64, 0xfe, 0x1a, 0x2d, 0xc9, 0x14, 0x04, 0x14,
+		    0xcf, 0xd5, 0x7d, 0x60, 0xaf, 0x94, 0x39, 0x94,
+		    0xe2, 0x7d, 0x79, 0x82, 0xd0, 0x65, 0x3b, 0x6b,
+		    0x9c, 0x19, 0x84, 0xb4, 0x6d, 0xb3, 0x0c, 0x99,
+		    0xc0, 0x56, 0xa8, 0xbd, 0x73, 0xce, 0x05, 0x84,
+		    0x3e, 0x30, 0xaa, 0xc4, 0x9b, 0x1b, 0x04, 0x2a,
+		    0x9f, 0xd7, 0x43, 0x2b, 0x23, 0xdf, 0xbf, 0xaa,
+		    0xd5, 0xc2, 0x43, 0x2d, 0x70, 0xab, 0xdc, 0x75,
+		    0xad, 0xac, 0xf7, 0xc0, 0xbe, 0x67, 0xb2, 0x74,
+		    0xed, 0x67, 0x10, 0x4a, 0x92, 0x60, 0xc1, 0x40,
+		    0x50, 0x19, 0x8a, 0x8a, 0x8c, 0x09, 0x0e, 0x72,
+		    0xe1, 0x73, 0x5e, 0xe8, 0x41, 0x85, 0x63, 0x9f,
+		    0x3f, 0xd7, 0x7d, 0xc4, 0xfb, 0x22, 0x5d, 0x92,
+		    0x6c, 0xb3, 0x1e, 0xe2, 0x50, 0x2f, 0x82, 0xa8,
+		    0x28, 0xc0, 0xb5, 0xd7, 0x5f, 0x68, 0x0d, 0x2c,
+		    0x2d, 0xaf, 0x7e, 0xfa, 0x2e, 0x08, 0x0f, 0x1f,
+		    0x70, 0x9f, 0xe9, 0x19, 0x72, 0x55, 0xf8, 0xfb,
+		    0x51, 0xd2, 0x33, 0x5d, 0xa0, 0xd3, 0x2b, 0x0a,
+		    0x6c, 0xbc, 0x4e, 0xcf, 0x36, 0x4d, 0xdc, 0x3b,
+		    0xe9, 0x3e, 0x81, 0x7c, 0x61, 0xdb, 0x20, 0x2d,
+		    0x3a, 0xc3, 0xb3, 0x0c, 0x1e, 0x00, 0xb9, 0x7c,
+		    0xf5, 0xca, 0x10, 0x5f, 0x3a, 0x71, 0xb3, 0xe4,
+		    0x20, 0xdb, 0x0c, 0x2a, 0x98, 0x63, 0x45, 0x00,
+		    0x58, 0xf6, 0x68, 0xe4, 0x0b, 0xda, 0x13, 0x3b,
+		    0x60, 0x5c, 0x76, 0xdb, 0xb9, 0x97, 0x71, 0xe4,
+		    0xd9, 0xb7, 0xdb, 0xbd, 0x68, 0xc7, 0x84, 0x84,
+		    0xaa, 0x7c, 0x68, 0x62, 0x5e, 0x16, 0xfc, 0xba,
+		    0x72, 0xaa, 0x9a, 0xa9, 0xeb, 0x7c, 0x75, 0x47,
+		    0x97, 0x7e, 0xad, 0xe2, 0xd9, 0x91, 0xe8, 0xe4,
+		    0xa5, 0x31, 0xd7, 0x01, 0x8e, 0xa2, 0x11, 0x88,
+		    0x95, 0xb9, 0xf2, 0x9b, 0xd3, 0x7f, 0x1b, 0x81,
+		    0x22, 0xf7, 0x98, 0x60, 0x0a, 0x64, 0xa6, 0xc1,
+		    0xf6, 0x49, 0xc7, 0xe3, 0x07, 0x4d, 0x94, 0x7a,
+		    0xcf, 0x6e, 0x68, 0x0c, 0x1b, 0x3f, 0x6e, 0x2e,
+		    0xee, 0x92, 0xfa, 0x52, 0xb3, 0x59, 0xf8, 0xf1,
+		    0x8f, 0x6a, 0x66, 0xa3, 0x82, 0x76, 0x4a, 0x07,
+		    0x1a, 0xc7, 0xdd, 0xf5, 0xda, 0x9c, 0x3c, 0x24,
+		    0xbf, 0xfd, 0x42, 0xa1, 0x10, 0x64, 0x6a, 0x0f,
+		    0x89, 0xee, 0x36, 0xa5, 0xce, 0x99, 0x48, 0x6a,
+		    0xf0, 0x9f, 0x9e, 0x69, 0xa4, 0x40, 0x20, 0xe9,
+		    0x16, 0x15, 0xf7, 0xdb, 0x75, 0x02, 0xcb, 0xe9,
+		    0x73, 0x8b, 0x3b, 0x49, 0x2f, 0xf0, 0xaf, 0x51,
+		    0x06, 0x5c, 0xdf, 0x27, 0x27, 0x49, 0x6a, 0xd1,
+		    0xcc, 0xc7, 0xb5, 0x63, 0xb5, 0xfc, 0xb8, 0x5c,
+		    0x87, 0x7f, 0x84, 0xb4, 0xcc, 0x14, 0xa9, 0x53,
+		    0xda, 0xa4, 0x56, 0xf8, 0xb6, 0x1b, 0xcc, 0x40,
+		    0x27, 0x52, 0x06, 0x5a, 0x13, 0x81, 0xd7, 0x3a,
+		    0xd4, 0x3b, 0xfb, 0x49, 0x65, 0x31, 0x33, 0xb2,
+		    0xfa, 0xcd, 0xad, 0x58, 0x4e, 0x2b, 0xae, 0xd2,
+		    0x20, 0xfb, 0x1a, 0x48, 0xb4, 0x3f, 0x9a, 0xd8,
+		    0x7a, 0x35, 0x4a, 0xc8, 0xee, 0x88, 0x5e, 0x07,
+		    0x66, 0x54, 0xb9, 0xec, 0x9f, 0xa3, 0xe3, 0xb9,
+		    0x37, 0xaa, 0x49, 0x76, 0x31, 0xda, 0x74, 0x2d,
+		    0x3c, 0xa4, 0x65, 0x10, 0x32, 0x38, 0xf0, 0xde,
+		    0xd3, 0x99, 0x17, 0xaa, 0x71, 0xaa, 0x8f, 0x0f,
+		    0x8c, 0xaf, 0xa2, 0xf8, 0x5d, 0x64, 0xba, 0x1d,
+		    0xa3, 0xef, 0x96, 0x73, 0xe8, 0xa1, 0x02, 0x8d,
+		    0x0c, 0x6d, 0xb8, 0x06, 0x90, 0xb8, 0x08, 0x56,
+		    0x2c, 0xa7, 0x06, 0xc9, 0xc2, 0x38, 0xdb, 0x7c,
+		    0x63, 0xb1, 0x57, 0x8e, 0xea, 0x7c, 0x79, 0xf3,
+		    0x49, 0x1d, 0xfe, 0x9f, 0xf3, 0x6e, 0xb1, 0x1d,
+		    0xba, 0x19, 0x80, 0x1a, 0x0a, 0xd3, 0xb0, 0x26,
+		    0x21, 0x40, 0xb1, 0x7c, 0xf9, 0x4d, 0x8d, 0x10,
+		    0xc1, 0x7e, 0xf4, 0xf6, 0x3c, 0xa8, 0xfd, 0x7c,
+		    0xa3, 0x92, 0xb2, 0x0f, 0xaa, 0xcc, 0xa6, 0x11,
+		    0xfe, 0x04, 0xe3, 0xd1, 0x7a, 0x32, 0x89, 0xdf,
+		    0x0d, 0xc4, 0x8f, 0x79, 0x6b, 0xca, 0x16, 0x7c,
+		    0x6e, 0xf9, 0xad, 0x0f, 0xf6, 0xfe, 0x27, 0xdb,
+		    0xc4, 0x13, 0x70, 0xf1, 0x62, 0x1a, 0x4f, 0x79,
+		    0x40, 0xc9, 0x9b, 0x8b, 0x21, 0xea, 0x84, 0xfa,
+		    0xf5, 0xf1, 0x89, 0xce, 0xb7, 0x55, 0x0a, 0x80,
+		    0x39, 0x2f, 0x55, 0x36, 0x16, 0x9c, 0x7b, 0x08,
+		    0xbd, 0x87, 0x0d, 0xa5, 0x32, 0xf1, 0x52, 0x7c,
+		    0xe8, 0x55, 0x60, 0x5b, 0xd7, 0x69, 0xe4, 0xfc,
+		    0xfa, 0x12, 0x85, 0x96, 0xea, 0x50, 0x28, 0xab,
+		    0x8a, 0xf7, 0xbb, 0x0e, 0x53, 0x74, 0xca, 0xa6,
+		    0x27, 0x09, 0xc2, 0xb5, 0xde, 0x18, 0x14, 0xd9,
+		    0xea, 0xe5, 0x29, 0x1c, 0x40, 0x56, 0xcf, 0xd7,
+		    0xae, 0x05, 0x3f, 0x65, 0xaf, 0x05, 0x73, 0xe2,
+		    0x35, 0x96, 0x27, 0x07, 0x14, 0xc0, 0xad, 0x33,
+		    0xf1, 0xdc, 0x44, 0x7a, 0x89, 0x17, 0x77, 0xd2,
+		    0x9c, 0x58, 0x60, 0xf0, 0x3f, 0x7b, 0x2d, 0x2e,
+		    0x57, 0x95, 0x54, 0x87, 0xed, 0xf2, 0xc7, 0x4c,
+		    0xf0, 0xae, 0x56, 0x29, 0x19, 0x7d, 0x66, 0x4b,
+		    0x9b, 0x83, 0x84, 0x42, 0x3b, 0x01, 0x25, 0x66,
+		    0x8e, 0x02, 0xde, 0xb9, 0x83, 0x54, 0x19, 0xf6,
+		    0x9f, 0x79, 0x0d, 0x67, 0xc5, 0x1d, 0x7a, 0x44,
+		    0x02, 0x98, 0xa7, 0x16, 0x1c, 0x29, 0x0d, 0x74,
+		    0xff, 0x85, 0x40, 0x06, 0xef, 0x2c, 0xa9, 0xc6,
+		    0xf5, 0x53, 0x07, 0x06, 0xae, 0xe4, 0xfa, 0x5f,
+		    0xd8, 0x39, 0x4d, 0xf1, 0x9b, 0x6b, 0xd9, 0x24,
+		    0x84, 0xfe, 0x03, 0x4c, 0xb2, 0x3f, 0xdf, 0xa1,
+		    0x05, 0x9e, 0x50, 0x14, 0x5a, 0xd9, 0x1a, 0xa2,
+		    0xa7, 0xfa, 0xfa, 0x17, 0xf7, 0x78, 0xd6, 0xb5,
+		    0x92, 0x61, 0x91, 0xac, 0x36, 0xfa, 0x56, 0x0d,
+		    0x38, 0x32, 0x18, 0x85, 0x08, 0x58, 0x37, 0xf0,
+		    0x4b, 0xdb, 0x59, 0xe7, 0xa4, 0x34, 0xc0, 0x1b,
+		    0x01, 0xaf, 0x2d, 0xde, 0xa1, 0xaa, 0x5d, 0xd3,
+		    0xec, 0xe1, 0xd4, 0xf7, 0xe6, 0x54, 0x68, 0xf0,
+		    0x51, 0x97, 0xa7, 0x89, 0xea, 0x24, 0xad, 0xd3,
+		    0x6e, 0x47, 0x93, 0x8b, 0x4b, 0xb4, 0xf7, 0x1c,
+		    0x42, 0x06, 0x67, 0xe8, 0x99, 0xf6, 0xf5, 0x7b,
+		    0x85, 0xb5, 0x65, 0xb5, 0xb5, 0xd2, 0x37, 0xf5,
+		    0xf3, 0x02, 0xa6, 0x4d, 0x11, 0xa7, 0xdc, 0x51,
+		    0x09, 0x7f, 0xa0, 0xd8, 0x88, 0x1c, 0x13, 0x71,
+		    0xae, 0x9c, 0xb7, 0x7b, 0x34, 0xd6, 0x4e, 0x68,
+		    0x26, 0x83, 0x51, 0xaf, 0x1d, 0xee, 0x8b, 0xbb,
+		    0x69, 0x43, 0x2b, 0x9e, 0x8a, 0xbc, 0x02, 0x0e,
+		    0xa0, 0x1b, 0xe0, 0xa8, 0x5f, 0x6f, 0xaf, 0x1b,
+		    0x8f, 0xe7, 0x64, 0x71, 0x74, 0x11, 0x7e, 0xa8,
+		    0xd8, 0xf9, 0x97, 0x06, 0xc3, 0xb6, 0xfb, 0xfb,
+		    0xb7, 0x3d, 0x35, 0x9d, 0x3b, 0x52, 0xed, 0x54,
+		    0xca, 0xf4, 0x81, 0x01, 0x2d, 0x1b, 0xc3, 0xa7,
+		    0x00, 0x3d, 0x1a, 0x39, 0x54, 0xe1, 0xf6, 0xff,
+		    0xed, 0x6f, 0x0b, 0x5a, 0x68, 0xda, 0x58, 0xdd,
+		    0xa9, 0xcf, 0x5c, 0x4a, 0xe5, 0x09, 0x4e, 0xde,
+		    0x9d, 0xbc, 0x3e, 0xee, 0x5a, 0x00, 0x3b, 0x2c,
+		    0x87, 0x10, 0x65, 0x60, 0xdd, 0xd7, 0x56, 0xd1,
+		    0x4c, 0x64, 0x45, 0xe4, 0x21, 0xec, 0x78, 0xf8,
+		    0x25, 0x7a, 0x3e, 0x16, 0x5d, 0x09, 0x53, 0x14,
+		    0xbe, 0x4f, 0xae, 0x87, 0xd8, 0xd1, 0xaa, 0x3c,
+		    0xf6, 0x3e, 0xa4, 0x70, 0x8c, 0x5e, 0x70, 0xa4,
+		    0xb3, 0x6b, 0x66, 0x73, 0xd3, 0xbf, 0x31, 0x06,
+		    0x19, 0x62, 0x93, 0x15, 0xf2, 0x86, 0xe4, 0x52,
+		    0x7e, 0x53, 0x4c, 0x12, 0x38, 0xcc, 0x34, 0x7d,
+		    0x57, 0xf6, 0x42, 0x93, 0x8a, 0xc4, 0xee, 0x5c,
+		    0x8a, 0xe1, 0x52, 0x8f, 0x56, 0x64, 0xf6, 0xa6,
+		    0xd1, 0x91, 0x57, 0x70, 0xcd, 0x11, 0x76, 0xf5,
+		    0x59, 0x60, 0x60, 0x3c, 0xc1, 0xc3, 0x0b, 0x7f,
+		    0x58, 0x1a, 0x50, 0x91, 0xf1, 0x68, 0x8f, 0x6e,
+		    0x74, 0x74, 0xa8, 0x51, 0x0b, 0xf7, 0x7a, 0x98,
+		    0x37, 0xf2, 0x0a, 0x0e, 0xa4, 0x97, 0x04, 0xb8,
+		    0x9b, 0xfd, 0xa0, 0xea, 0xf7, 0x0d, 0xe1, 0xdb,
+		    0x03, 0xf0, 0x31, 0x29, 0xf8, 0xdd, 0x6b, 0x8b,
+		    0x5d, 0xd8, 0x59, 0xa9, 0x29, 0xcf, 0x9a, 0x79,
+		    0x89, 0x19, 0x63, 0x46, 0x09, 0x79, 0x6a, 0x11,
+		    0xda, 0x63, 0x68, 0x48, 0x77, 0x23, 0xfb, 0x7d,
+		    0x3a, 0x43, 0xcb, 0x02, 0x3b, 0x7a, 0x6d, 0x10,
+		    0x2a, 0x9e, 0xac, 0xf1, 0xd4, 0x19, 0xf8, 0x23,
+		    0x64, 0x1d, 0x2c, 0x5f, 0xf2, 0xb0, 0x5c, 0x23,
+		    0x27, 0xf7, 0x27, 0x30, 0x16, 0x37, 0xb1, 0x90,
+		    0xab, 0x38, 0xfb, 0x55, 0xcd, 0x78, 0x58, 0xd4,
+		    0x7d, 0x43, 0xf6, 0x45, 0x5e, 0x55, 0x8d, 0xb1,
+		    0x02, 0x65, 0x58, 0xb4, 0x13, 0x4b, 0x36, 0xf7,
+		    0xcc, 0xfe, 0x3d, 0x0b, 0x82, 0xe2, 0x12, 0x11,
+		    0xbb, 0xe6, 0xb8, 0x3a, 0x48, 0x71, 0xc7, 0x50,
+		    0x06, 0x16, 0x3a, 0xe6, 0x7c, 0x05, 0xc7, 0xc8,
+		    0x4d, 0x2f, 0x08, 0x6a, 0x17, 0x9a, 0x95, 0x97,
+		    0x50, 0x68, 0xdc, 0x28, 0x18, 0xc4, 0x61, 0x38,
+		    0xb9, 0xe0, 0x3e, 0x78, 0xdb, 0x29, 0xe0, 0x9f,
+		    0x52, 0xdd, 0xf8, 0x4f, 0x91, 0xc1, 0xd0, 0x33,
+		    0xa1, 0x7a, 0x8e, 0x30, 0x13, 0x82, 0x07, 0x9f,
+		    0xd3, 0x31, 0x0f, 0x23, 0xbe, 0x32, 0x5a, 0x75,
+		    0xcf, 0x96, 0xb2, 0xec, 0xb5, 0x32, 0xac, 0x21,
+		    0xd1, 0x82, 0x33, 0xd3, 0x15, 0x74, 0xbd, 0x90,
+		    0xf1, 0x2c, 0xe6, 0x5f, 0x8d, 0xe3, 0x02, 0xe8,
+		    0xe9, 0xc4, 0xca, 0x96, 0xeb, 0x0e, 0xbc, 0x91,
+		    0xf4, 0xb9, 0xea, 0xd9, 0x1b, 0x75, 0xbd, 0xe1,
+		    0xac, 0x2a, 0x05, 0x37, 0x52, 0x9b, 0x1b, 0x3f,
+		    0x5a, 0xdc, 0x21, 0xc3, 0x98, 0xbb, 0xaf, 0xa3,
+		    0xf2, 0x00, 0xbf, 0x0d, 0x30, 0x89, 0x05, 0xcc,
+		    0xa5, 0x76, 0xf5, 0x06, 0xf0, 0xc6, 0x54, 0x8a,
+		    0x5d, 0xd4, 0x1e, 0xc1, 0xf2, 0xce, 0xb0, 0x62,
+		    0xc8, 0xfc, 0x59, 0x42, 0x9a, 0x90, 0x60, 0x55,
+		    0xfe, 0x88, 0xa5, 0x8b, 0xb8, 0x33, 0x0c, 0x23,
+		    0x24, 0x0d, 0x15, 0x70, 0x37, 0x1e, 0x3d, 0xf6,
+		    0xd2, 0xea, 0x92, 0x10, 0xb2, 0xc4, 0x51, 0xac,
+		    0xf2, 0xac, 0xf3, 0x6b, 0x6c, 0xaa, 0xcf, 0x12,
+		    0xc5, 0x6c, 0x90, 0x50, 0xb5, 0x0c, 0xfc, 0x1a,
+		    0x15, 0x52, 0xe9, 0x26, 0xc6, 0x52, 0xa4, 0xe7,
+		    0x81, 0x69, 0xe1, 0xe7, 0x9e, 0x30, 0x01, 0xec,
+		    0x84, 0x89, 0xb2, 0x0d, 0x66, 0xdd, 0xce, 0x28,
+		    0x5c, 0xec, 0x98, 0x46, 0x68, 0x21, 0x9f, 0x88,
+		    0x3f, 0x1f, 0x42, 0x77, 0xce, 0xd0, 0x61, 0xd4,
+		    0x20, 0xa7, 0xff, 0x53, 0xad, 0x37, 0xd0, 0x17,
+		    0x35, 0xc9, 0xfc, 0xba, 0x0a, 0x78, 0x3f, 0xf2,
+		    0xcc, 0x86, 0x89, 0xe8, 0x4b, 0x3c, 0x48, 0x33,
+		    0x09, 0x7f, 0xc6, 0xc0, 0xdd, 0xb8, 0xfd, 0x7a,
+		    0x66, 0x66, 0x65, 0xeb, 0x47, 0xa7, 0x04, 0x28,
+		    0xa3, 0x19, 0x8e, 0xa9, 0xb1, 0x13, 0x67, 0x62,
+		    0x70, 0xcf, 0xd6 },
+	.ilen	= 2027,
+	.result	= { 0x74, 0xa6, 0x3e, 0xe4, 0xb1, 0xcb, 0xaf, 0xb0,
+		    0x40, 0xe5, 0x0f, 0x9e, 0xf1, 0xf2, 0x89, 0xb5,
+		    0x42, 0x34, 0x8a, 0xa1, 0x03, 0xb7, 0xe9, 0x57,
+		    0x46, 0xbe, 0x20, 0xe4, 0x6e, 0xb0, 0xeb, 0xff,
+		    0xea, 0x07, 0x7e, 0xef, 0xe2, 0x55, 0x9f, 0xe5,
+		    0x78, 0x3a, 0xb7, 0x83, 0xc2, 0x18, 0x40, 0x7b,
+		    0xeb, 0xcd, 0x81, 0xfb, 0x90, 0x12, 0x9e, 0x46,
+		    0xa9, 0xd6, 0x4a, 0xba, 0xb0, 0x62, 0xdb, 0x6b,
+		    0x99, 0xc4, 0xdb, 0x54, 0x4b, 0xb8, 0xa5, 0x71,
+		    0xcb, 0xcd, 0x63, 0x32, 0x55, 0xfb, 0x31, 0xf0,
+		    0x38, 0xf5, 0xbe, 0x78, 0xe4, 0x45, 0xce, 0x1b,
+		    0x6a, 0x5b, 0x0e, 0xf4, 0x16, 0xe4, 0xb1, 0x3d,
+		    0xf6, 0x63, 0x7b, 0xa7, 0x0c, 0xde, 0x6f, 0x8f,
+		    0x74, 0xdf, 0xe0, 0x1e, 0x9d, 0xce, 0x8f, 0x24,
+		    0xef, 0x23, 0x35, 0x33, 0x7b, 0x83, 0x34, 0x23,
+		    0x58, 0x74, 0x14, 0x77, 0x1f, 0xc2, 0x4f, 0x4e,
+		    0xc6, 0x89, 0xf9, 0x52, 0x09, 0x37, 0x64, 0x14,
+		    0xc4, 0x01, 0x6b, 0x9d, 0x77, 0xe8, 0x90, 0x5d,
+		    0xa8, 0x4a, 0x2a, 0xef, 0x5c, 0x7f, 0xeb, 0xbb,
+		    0xb2, 0xc6, 0x93, 0x99, 0x66, 0xdc, 0x7f, 0xd4,
+		    0x9e, 0x2a, 0xca, 0x8d, 0xdb, 0xe7, 0x20, 0xcf,
+		    0xe4, 0x73, 0xae, 0x49, 0x7d, 0x64, 0x0f, 0x0e,
+		    0x28, 0x46, 0xa9, 0xa8, 0x32, 0xe4, 0x0e, 0xf6,
+		    0x51, 0x53, 0xb8, 0x3c, 0xb1, 0xff, 0xa3, 0x33,
+		    0x41, 0x75, 0xff, 0xf1, 0x6f, 0xf1, 0xfb, 0xbb,
+		    0x83, 0x7f, 0x06, 0x9b, 0xe7, 0x1b, 0x0a, 0xe0,
+		    0x5c, 0x33, 0x60, 0x5b, 0xdb, 0x5b, 0xed, 0xfe,
+		    0xa5, 0x16, 0x19, 0x72, 0xa3, 0x64, 0x23, 0x00,
+		    0x02, 0xc7, 0xf3, 0x6a, 0x81, 0x3e, 0x44, 0x1d,
+		    0x79, 0x15, 0x5f, 0x9a, 0xde, 0xe2, 0xfd, 0x1b,
+		    0x73, 0xc1, 0xbc, 0x23, 0xba, 0x31, 0xd2, 0x50,
+		    0xd5, 0xad, 0x7f, 0x74, 0xa7, 0xc9, 0xf8, 0x3e,
+		    0x2b, 0x26, 0x10, 0xf6, 0x03, 0x36, 0x74, 0xe4,
+		    0x0e, 0x6a, 0x72, 0xb7, 0x73, 0x0a, 0x42, 0x28,
+		    0xc2, 0xad, 0x5e, 0x03, 0xbe, 0xb8, 0x0b, 0xa8,
+		    0x5b, 0xd4, 0xb8, 0xba, 0x52, 0x89, 0xb1, 0x9b,
+		    0xc1, 0xc3, 0x65, 0x87, 0xed, 0xa5, 0xf4, 0x86,
+		    0xfd, 0x41, 0x80, 0x91, 0x27, 0x59, 0x53, 0x67,
+		    0x15, 0x78, 0x54, 0x8b, 0x2d, 0x3d, 0xc7, 0xff,
+		    0x02, 0x92, 0x07, 0x5f, 0x7a, 0x4b, 0x60, 0x59,
+		    0x3c, 0x6f, 0x5c, 0xd8, 0xec, 0x95, 0xd2, 0xfe,
+		    0xa0, 0x3b, 0xd8, 0x3f, 0xd1, 0x69, 0xa6, 0xd6,
+		    0x41, 0xb2, 0xf4, 0x4d, 0x12, 0xf4, 0x58, 0x3e,
+		    0x66, 0x64, 0x80, 0x31, 0x9b, 0xa8, 0x4c, 0x8b,
+		    0x07, 0xb2, 0xec, 0x66, 0x94, 0x66, 0x47, 0x50,
+		    0x50, 0x5f, 0x18, 0x0b, 0x0e, 0xd6, 0xc0, 0x39,
+		    0x21, 0x13, 0x9e, 0x33, 0xbc, 0x79, 0x36, 0x02,
+		    0x96, 0x70, 0xf0, 0x48, 0x67, 0x2f, 0x26, 0xe9,
+		    0x6d, 0x10, 0xbb, 0xd6, 0x3f, 0xd1, 0x64, 0x7a,
+		    0x2e, 0xbe, 0x0c, 0x61, 0xf0, 0x75, 0x42, 0x38,
+		    0x23, 0xb1, 0x9e, 0x9f, 0x7c, 0x67, 0x66, 0xd9,
+		    0x58, 0x9a, 0xf1, 0xbb, 0x41, 0x2a, 0x8d, 0x65,
+		    0x84, 0x94, 0xfc, 0xdc, 0x6a, 0x50, 0x64, 0xdb,
+		    0x56, 0x33, 0x76, 0x00, 0x10, 0xed, 0xbe, 0xd2,
+		    0x12, 0xf6, 0xf6, 0x1b, 0xa2, 0x16, 0xde, 0xae,
+		    0x31, 0x95, 0xdd, 0xb1, 0x08, 0x7e, 0x4e, 0xee,
+		    0xe7, 0xf9, 0xa5, 0xfb, 0x5b, 0x61, 0x43, 0x00,
+		    0x40, 0xf6, 0x7e, 0x02, 0x04, 0x32, 0x4e, 0x0c,
+		    0xe2, 0x66, 0x0d, 0xd7, 0x07, 0x98, 0x0e, 0xf8,
+		    0x72, 0x34, 0x6d, 0x95, 0x86, 0xd7, 0xcb, 0x31,
+		    0x54, 0x47, 0xd0, 0x38, 0x29, 0x9c, 0x5a, 0x68,
+		    0xd4, 0x87, 0x76, 0xc9, 0xe7, 0x7e, 0xe3, 0xf4,
+		    0x81, 0x6d, 0x18, 0xcb, 0xc9, 0x05, 0xaf, 0xa0,
+		    0xfb, 0x66, 0xf7, 0xf1, 0x1c, 0xc6, 0x14, 0x11,
+		    0x4f, 0x2b, 0x79, 0x42, 0x8b, 0xbc, 0xac, 0xe7,
+		    0x6c, 0xfe, 0x0f, 0x58, 0xe7, 0x7c, 0x78, 0x39,
+		    0x30, 0xb0, 0x66, 0x2c, 0x9b, 0x6d, 0x3a, 0xe1,
+		    0xcf, 0xc9, 0xa4, 0x0e, 0x6d, 0x6d, 0x8a, 0xa1,
+		    0x3a, 0xe7, 0x28, 0xd4, 0x78, 0x4c, 0xa6, 0xa2,
+		    0x2a, 0xa6, 0x03, 0x30, 0xd7, 0xa8, 0x25, 0x66,
+		    0x87, 0x2f, 0x69, 0x5c, 0x4e, 0xdd, 0xa5, 0x49,
+		    0x5d, 0x37, 0x4a, 0x59, 0xc4, 0xaf, 0x1f, 0xa2,
+		    0xe4, 0xf8, 0xa6, 0x12, 0x97, 0xd5, 0x79, 0xf5,
+		    0xe2, 0x4a, 0x2b, 0x5f, 0x61, 0xe4, 0x9e, 0xe3,
+		    0xee, 0xb8, 0xa7, 0x5b, 0x2f, 0xf4, 0x9e, 0x6c,
+		    0xfb, 0xd1, 0xc6, 0x56, 0x77, 0xba, 0x75, 0xaa,
+		    0x3d, 0x1a, 0xa8, 0x0b, 0xb3, 0x68, 0x24, 0x00,
+		    0x10, 0x7f, 0xfd, 0xd7, 0xa1, 0x8d, 0x83, 0x54,
+		    0x4f, 0x1f, 0xd8, 0x2a, 0xbe, 0x8a, 0x0c, 0x87,
+		    0xab, 0xa2, 0xde, 0xc3, 0x39, 0xbf, 0x09, 0x03,
+		    0xa5, 0xf3, 0x05, 0x28, 0xe1, 0xe1, 0xee, 0x39,
+		    0x70, 0x9c, 0xd8, 0x81, 0x12, 0x1e, 0x02, 0x40,
+		    0xd2, 0x6e, 0xf0, 0xeb, 0x1b, 0x3d, 0x22, 0xc6,
+		    0xe5, 0xe3, 0xb4, 0x5a, 0x98, 0xbb, 0xf0, 0x22,
+		    0x28, 0x8d, 0xe5, 0xd3, 0x16, 0x48, 0x24, 0xa5,
+		    0xe6, 0x66, 0x0c, 0xf9, 0x08, 0xf9, 0x7e, 0x1e,
+		    0xe1, 0x28, 0x26, 0x22, 0xc7, 0xc7, 0x0a, 0x32,
+		    0x47, 0xfa, 0xa3, 0xbe, 0x3c, 0xc4, 0xc5, 0x53,
+		    0x0a, 0xd5, 0x94, 0x4a, 0xd7, 0x93, 0xd8, 0x42,
+		    0x99, 0xb9, 0x0a, 0xdb, 0x56, 0xf7, 0xb9, 0x1c,
+		    0x53, 0x4f, 0xfa, 0xd3, 0x74, 0xad, 0xd9, 0x68,
+		    0xf1, 0x1b, 0xdf, 0x61, 0xc6, 0x5e, 0xa8, 0x48,
+		    0xfc, 0xd4, 0x4a, 0x4c, 0x3c, 0x32, 0xf7, 0x1c,
+		    0x96, 0x21, 0x9b, 0xf9, 0xa3, 0xcc, 0x5a, 0xce,
+		    0xd5, 0xd7, 0x08, 0x24, 0xf6, 0x1c, 0xfd, 0xdd,
+		    0x38, 0xc2, 0x32, 0xe9, 0xb8, 0xe7, 0xb6, 0xfa,
+		    0x9d, 0x45, 0x13, 0x2c, 0x83, 0xfd, 0x4a, 0x69,
+		    0x82, 0xcd, 0xdc, 0xb3, 0x76, 0x0c, 0x9e, 0xd8,
+		    0xf4, 0x1b, 0x45, 0x15, 0xb4, 0x97, 0xe7, 0x58,
+		    0x34, 0xe2, 0x03, 0x29, 0x5a, 0xbf, 0xb6, 0xe0,
+		    0x5d, 0x13, 0xd9, 0x2b, 0xb4, 0x80, 0xb2, 0x45,
+		    0x81, 0x6a, 0x2e, 0x6c, 0x89, 0x7d, 0xee, 0xbb,
+		    0x52, 0xdd, 0x1f, 0x18, 0xe7, 0x13, 0x6b, 0x33,
+		    0x0e, 0xea, 0x36, 0x92, 0x77, 0x7b, 0x6d, 0x9c,
+		    0x5a, 0x5f, 0x45, 0x7b, 0x7b, 0x35, 0x62, 0x23,
+		    0xd1, 0xbf, 0x0f, 0xd0, 0x08, 0x1b, 0x2b, 0x80,
+		    0x6b, 0x7e, 0xf1, 0x21, 0x47, 0xb0, 0x57, 0xd1,
+		    0x98, 0x72, 0x90, 0x34, 0x1c, 0x20, 0x04, 0xff,
+		    0x3d, 0x5c, 0xee, 0x0e, 0x57, 0x5f, 0x6f, 0x24,
+		    0x4e, 0x3c, 0xea, 0xfc, 0xa5, 0xa9, 0x83, 0xc9,
+		    0x61, 0xb4, 0x51, 0x24, 0xf8, 0x27, 0x5e, 0x46,
+		    0x8c, 0xb1, 0x53, 0x02, 0x96, 0x35, 0xba, 0xb8,
+		    0x4c, 0x71, 0xd3, 0x15, 0x59, 0x35, 0x22, 0x20,
+		    0xad, 0x03, 0x9f, 0x66, 0x44, 0x3b, 0x9c, 0x35,
+		    0x37, 0x1f, 0x9b, 0xbb, 0xf3, 0xdb, 0x35, 0x63,
+		    0x30, 0x64, 0xaa, 0xa2, 0x06, 0xa8, 0x5d, 0xbb,
+		    0xe1, 0x9f, 0x70, 0xec, 0x82, 0x11, 0x06, 0x36,
+		    0xec, 0x8b, 0x69, 0x66, 0x24, 0x44, 0xc9, 0x4a,
+		    0x57, 0xbb, 0x9b, 0x78, 0x13, 0xce, 0x9c, 0x0c,
+		    0xba, 0x92, 0x93, 0x63, 0xb8, 0xe2, 0x95, 0x0f,
+		    0x0f, 0x16, 0x39, 0x52, 0xfd, 0x3a, 0x6d, 0x02,
+		    0x4b, 0xdf, 0x13, 0xd3, 0x2a, 0x22, 0xb4, 0x03,
+		    0x7c, 0x54, 0x49, 0x96, 0x68, 0x54, 0x10, 0xfa,
+		    0xef, 0xaa, 0x6c, 0xe8, 0x22, 0xdc, 0x71, 0x16,
+		    0x13, 0x1a, 0xf6, 0x28, 0xe5, 0x6d, 0x77, 0x3d,
+		    0xcd, 0x30, 0x63, 0xb1, 0x70, 0x52, 0xa1, 0xc5,
+		    0x94, 0x5f, 0xcf, 0xe8, 0xb8, 0x26, 0x98, 0xf7,
+		    0x06, 0xa0, 0x0a, 0x70, 0xfa, 0x03, 0x80, 0xac,
+		    0xc1, 0xec, 0xd6, 0x4c, 0x54, 0xd7, 0xfe, 0x47,
+		    0xb6, 0x88, 0x4a, 0xf7, 0x71, 0x24, 0xee, 0xf3,
+		    0xd2, 0xc2, 0x4a, 0x7f, 0xfe, 0x61, 0xc7, 0x35,
+		    0xc9, 0x37, 0x67, 0xcb, 0x24, 0x35, 0xda, 0x7e,
+		    0xca, 0x5f, 0xf3, 0x8d, 0xd4, 0x13, 0x8e, 0xd6,
+		    0xcb, 0x4d, 0x53, 0x8f, 0x53, 0x1f, 0xc0, 0x74,
+		    0xf7, 0x53, 0xb9, 0x5e, 0x23, 0x37, 0xba, 0x6e,
+		    0xe3, 0x9d, 0x07, 0x55, 0x25, 0x7b, 0xe6, 0x2a,
+		    0x64, 0xd1, 0x32, 0xdd, 0x54, 0x1b, 0x4b, 0xc0,
+		    0xe1, 0xd7, 0x69, 0x58, 0xf8, 0x93, 0x29, 0xc4,
+		    0xdd, 0x23, 0x2f, 0xa5, 0xfc, 0x9d, 0x7e, 0xf8,
+		    0xd4, 0x90, 0xcd, 0x82, 0x55, 0xdc, 0x16, 0x16,
+		    0x9f, 0x07, 0x52, 0x9b, 0x9d, 0x25, 0xed, 0x32,
+		    0xc5, 0x7b, 0xdf, 0xf6, 0x83, 0x46, 0x3d, 0x65,
+		    0xb7, 0xef, 0x87, 0x7a, 0x12, 0x69, 0x8f, 0x06,
+		    0x7c, 0x51, 0x15, 0x4a, 0x08, 0xe8, 0xac, 0x9a,
+		    0x0c, 0x24, 0xa7, 0x27, 0xd8, 0x46, 0x2f, 0xe7,
+		    0x01, 0x0e, 0x1c, 0xc6, 0x91, 0xb0, 0x6e, 0x85,
+		    0x65, 0xf0, 0x29, 0x0d, 0x2e, 0x6b, 0x3b, 0xfb,
+		    0x4b, 0xdf, 0xe4, 0x80, 0x93, 0x03, 0x66, 0x46,
+		    0x3e, 0x8a, 0x6e, 0xf3, 0x5e, 0x4d, 0x62, 0x0e,
+		    0x49, 0x05, 0xaf, 0xd4, 0xf8, 0x21, 0x20, 0x61,
+		    0x1d, 0x39, 0x17, 0xf4, 0x61, 0x47, 0x95, 0xfb,
+		    0x15, 0x2e, 0xb3, 0x4f, 0xd0, 0x5d, 0xf5, 0x7d,
+		    0x40, 0xda, 0x90, 0x3c, 0x6b, 0xcb, 0x17, 0x00,
+		    0x13, 0x3b, 0x64, 0x34, 0x1b, 0xf0, 0xf2, 0xe5,
+		    0x3b, 0xb2, 0xc7, 0xd3, 0x5f, 0x3a, 0x44, 0xa6,
+		    0x9b, 0xb7, 0x78, 0x0e, 0x42, 0x5d, 0x4c, 0xc1,
+		    0xe9, 0xd2, 0xcb, 0xb7, 0x78, 0xd1, 0xfe, 0x9a,
+		    0xb5, 0x07, 0xe9, 0xe0, 0xbe, 0xe2, 0x8a, 0xa7,
+		    0x01, 0x83, 0x00, 0x8c, 0x5c, 0x08, 0xe6, 0x63,
+		    0x12, 0x92, 0xb7, 0xb7, 0xa6, 0x19, 0x7d, 0x38,
+		    0x13, 0x38, 0x92, 0x87, 0x24, 0xf9, 0x48, 0xb3,
+		    0x5e, 0x87, 0x6a, 0x40, 0x39, 0x5c, 0x3f, 0xed,
+		    0x8f, 0xee, 0xdb, 0x15, 0x82, 0x06, 0xda, 0x49,
+		    0x21, 0x2b, 0xb5, 0xbf, 0x32, 0x7c, 0x9f, 0x42,
+		    0x28, 0x63, 0xcf, 0xaf, 0x1e, 0xf8, 0xc6, 0xa0,
+		    0xd1, 0x02, 0x43, 0x57, 0x62, 0xec, 0x9b, 0x0f,
+		    0x01, 0x9e, 0x71, 0xd8, 0x87, 0x9d, 0x01, 0xc1,
+		    0x58, 0x77, 0xd9, 0xaf, 0xb1, 0x10, 0x7e, 0xdd,
+		    0xa6, 0x50, 0x96, 0xe5, 0xf0, 0x72, 0x00, 0x6d,
+		    0x4b, 0xf8, 0x2a, 0x8f, 0x19, 0xf3, 0x22, 0x88,
+		    0x11, 0x4a, 0x8b, 0x7c, 0xfd, 0xb7, 0xed, 0xe1,
+		    0xf6, 0x40, 0x39, 0xe0, 0xe9, 0xf6, 0x3d, 0x25,
+		    0xe6, 0x74, 0x3c, 0x58, 0x57, 0x7f, 0xe1, 0x22,
+		    0x96, 0x47, 0x31, 0x91, 0xba, 0x70, 0x85, 0x28,
+		    0x6b, 0x9f, 0x6e, 0x25, 0xac, 0x23, 0x66, 0x2f,
+		    0x29, 0x88, 0x28, 0xce, 0x8c, 0x5c, 0x88, 0x53,
+		    0xd1, 0x3b, 0xcc, 0x6a, 0x51, 0xb2, 0xe1, 0x28,
+		    0x3f, 0x91, 0xb4, 0x0d, 0x00, 0x3a, 0xe3, 0xf8,
+		    0xc3, 0x8f, 0xd7, 0x96, 0x62, 0x0e, 0x2e, 0xfc,
+		    0xc8, 0x6c, 0x77, 0xa6, 0x1d, 0x22, 0xc1, 0xb8,
+		    0xe6, 0x61, 0xd7, 0x67, 0x36, 0x13, 0x7b, 0xbb,
+		    0x9b, 0x59, 0x09, 0xa6, 0xdf, 0xf7, 0x6b, 0xa3,
+		    0x40, 0x1a, 0xf5, 0x4f, 0xb4, 0xda, 0xd3, 0xf3,
+		    0x81, 0x93, 0xc6, 0x18, 0xd9, 0x26, 0xee, 0xac,
+		    0xf0, 0xaa, 0xdf, 0xc5, 0x9c, 0xca, 0xc2, 0xa2,
+		    0xcc, 0x7b, 0x5c, 0x24, 0xb0, 0xbc, 0xd0, 0x6a,
+		    0x4d, 0x89, 0x09, 0xb8, 0x07, 0xfe, 0x87, 0xad,
+		    0x0a, 0xea, 0xb8, 0x42, 0xf9, 0x5e, 0xb3, 0x3e,
+		    0x36, 0x4c, 0xaf, 0x75, 0x9e, 0x1c, 0xeb, 0xbd,
+		    0xbc, 0xbb, 0x80, 0x40, 0xa7, 0x3a, 0x30, 0xbf,
+		    0xa8, 0x44, 0xf4, 0xeb, 0x38, 0xad, 0x29, 0xba,
+		    0x23, 0xed, 0x41, 0x0c, 0xea, 0xd2, 0xbb, 0x41,
+		    0x18, 0xd6, 0xb9, 0xba, 0x65, 0x2b, 0xa3, 0x91,
+		    0x6d, 0x1f, 0xa9, 0xf4, 0xd1, 0x25, 0x8d, 0x4d,
+		    0x38, 0xff, 0x64, 0xa0, 0xec, 0xde, 0xa6, 0xb6,
+		    0x79, 0xab, 0x8e, 0x33, 0x6c, 0x47, 0xde, 0xaf,
+		    0x94, 0xa4, 0xa5, 0x86, 0x77, 0x55, 0x09, 0x92,
+		    0x81, 0x31, 0x76, 0xc7, 0x34, 0x22, 0x89, 0x8e,
+		    0x3d, 0x26, 0x26, 0xd7, 0xfc, 0x1e, 0x16, 0x72,
+		    0x13, 0x33, 0x63, 0xd5, 0x22, 0xbe, 0xb8, 0x04,
+		    0x34, 0x84, 0x41, 0xbb, 0x80, 0xd0, 0x9f, 0x46,
+		    0x48, 0x07, 0xa7, 0xfc, 0x2b, 0x3a, 0x75, 0x55,
+		    0x8c, 0xc7, 0x6a, 0xbd, 0x7e, 0x46, 0x08, 0x84,
+		    0x0f, 0xd5, 0x74, 0xc0, 0x82, 0x8e, 0xaa, 0x61,
+		    0x05, 0x01, 0xb2, 0x47, 0x6e, 0x20, 0x6a, 0x2d,
+		    0x58, 0x70, 0x48, 0x32, 0xa7, 0x37, 0xd2, 0xb8,
+		    0x82, 0x1a, 0x51, 0xb9, 0x61, 0xdd, 0xfd, 0x9d,
+		    0x6b, 0x0e, 0x18, 0x97, 0xf8, 0x45, 0x5f, 0x87,
+		    0x10, 0xcf, 0x34, 0x72, 0x45, 0x26, 0x49, 0x70,
+		    0xe7, 0xa3, 0x78, 0xe0, 0x52, 0x89, 0x84, 0x94,
+		    0x83, 0x82, 0xc2, 0x69, 0x8f, 0xe3, 0xe1, 0x3f,
+		    0x60, 0x74, 0x88, 0xc4, 0xf7, 0x75, 0x2c, 0xfb,
+		    0xbd, 0xb6, 0xc4, 0x7e, 0x10, 0x0a, 0x6c, 0x90,
+		    0x04, 0x9e, 0xc3, 0x3f, 0x59, 0x7c, 0xce, 0x31,
+		    0x18, 0x60, 0x57, 0x73, 0x46, 0x94, 0x7d, 0x06,
+		    0xa0, 0x6d, 0x44, 0xec, 0xa2, 0x0a, 0x9e, 0x05,
+		    0x15, 0xef, 0xca, 0x5c, 0xbf, 0x00, 0xeb, 0xf7,
+		    0x3d, 0x32, 0xd4, 0xa5, 0xef, 0x49, 0x89, 0x5e,
+		    0x46, 0xb0, 0xa6, 0x63, 0x5b, 0x8a, 0x73, 0xae,
+		    0x6f, 0xd5, 0x9d, 0xf8, 0x4f, 0x40, 0xb5, 0xb2,
+		    0x6e, 0xd3, 0xb6, 0x01, 0xa9, 0x26, 0xa2, 0x21,
+		    0xcf, 0x33, 0x7a, 0x3a, 0xa4, 0x23, 0x13, 0xb0,
+		    0x69, 0x6a, 0xee, 0xce, 0xd8, 0x9d, 0x01, 0x1d,
+		    0x50, 0xc1, 0x30, 0x6c, 0xb1, 0xcd, 0xa0, 0xf0,
+		    0xf0, 0xa2, 0x64, 0x6f, 0xbb, 0xbf, 0x5e, 0xe6,
+		    0xab, 0x87, 0xb4, 0x0f, 0x4f, 0x15, 0xaf, 0xb5,
+		    0x25, 0xa1, 0xb2, 0xd0, 0x80, 0x2c, 0xfb, 0xf9,
+		    0xfe, 0xd2, 0x33, 0xbb, 0x76, 0xfe, 0x7c, 0xa8,
+		    0x66, 0xf7, 0xe7, 0x85, 0x9f, 0x1f, 0x85, 0x57,
+		    0x88, 0xe1, 0xe9, 0x63, 0xe4, 0xd8, 0x1c, 0xa1,
+		    0xfb, 0xda, 0x44, 0x05, 0x2e, 0x1d, 0x3a, 0x1c,
+		    0xff, 0xc8, 0x3b, 0xc0, 0xfe, 0xda, 0x22, 0x0b,
+		    0x43, 0xd6, 0x88, 0x39, 0x4c, 0x4a, 0xa6, 0x69,
+		    0x18, 0x93, 0x42, 0x4e, 0xb5, 0xcc, 0x66, 0x0d,
+		    0x09, 0xf8, 0x1e, 0x7c, 0xd3, 0x3c, 0x99, 0x0d,
+		    0x50, 0x1d, 0x62, 0xe9, 0x57, 0x06, 0xbf, 0x19,
+		    0x88, 0xdd, 0xad, 0x7b, 0x4f, 0xf9, 0xc7, 0x82,
+		    0x6d, 0x8d, 0xc8, 0xc4, 0xc5, 0x78, 0x17, 0x20,
+		    0x15, 0xc5, 0x52, 0x41, 0xcf, 0x5b, 0xd6, 0x7f,
+		    0x94, 0x02, 0x41, 0xe0, 0x40, 0x22, 0x03, 0x5e,
+		    0xd1, 0x53, 0xd4, 0x86, 0xd3, 0x2c, 0x9f, 0x0f,
+		    0x96, 0xe3, 0x6b, 0x9a, 0x76, 0x32, 0x06, 0x47,
+		    0x4b, 0x11, 0xb3, 0xdd, 0x03, 0x65, 0xbd, 0x9b,
+		    0x01, 0xda, 0x9c, 0xb9, 0x7e, 0x3f, 0x6a, 0xc4,
+		    0x7b, 0xea, 0xd4, 0x3c, 0xb9, 0xfb, 0x5c, 0x6b,
+		    0x64, 0x33, 0x52, 0xba, 0x64, 0x78, 0x8f, 0xa4,
+		    0xaf, 0x7a, 0x61, 0x8d, 0xbc, 0xc5, 0x73, 0xe9,
+		    0x6b, 0x58, 0x97, 0x4b, 0xbf, 0x63, 0x22, 0xd3,
+		    0x37, 0x02, 0x54, 0xc5, 0xb9, 0x16, 0x4a, 0xf0,
+		    0x19, 0xd8, 0x94, 0x57, 0xb8, 0x8a, 0xb3, 0x16,
+		    0x3b, 0xd0, 0x84, 0x8e, 0x67, 0xa6, 0xa3, 0x7d,
+		    0x78, 0xec, 0x00 }
+}, {
+	.key	= { 0xb3, 0x35, 0x50, 0x03, 0x54, 0x2e, 0x40, 0x5e,
+		    0x8f, 0x59, 0x8e, 0xc5, 0x90, 0xd5, 0x27, 0x2d,
+		    0xba, 0x29, 0x2e, 0xcb, 0x1b, 0x70, 0x44, 0x1e,
+		    0x65, 0x91, 0x6e, 0x2a, 0x79, 0x22, 0xda, 0x64 },
+	.nonce	= { 0x05, 0xa3, 0x93, 0xed, 0x30, 0xc5, 0xa2, 0x06 },
+	.nlen	= 8,
+	.assoc	= { 0xb1, 0x69, 0x83, 0x87, 0x30, 0xaa, 0x5d, 0xb8,
+		    0x77, 0xe8, 0x21, 0xff, 0x06, 0x59, 0x35, 0xce,
+		    0x75, 0xfe, 0x38, 0xef, 0xb8, 0x91, 0x43, 0x8c,
+		    0xcf, 0x70, 0xdd, 0x0a, 0x68, 0xbf, 0xd4, 0xbc,
+		    0x16, 0x76, 0x99, 0x36, 0x1e, 0x58, 0x79, 0x5e,
+		    0xd4, 0x29, 0xf7, 0x33, 0x93, 0x48, 0xdb, 0x5f,
+		    0x01, 0xae, 0x9c, 0xb6, 0xe4, 0x88, 0x6d, 0x2b,
+		    0x76, 0x75, 0xe0, 0xf3, 0x74, 0xe2, 0xc9 },
+	.alen	= 63,
+	.input	= { 0x52, 0x34, 0xb3, 0x65, 0x3b, 0xb7, 0xe5, 0xd3,
+		    0xab, 0x49, 0x17, 0x60, 0xd2, 0x52, 0x56, 0xdf,
+		    0xdf, 0x34, 0x56, 0x82, 0xe2, 0xbe, 0xe5, 0xe1,
+		    0x28, 0xd1, 0x4e, 0x5f, 0x4f, 0x01, 0x7d, 0x3f,
+		    0x99, 0x6b, 0x30, 0x6e, 0x1a, 0x7c, 0x4c, 0x8e,
+		    0x62, 0x81, 0xae, 0x86, 0x3f, 0x6b, 0xd0, 0xb5,
+		    0xa9, 0xcf, 0x50, 0xf1, 0x02, 0x12, 0xa0, 0x0b,
+		    0x24, 0xe9, 0xe6, 0x72, 0x89, 0x2c, 0x52, 0x1b,
+		    0x34, 0x38, 0xf8, 0x75, 0x5f, 0xa0, 0x74, 0xe2,
+		    0x99, 0xdd, 0xa6, 0x4b, 0x14, 0x50, 0x4e, 0xf1,
+		    0xbe, 0xd6, 0x9e, 0xdb, 0xb2, 0x24, 0x27, 0x74,
+		    0x12, 0x4a, 0x78, 0x78, 0x17, 0xa5, 0x58, 0x8e,
+		    0x2f, 0xf9, 0xf4, 0x8d, 0xee, 0x03, 0x88, 0xae,
+		    0xb8, 0x29, 0xa1, 0x2f, 0x4b, 0xee, 0x92, 0xbd,
+		    0x87, 0xb3, 0xce, 0x34, 0x21, 0x57, 0x46, 0x04,
+		    0x49, 0x0c, 0x80, 0xf2, 0x01, 0x13, 0xa1, 0x55,
+		    0xb3, 0xff, 0x44, 0x30, 0x3c, 0x1c, 0xd0, 0xef,
+		    0xbc, 0x18, 0x74, 0x26, 0xad, 0x41, 0x5b, 0x5b,
+		    0x3e, 0x9a, 0x7a, 0x46, 0x4f, 0x16, 0xd6, 0x74,
+		    0x5a, 0xb7, 0x3a, 0x28, 0x31, 0xd8, 0xae, 0x26,
+		    0xac, 0x50, 0x53, 0x86, 0xf2, 0x56, 0xd7, 0x3f,
+		    0x29, 0xbc, 0x45, 0x68, 0x8e, 0xcb, 0x98, 0x64,
+		    0xdd, 0xc9, 0xba, 0xb8, 0x4b, 0x7b, 0x82, 0xdd,
+		    0x14, 0xa7, 0xcb, 0x71, 0x72, 0x00, 0x5c, 0xad,
+		    0x7b, 0x6a, 0x89, 0xa4, 0x3d, 0xbf, 0xb5, 0x4b,
+		    0x3e, 0x7c, 0x5a, 0xcf, 0xb8, 0xa1, 0xc5, 0x6e,
+		    0xc8, 0xb6, 0x31, 0x57, 0x7b, 0xdf, 0xa5, 0x7e,
+		    0xb1, 0xd6, 0x42, 0x2a, 0x31, 0x36, 0xd1, 0xd0,
+		    0x3f, 0x7a, 0xe5, 0x94, 0xd6, 0x36, 0xa0, 0x6f,
+		    0xb7, 0x40, 0x7d, 0x37, 0xc6, 0x55, 0x7c, 0x50,
+		    0x40, 0x6d, 0x29, 0x89, 0xe3, 0x5a, 0xae, 0x97,
+		    0xe7, 0x44, 0x49, 0x6e, 0xbd, 0x81, 0x3d, 0x03,
+		    0x93, 0x06, 0x12, 0x06, 0xe2, 0x41, 0x12, 0x4a,
+		    0xf1, 0x6a, 0xa4, 0x58, 0xa2, 0xfb, 0xd2, 0x15,
+		    0xba, 0xc9, 0x79, 0xc9, 0xce, 0x5e, 0x13, 0xbb,
+		    0xf1, 0x09, 0x04, 0xcc, 0xfd, 0xe8, 0x51, 0x34,
+		    0x6a, 0xe8, 0x61, 0x88, 0xda, 0xed, 0x01, 0x47,
+		    0x84, 0xf5, 0x73, 0x25, 0xf9, 0x1c, 0x42, 0x86,
+		    0x07, 0xf3, 0x5b, 0x1a, 0x01, 0xb3, 0xeb, 0x24,
+		    0x32, 0x8d, 0xf6, 0xed, 0x7c, 0x4b, 0xeb, 0x3c,
+		    0x36, 0x42, 0x28, 0xdf, 0xdf, 0xb6, 0xbe, 0xd9,
+		    0x8c, 0x52, 0xd3, 0x2b, 0x08, 0x90, 0x8c, 0xe7,
+		    0x98, 0x31, 0xe2, 0x32, 0x8e, 0xfc, 0x11, 0x48,
+		    0x00, 0xa8, 0x6a, 0x42, 0x4a, 0x02, 0xc6, 0x4b,
+		    0x09, 0xf1, 0xe3, 0x49, 0xf3, 0x45, 0x1f, 0x0e,
+		    0xbc, 0x56, 0xe2, 0xe4, 0xdf, 0xfb, 0xeb, 0x61,
+		    0xfa, 0x24, 0xc1, 0x63, 0x75, 0xbb, 0x47, 0x75,
+		    0xaf, 0xe1, 0x53, 0x16, 0x96, 0x21, 0x85, 0x26,
+		    0x11, 0xb3, 0x76, 0xe3, 0x23, 0xa1, 0x6b, 0x74,
+		    0x37, 0xd0, 0xde, 0x06, 0x90, 0x71, 0x5d, 0x43,
+		    0x88, 0x9b, 0x00, 0x54, 0xa6, 0x75, 0x2f, 0xa1,
+		    0xc2, 0x0b, 0x73, 0x20, 0x1d, 0xb6, 0x21, 0x79,
+		    0x57, 0x3f, 0xfa, 0x09, 0xbe, 0x8a, 0x33, 0xc3,
+		    0x52, 0xf0, 0x1d, 0x82, 0x31, 0xd1, 0x55, 0xb5,
+		    0x6c, 0x99, 0x25, 0xcf, 0x5c, 0x32, 0xce, 0xe9,
+		    0x0d, 0xfa, 0x69, 0x2c, 0xd5, 0x0d, 0xc5, 0x6d,
+		    0x86, 0xd0, 0x0c, 0x3b, 0x06, 0x50, 0x79, 0xe8,
+		    0xc3, 0xae, 0x04, 0xe6, 0xcd, 0x51, 0xe4, 0x26,
+		    0x9b, 0x4f, 0x7e, 0xa6, 0x0f, 0xab, 0xd8, 0xe5,
+		    0xde, 0xa9, 0x00, 0x95, 0xbe, 0xa3, 0x9d, 0x5d,
+		    0xb2, 0x09, 0x70, 0x18, 0x1c, 0xf0, 0xac, 0x29,
+		    0x23, 0x02, 0x29, 0x28, 0xd2, 0x74, 0x35, 0x57,
+		    0x62, 0x0f, 0x24, 0xea, 0x5e, 0x33, 0xc2, 0x92,
+		    0xf3, 0x78, 0x4d, 0x30, 0x1e, 0xa1, 0x99, 0xa9,
+		    0x82, 0xb0, 0x42, 0x31, 0x8d, 0xad, 0x8a, 0xbc,
+		    0xfc, 0xd4, 0x57, 0x47, 0x3e, 0xb4, 0x50, 0xdd,
+		    0x6e, 0x2c, 0x80, 0x4d, 0x22, 0xf1, 0xfb, 0x57,
+		    0xc4, 0xdd, 0x17, 0xe1, 0x8a, 0x36, 0x4a, 0xb3,
+		    0x37, 0xca, 0xc9, 0x4e, 0xab, 0xd5, 0x69, 0xc4,
+		    0xf4, 0xbc, 0x0b, 0x3b, 0x44, 0x4b, 0x29, 0x9c,
+		    0xee, 0xd4, 0x35, 0x22, 0x21, 0xb0, 0x1f, 0x27,
+		    0x64, 0xa8, 0x51, 0x1b, 0xf0, 0x9f, 0x19, 0x5c,
+		    0xfb, 0x5a, 0x64, 0x74, 0x70, 0x45, 0x09, 0xf5,
+		    0x64, 0xfe, 0x1a, 0x2d, 0xc9, 0x14, 0x04, 0x14,
+		    0xcf, 0xd5, 0x7d, 0x60, 0xaf, 0x94, 0x39, 0x94,
+		    0xe2, 0x7d, 0x79, 0x82, 0xd0, 0x65, 0x3b, 0x6b,
+		    0x9c, 0x19, 0x84, 0xb4, 0x6d, 0xb3, 0x0c, 0x99,
+		    0xc0, 0x56, 0xa8, 0xbd, 0x73, 0xce, 0x05, 0x84,
+		    0x3e, 0x30, 0xaa, 0xc4, 0x9b, 0x1b, 0x04, 0x2a,
+		    0x9f, 0xd7, 0x43, 0x2b, 0x23, 0xdf, 0xbf, 0xaa,
+		    0xd5, 0xc2, 0x43, 0x2d, 0x70, 0xab, 0xdc, 0x75,
+		    0xad, 0xac, 0xf7, 0xc0, 0xbe, 0x67, 0xb2, 0x74,
+		    0xed, 0x67, 0x10, 0x4a, 0x92, 0x60, 0xc1, 0x40,
+		    0x50, 0x19, 0x8a, 0x8a, 0x8c, 0x09, 0x0e, 0x72,
+		    0xe1, 0x73, 0x5e, 0xe8, 0x41, 0x85, 0x63, 0x9f,
+		    0x3f, 0xd7, 0x7d, 0xc4, 0xfb, 0x22, 0x5d, 0x92,
+		    0x6c, 0xb3, 0x1e, 0xe2, 0x50, 0x2f, 0x82, 0xa8,
+		    0x28, 0xc0, 0xb5, 0xd7, 0x5f, 0x68, 0x0d, 0x2c,
+		    0x2d, 0xaf, 0x7e, 0xfa, 0x2e, 0x08, 0x0f, 0x1f,
+		    0x70, 0x9f, 0xe9, 0x19, 0x72, 0x55, 0xf8, 0xfb,
+		    0x51, 0xd2, 0x33, 0x5d, 0xa0, 0xd3, 0x2b, 0x0a,
+		    0x6c, 0xbc, 0x4e, 0xcf, 0x36, 0x4d, 0xdc, 0x3b,
+		    0xe9, 0x3e, 0x81, 0x7c, 0x61, 0xdb, 0x20, 0x2d,
+		    0x3a, 0xc3, 0xb3, 0x0c, 0x1e, 0x00, 0xb9, 0x7c,
+		    0xf5, 0xca, 0x10, 0x5f, 0x3a, 0x71, 0xb3, 0xe4,
+		    0x20, 0xdb, 0x0c, 0x2a, 0x98, 0x63, 0x45, 0x00,
+		    0x58, 0xf6, 0x68, 0xe4, 0x0b, 0xda, 0x13, 0x3b,
+		    0x60, 0x5c, 0x76, 0xdb, 0xb9, 0x97, 0x71, 0xe4,
+		    0xd9, 0xb7, 0xdb, 0xbd, 0x68, 0xc7, 0x84, 0x84,
+		    0xaa, 0x7c, 0x68, 0x62, 0x5e, 0x16, 0xfc, 0xba,
+		    0x72, 0xaa, 0x9a, 0xa9, 0xeb, 0x7c, 0x75, 0x47,
+		    0x97, 0x7e, 0xad, 0xe2, 0xd9, 0x91, 0xe8, 0xe4,
+		    0xa5, 0x31, 0xd7, 0x01, 0x8e, 0xa2, 0x11, 0x88,
+		    0x95, 0xb9, 0xf2, 0x9b, 0xd3, 0x7f, 0x1b, 0x81,
+		    0x22, 0xf7, 0x98, 0x60, 0x0a, 0x64, 0xa6, 0xc1,
+		    0xf6, 0x49, 0xc7, 0xe3, 0x07, 0x4d, 0x94, 0x7a,
+		    0xcf, 0x6e, 0x68, 0x0c, 0x1b, 0x3f, 0x6e, 0x2e,
+		    0xee, 0x92, 0xfa, 0x52, 0xb3, 0x59, 0xf8, 0xf1,
+		    0x8f, 0x6a, 0x66, 0xa3, 0x82, 0x76, 0x4a, 0x07,
+		    0x1a, 0xc7, 0xdd, 0xf5, 0xda, 0x9c, 0x3c, 0x24,
+		    0xbf, 0xfd, 0x42, 0xa1, 0x10, 0x64, 0x6a, 0x0f,
+		    0x89, 0xee, 0x36, 0xa5, 0xce, 0x99, 0x48, 0x6a,
+		    0xf0, 0x9f, 0x9e, 0x69, 0xa4, 0x40, 0x20, 0xe9,
+		    0x16, 0x15, 0xf7, 0xdb, 0x75, 0x02, 0xcb, 0xe9,
+		    0x73, 0x8b, 0x3b, 0x49, 0x2f, 0xf0, 0xaf, 0x51,
+		    0x06, 0x5c, 0xdf, 0x27, 0x27, 0x49, 0x6a, 0xd1,
+		    0xcc, 0xc7, 0xb5, 0x63, 0xb5, 0xfc, 0xb8, 0x5c,
+		    0x87, 0x7f, 0x84, 0xb4, 0xcc, 0x14, 0xa9, 0x53,
+		    0xda, 0xa4, 0x56, 0xf8, 0xb6, 0x1b, 0xcc, 0x40,
+		    0x27, 0x52, 0x06, 0x5a, 0x13, 0x81, 0xd7, 0x3a,
+		    0xd4, 0x3b, 0xfb, 0x49, 0x65, 0x31, 0x33, 0xb2,
+		    0xfa, 0xcd, 0xad, 0x58, 0x4e, 0x2b, 0xae, 0xd2,
+		    0x20, 0xfb, 0x1a, 0x48, 0xb4, 0x3f, 0x9a, 0xd8,
+		    0x7a, 0x35, 0x4a, 0xc8, 0xee, 0x88, 0x5e, 0x07,
+		    0x66, 0x54, 0xb9, 0xec, 0x9f, 0xa3, 0xe3, 0xb9,
+		    0x37, 0xaa, 0x49, 0x76, 0x31, 0xda, 0x74, 0x2d,
+		    0x3c, 0xa4, 0x65, 0x10, 0x32, 0x38, 0xf0, 0xde,
+		    0xd3, 0x99, 0x17, 0xaa, 0x71, 0xaa, 0x8f, 0x0f,
+		    0x8c, 0xaf, 0xa2, 0xf8, 0x5d, 0x64, 0xba, 0x1d,
+		    0xa3, 0xef, 0x96, 0x73, 0xe8, 0xa1, 0x02, 0x8d,
+		    0x0c, 0x6d, 0xb8, 0x06, 0x90, 0xb8, 0x08, 0x56,
+		    0x2c, 0xa7, 0x06, 0xc9, 0xc2, 0x38, 0xdb, 0x7c,
+		    0x63, 0xb1, 0x57, 0x8e, 0xea, 0x7c, 0x79, 0xf3,
+		    0x49, 0x1d, 0xfe, 0x9f, 0xf3, 0x6e, 0xb1, 0x1d,
+		    0xba, 0x19, 0x80, 0x1a, 0x0a, 0xd3, 0xb0, 0x26,
+		    0x21, 0x40, 0xb1, 0x7c, 0xf9, 0x4d, 0x8d, 0x10,
+		    0xc1, 0x7e, 0xf4, 0xf6, 0x3c, 0xa8, 0xfd, 0x7c,
+		    0xa3, 0x92, 0xb2, 0x0f, 0xaa, 0xcc, 0xa6, 0x11,
+		    0xfe, 0x04, 0xe3, 0xd1, 0x7a, 0x32, 0x89, 0xdf,
+		    0x0d, 0xc4, 0x8f, 0x79, 0x6b, 0xca, 0x16, 0x7c,
+		    0x6e, 0xf9, 0xad, 0x0f, 0xf6, 0xfe, 0x27, 0xdb,
+		    0xc4, 0x13, 0x70, 0xf1, 0x62, 0x1a, 0x4f, 0x79,
+		    0x40, 0xc9, 0x9b, 0x8b, 0x21, 0xea, 0x84, 0xfa,
+		    0xf5, 0xf1, 0x89, 0xce, 0xb7, 0x55, 0x0a, 0x80,
+		    0x39, 0x2f, 0x55, 0x36, 0x16, 0x9c, 0x7b, 0x08,
+		    0xbd, 0x87, 0x0d, 0xa5, 0x32, 0xf1, 0x52, 0x7c,
+		    0xe8, 0x55, 0x60, 0x5b, 0xd7, 0x69, 0xe4, 0xfc,
+		    0xfa, 0x12, 0x85, 0x96, 0xea, 0x50, 0x28, 0xab,
+		    0x8a, 0xf7, 0xbb, 0x0e, 0x53, 0x74, 0xca, 0xa6,
+		    0x27, 0x09, 0xc2, 0xb5, 0xde, 0x18, 0x14, 0xd9,
+		    0xea, 0xe5, 0x29, 0x1c, 0x40, 0x56, 0xcf, 0xd7,
+		    0xae, 0x05, 0x3f, 0x65, 0xaf, 0x05, 0x73, 0xe2,
+		    0x35, 0x96, 0x27, 0x07, 0x14, 0xc0, 0xad, 0x33,
+		    0xf1, 0xdc, 0x44, 0x7a, 0x89, 0x17, 0x77, 0xd2,
+		    0x9c, 0x58, 0x60, 0xf0, 0x3f, 0x7b, 0x2d, 0x2e,
+		    0x57, 0x95, 0x54, 0x87, 0xed, 0xf2, 0xc7, 0x4c,
+		    0xf0, 0xae, 0x56, 0x29, 0x19, 0x7d, 0x66, 0x4b,
+		    0x9b, 0x83, 0x84, 0x42, 0x3b, 0x01, 0x25, 0x66,
+		    0x8e, 0x02, 0xde, 0xb9, 0x83, 0x54, 0x19, 0xf6,
+		    0x9f, 0x79, 0x0d, 0x67, 0xc5, 0x1d, 0x7a, 0x44,
+		    0x02, 0x98, 0xa7, 0x16, 0x1c, 0x29, 0x0d, 0x74,
+		    0xff, 0x85, 0x40, 0x06, 0xef, 0x2c, 0xa9, 0xc6,
+		    0xf5, 0x53, 0x07, 0x06, 0xae, 0xe4, 0xfa, 0x5f,
+		    0xd8, 0x39, 0x4d, 0xf1, 0x9b, 0x6b, 0xd9, 0x24,
+		    0x84, 0xfe, 0x03, 0x4c, 0xb2, 0x3f, 0xdf, 0xa1,
+		    0x05, 0x9e, 0x50, 0x14, 0x5a, 0xd9, 0x1a, 0xa2,
+		    0xa7, 0xfa, 0xfa, 0x17, 0xf7, 0x78, 0xd6, 0xb5,
+		    0x92, 0x61, 0x91, 0xac, 0x36, 0xfa, 0x56, 0x0d,
+		    0x38, 0x32, 0x18, 0x85, 0x08, 0x58, 0x37, 0xf0,
+		    0x4b, 0xdb, 0x59, 0xe7, 0xa4, 0x34, 0xc0, 0x1b,
+		    0x01, 0xaf, 0x2d, 0xde, 0xa1, 0xaa, 0x5d, 0xd3,
+		    0xec, 0xe1, 0xd4, 0xf7, 0xe6, 0x54, 0x68, 0xf0,
+		    0x51, 0x97, 0xa7, 0x89, 0xea, 0x24, 0xad, 0xd3,
+		    0x6e, 0x47, 0x93, 0x8b, 0x4b, 0xb4, 0xf7, 0x1c,
+		    0x42, 0x06, 0x67, 0xe8, 0x99, 0xf6, 0xf5, 0x7b,
+		    0x85, 0xb5, 0x65, 0xb5, 0xb5, 0xd2, 0x37, 0xf5,
+		    0xf3, 0x02, 0xa6, 0x4d, 0x11, 0xa7, 0xdc, 0x51,
+		    0x09, 0x7f, 0xa0, 0xd8, 0x88, 0x1c, 0x13, 0x71,
+		    0xae, 0x9c, 0xb7, 0x7b, 0x34, 0xd6, 0x4e, 0x68,
+		    0x26, 0x83, 0x51, 0xaf, 0x1d, 0xee, 0x8b, 0xbb,
+		    0x69, 0x43, 0x2b, 0x9e, 0x8a, 0xbc, 0x02, 0x0e,
+		    0xa0, 0x1b, 0xe0, 0xa8, 0x5f, 0x6f, 0xaf, 0x1b,
+		    0x8f, 0xe7, 0x64, 0x71, 0x74, 0x11, 0x7e, 0xa8,
+		    0xd8, 0xf9, 0x97, 0x06, 0xc3, 0xb6, 0xfb, 0xfb,
+		    0xb7, 0x3d, 0x35, 0x9d, 0x3b, 0x52, 0xed, 0x54,
+		    0xca, 0xf4, 0x81, 0x01, 0x2d, 0x1b, 0xc3, 0xa7,
+		    0x00, 0x3d, 0x1a, 0x39, 0x54, 0xe1, 0xf6, 0xff,
+		    0xed, 0x6f, 0x0b, 0x5a, 0x68, 0xda, 0x58, 0xdd,
+		    0xa9, 0xcf, 0x5c, 0x4a, 0xe5, 0x09, 0x4e, 0xde,
+		    0x9d, 0xbc, 0x3e, 0xee, 0x5a, 0x00, 0x3b, 0x2c,
+		    0x87, 0x10, 0x65, 0x60, 0xdd, 0xd7, 0x56, 0xd1,
+		    0x4c, 0x64, 0x45, 0xe4, 0x21, 0xec, 0x78, 0xf8,
+		    0x25, 0x7a, 0x3e, 0x16, 0x5d, 0x09, 0x53, 0x14,
+		    0xbe, 0x4f, 0xae, 0x87, 0xd8, 0xd1, 0xaa, 0x3c,
+		    0xf6, 0x3e, 0xa4, 0x70, 0x8c, 0x5e, 0x70, 0xa4,
+		    0xb3, 0x6b, 0x66, 0x73, 0xd3, 0xbf, 0x31, 0x06,
+		    0x19, 0x62, 0x93, 0x15, 0xf2, 0x86, 0xe4, 0x52,
+		    0x7e, 0x53, 0x4c, 0x12, 0x38, 0xcc, 0x34, 0x7d,
+		    0x57, 0xf6, 0x42, 0x93, 0x8a, 0xc4, 0xee, 0x5c,
+		    0x8a, 0xe1, 0x52, 0x8f, 0x56, 0x64, 0xf6, 0xa6,
+		    0xd1, 0x91, 0x57, 0x70, 0xcd, 0x11, 0x76, 0xf5,
+		    0x59, 0x60, 0x60, 0x3c, 0xc1, 0xc3, 0x0b, 0x7f,
+		    0x58, 0x1a, 0x50, 0x91, 0xf1, 0x68, 0x8f, 0x6e,
+		    0x74, 0x74, 0xa8, 0x51, 0x0b, 0xf7, 0x7a, 0x98,
+		    0x37, 0xf2, 0x0a, 0x0e, 0xa4, 0x97, 0x04, 0xb8,
+		    0x9b, 0xfd, 0xa0, 0xea, 0xf7, 0x0d, 0xe1, 0xdb,
+		    0x03, 0xf0, 0x31, 0x29, 0xf8, 0xdd, 0x6b, 0x8b,
+		    0x5d, 0xd8, 0x59, 0xa9, 0x29, 0xcf, 0x9a, 0x79,
+		    0x89, 0x19, 0x63, 0x46, 0x09, 0x79, 0x6a, 0x11,
+		    0xda, 0x63, 0x68, 0x48, 0x77, 0x23, 0xfb, 0x7d,
+		    0x3a, 0x43, 0xcb, 0x02, 0x3b, 0x7a, 0x6d, 0x10,
+		    0x2a, 0x9e, 0xac, 0xf1, 0xd4, 0x19, 0xf8, 0x23,
+		    0x64, 0x1d, 0x2c, 0x5f, 0xf2, 0xb0, 0x5c, 0x23,
+		    0x27, 0xf7, 0x27, 0x30, 0x16, 0x37, 0xb1, 0x90,
+		    0xab, 0x38, 0xfb, 0x55, 0xcd, 0x78, 0x58, 0xd4,
+		    0x7d, 0x43, 0xf6, 0x45, 0x5e, 0x55, 0x8d, 0xb1,
+		    0x02, 0x65, 0x58, 0xb4, 0x13, 0x4b, 0x36, 0xf7,
+		    0xcc, 0xfe, 0x3d, 0x0b, 0x82, 0xe2, 0x12, 0x11,
+		    0xbb, 0xe6, 0xb8, 0x3a, 0x48, 0x71, 0xc7, 0x50,
+		    0x06, 0x16, 0x3a, 0xe6, 0x7c, 0x05, 0xc7, 0xc8,
+		    0x4d, 0x2f, 0x08, 0x6a, 0x17, 0x9a, 0x95, 0x97,
+		    0x50, 0x68, 0xdc, 0x28, 0x18, 0xc4, 0x61, 0x38,
+		    0xb9, 0xe0, 0x3e, 0x78, 0xdb, 0x29, 0xe0, 0x9f,
+		    0x52, 0xdd, 0xf8, 0x4f, 0x91, 0xc1, 0xd0, 0x33,
+		    0xa1, 0x7a, 0x8e, 0x30, 0x13, 0x82, 0x07, 0x9f,
+		    0xd3, 0x31, 0x0f, 0x23, 0xbe, 0x32, 0x5a, 0x75,
+		    0xcf, 0x96, 0xb2, 0xec, 0xb5, 0x32, 0xac, 0x21,
+		    0xd1, 0x82, 0x33, 0xd3, 0x15, 0x74, 0xbd, 0x90,
+		    0xf1, 0x2c, 0xe6, 0x5f, 0x8d, 0xe3, 0x02, 0xe8,
+		    0xe9, 0xc4, 0xca, 0x96, 0xeb, 0x0e, 0xbc, 0x91,
+		    0xf4, 0xb9, 0xea, 0xd9, 0x1b, 0x75, 0xbd, 0xe1,
+		    0xac, 0x2a, 0x05, 0x37, 0x52, 0x9b, 0x1b, 0x3f,
+		    0x5a, 0xdc, 0x21, 0xc3, 0x98, 0xbb, 0xaf, 0xa3,
+		    0xf2, 0x00, 0xbf, 0x0d, 0x30, 0x89, 0x05, 0xcc,
+		    0xa5, 0x76, 0xf5, 0x06, 0xf0, 0xc6, 0x54, 0x8a,
+		    0x5d, 0xd4, 0x1e, 0xc1, 0xf2, 0xce, 0xb0, 0x62,
+		    0xc8, 0xfc, 0x59, 0x42, 0x9a, 0x90, 0x60, 0x55,
+		    0xfe, 0x88, 0xa5, 0x8b, 0xb8, 0x33, 0x0c, 0x23,
+		    0x24, 0x0d, 0x15, 0x70, 0x37, 0x1e, 0x3d, 0xf6,
+		    0xd2, 0xea, 0x92, 0x10, 0xb2, 0xc4, 0x51, 0xac,
+		    0xf2, 0xac, 0xf3, 0x6b, 0x6c, 0xaa, 0xcf, 0x12,
+		    0xc5, 0x6c, 0x90, 0x50, 0xb5, 0x0c, 0xfc, 0x1a,
+		    0x15, 0x52, 0xe9, 0x26, 0xc6, 0x52, 0xa4, 0xe7,
+		    0x81, 0x69, 0xe1, 0xe7, 0x9e, 0x30, 0x01, 0xec,
+		    0x84, 0x89, 0xb2, 0x0d, 0x66, 0xdd, 0xce, 0x28,
+		    0x5c, 0xec, 0x98, 0x46, 0x68, 0x21, 0x9f, 0x88,
+		    0x3f, 0x1f, 0x42, 0x77, 0xce, 0xd0, 0x61, 0xd4,
+		    0x20, 0xa7, 0xff, 0x53, 0xad, 0x37, 0xd0, 0x17,
+		    0x35, 0xc9, 0xfc, 0xba, 0x0a, 0x78, 0x3f, 0xf2,
+		    0xcc, 0x86, 0x89, 0xe8, 0x4b, 0x3c, 0x48, 0x33,
+		    0x09, 0x7f, 0xc6, 0xc0, 0xdd, 0xb8, 0xfd, 0x7a,
+		    0x66, 0x66, 0x65, 0xeb, 0x47, 0xa7, 0x04, 0x28,
+		    0xa3, 0x19, 0x8e, 0xa9, 0xb1, 0x13, 0x67, 0x62,
+		    0x70, 0xcf, 0xd7 },
+	.ilen	= 2027,
+	.result	= { 0x74, 0xa6, 0x3e, 0xe4, 0xb1, 0xcb, 0xaf, 0xb0,
+		    0x40, 0xe5, 0x0f, 0x9e, 0xf1, 0xf2, 0x89, 0xb5,
+		    0x42, 0x34, 0x8a, 0xa1, 0x03, 0xb7, 0xe9, 0x57,
+		    0x46, 0xbe, 0x20, 0xe4, 0x6e, 0xb0, 0xeb, 0xff,
+		    0xea, 0x07, 0x7e, 0xef, 0xe2, 0x55, 0x9f, 0xe5,
+		    0x78, 0x3a, 0xb7, 0x83, 0xc2, 0x18, 0x40, 0x7b,
+		    0xeb, 0xcd, 0x81, 0xfb, 0x90, 0x12, 0x9e, 0x46,
+		    0xa9, 0xd6, 0x4a, 0xba, 0xb0, 0x62, 0xdb, 0x6b,
+		    0x99, 0xc4, 0xdb, 0x54, 0x4b, 0xb8, 0xa5, 0x71,
+		    0xcb, 0xcd, 0x63, 0x32, 0x55, 0xfb, 0x31, 0xf0,
+		    0x38, 0xf5, 0xbe, 0x78, 0xe4, 0x45, 0xce, 0x1b,
+		    0x6a, 0x5b, 0x0e, 0xf4, 0x16, 0xe4, 0xb1, 0x3d,
+		    0xf6, 0x63, 0x7b, 0xa7, 0x0c, 0xde, 0x6f, 0x8f,
+		    0x74, 0xdf, 0xe0, 0x1e, 0x9d, 0xce, 0x8f, 0x24,
+		    0xef, 0x23, 0x35, 0x33, 0x7b, 0x83, 0x34, 0x23,
+		    0x58, 0x74, 0x14, 0x77, 0x1f, 0xc2, 0x4f, 0x4e,
+		    0xc6, 0x89, 0xf9, 0x52, 0x09, 0x37, 0x64, 0x14,
+		    0xc4, 0x01, 0x6b, 0x9d, 0x77, 0xe8, 0x90, 0x5d,
+		    0xa8, 0x4a, 0x2a, 0xef, 0x5c, 0x7f, 0xeb, 0xbb,
+		    0xb2, 0xc6, 0x93, 0x99, 0x66, 0xdc, 0x7f, 0xd4,
+		    0x9e, 0x2a, 0xca, 0x8d, 0xdb, 0xe7, 0x20, 0xcf,
+		    0xe4, 0x73, 0xae, 0x49, 0x7d, 0x64, 0x0f, 0x0e,
+		    0x28, 0x46, 0xa9, 0xa8, 0x32, 0xe4, 0x0e, 0xf6,
+		    0x51, 0x53, 0xb8, 0x3c, 0xb1, 0xff, 0xa3, 0x33,
+		    0x41, 0x75, 0xff, 0xf1, 0x6f, 0xf1, 0xfb, 0xbb,
+		    0x83, 0x7f, 0x06, 0x9b, 0xe7, 0x1b, 0x0a, 0xe0,
+		    0x5c, 0x33, 0x60, 0x5b, 0xdb, 0x5b, 0xed, 0xfe,
+		    0xa5, 0x16, 0x19, 0x72, 0xa3, 0x64, 0x23, 0x00,
+		    0x02, 0xc7, 0xf3, 0x6a, 0x81, 0x3e, 0x44, 0x1d,
+		    0x79, 0x15, 0x5f, 0x9a, 0xde, 0xe2, 0xfd, 0x1b,
+		    0x73, 0xc1, 0xbc, 0x23, 0xba, 0x31, 0xd2, 0x50,
+		    0xd5, 0xad, 0x7f, 0x74, 0xa7, 0xc9, 0xf8, 0x3e,
+		    0x2b, 0x26, 0x10, 0xf6, 0x03, 0x36, 0x74, 0xe4,
+		    0x0e, 0x6a, 0x72, 0xb7, 0x73, 0x0a, 0x42, 0x28,
+		    0xc2, 0xad, 0x5e, 0x03, 0xbe, 0xb8, 0x0b, 0xa8,
+		    0x5b, 0xd4, 0xb8, 0xba, 0x52, 0x89, 0xb1, 0x9b,
+		    0xc1, 0xc3, 0x65, 0x87, 0xed, 0xa5, 0xf4, 0x86,
+		    0xfd, 0x41, 0x80, 0x91, 0x27, 0x59, 0x53, 0x67,
+		    0x15, 0x78, 0x54, 0x8b, 0x2d, 0x3d, 0xc7, 0xff,
+		    0x02, 0x92, 0x07, 0x5f, 0x7a, 0x4b, 0x60, 0x59,
+		    0x3c, 0x6f, 0x5c, 0xd8, 0xec, 0x95, 0xd2, 0xfe,
+		    0xa0, 0x3b, 0xd8, 0x3f, 0xd1, 0x69, 0xa6, 0xd6,
+		    0x41, 0xb2, 0xf4, 0x4d, 0x12, 0xf4, 0x58, 0x3e,
+		    0x66, 0x64, 0x80, 0x31, 0x9b, 0xa8, 0x4c, 0x8b,
+		    0x07, 0xb2, 0xec, 0x66, 0x94, 0x66, 0x47, 0x50,
+		    0x50, 0x5f, 0x18, 0x0b, 0x0e, 0xd6, 0xc0, 0x39,
+		    0x21, 0x13, 0x9e, 0x33, 0xbc, 0x79, 0x36, 0x02,
+		    0x96, 0x70, 0xf0, 0x48, 0x67, 0x2f, 0x26, 0xe9,
+		    0x6d, 0x10, 0xbb, 0xd6, 0x3f, 0xd1, 0x64, 0x7a,
+		    0x2e, 0xbe, 0x0c, 0x61, 0xf0, 0x75, 0x42, 0x38,
+		    0x23, 0xb1, 0x9e, 0x9f, 0x7c, 0x67, 0x66, 0xd9,
+		    0x58, 0x9a, 0xf1, 0xbb, 0x41, 0x2a, 0x8d, 0x65,
+		    0x84, 0x94, 0xfc, 0xdc, 0x6a, 0x50, 0x64, 0xdb,
+		    0x56, 0x33, 0x76, 0x00, 0x10, 0xed, 0xbe, 0xd2,
+		    0x12, 0xf6, 0xf6, 0x1b, 0xa2, 0x16, 0xde, 0xae,
+		    0x31, 0x95, 0xdd, 0xb1, 0x08, 0x7e, 0x4e, 0xee,
+		    0xe7, 0xf9, 0xa5, 0xfb, 0x5b, 0x61, 0x43, 0x00,
+		    0x40, 0xf6, 0x7e, 0x02, 0x04, 0x32, 0x4e, 0x0c,
+		    0xe2, 0x66, 0x0d, 0xd7, 0x07, 0x98, 0x0e, 0xf8,
+		    0x72, 0x34, 0x6d, 0x95, 0x86, 0xd7, 0xcb, 0x31,
+		    0x54, 0x47, 0xd0, 0x38, 0x29, 0x9c, 0x5a, 0x68,
+		    0xd4, 0x87, 0x76, 0xc9, 0xe7, 0x7e, 0xe3, 0xf4,
+		    0x81, 0x6d, 0x18, 0xcb, 0xc9, 0x05, 0xaf, 0xa0,
+		    0xfb, 0x66, 0xf7, 0xf1, 0x1c, 0xc6, 0x14, 0x11,
+		    0x4f, 0x2b, 0x79, 0x42, 0x8b, 0xbc, 0xac, 0xe7,
+		    0x6c, 0xfe, 0x0f, 0x58, 0xe7, 0x7c, 0x78, 0x39,
+		    0x30, 0xb0, 0x66, 0x2c, 0x9b, 0x6d, 0x3a, 0xe1,
+		    0xcf, 0xc9, 0xa4, 0x0e, 0x6d, 0x6d, 0x8a, 0xa1,
+		    0x3a, 0xe7, 0x28, 0xd4, 0x78, 0x4c, 0xa6, 0xa2,
+		    0x2a, 0xa6, 0x03, 0x30, 0xd7, 0xa8, 0x25, 0x66,
+		    0x87, 0x2f, 0x69, 0x5c, 0x4e, 0xdd, 0xa5, 0x49,
+		    0x5d, 0x37, 0x4a, 0x59, 0xc4, 0xaf, 0x1f, 0xa2,
+		    0xe4, 0xf8, 0xa6, 0x12, 0x97, 0xd5, 0x79, 0xf5,
+		    0xe2, 0x4a, 0x2b, 0x5f, 0x61, 0xe4, 0x9e, 0xe3,
+		    0xee, 0xb8, 0xa7, 0x5b, 0x2f, 0xf4, 0x9e, 0x6c,
+		    0xfb, 0xd1, 0xc6, 0x56, 0x77, 0xba, 0x75, 0xaa,
+		    0x3d, 0x1a, 0xa8, 0x0b, 0xb3, 0x68, 0x24, 0x00,
+		    0x10, 0x7f, 0xfd, 0xd7, 0xa1, 0x8d, 0x83, 0x54,
+		    0x4f, 0x1f, 0xd8, 0x2a, 0xbe, 0x8a, 0x0c, 0x87,
+		    0xab, 0xa2, 0xde, 0xc3, 0x39, 0xbf, 0x09, 0x03,
+		    0xa5, 0xf3, 0x05, 0x28, 0xe1, 0xe1, 0xee, 0x39,
+		    0x70, 0x9c, 0xd8, 0x81, 0x12, 0x1e, 0x02, 0x40,
+		    0xd2, 0x6e, 0xf0, 0xeb, 0x1b, 0x3d, 0x22, 0xc6,
+		    0xe5, 0xe3, 0xb4, 0x5a, 0x98, 0xbb, 0xf0, 0x22,
+		    0x28, 0x8d, 0xe5, 0xd3, 0x16, 0x48, 0x24, 0xa5,
+		    0xe6, 0x66, 0x0c, 0xf9, 0x08, 0xf9, 0x7e, 0x1e,
+		    0xe1, 0x28, 0x26, 0x22, 0xc7, 0xc7, 0x0a, 0x32,
+		    0x47, 0xfa, 0xa3, 0xbe, 0x3c, 0xc4, 0xc5, 0x53,
+		    0x0a, 0xd5, 0x94, 0x4a, 0xd7, 0x93, 0xd8, 0x42,
+		    0x99, 0xb9, 0x0a, 0xdb, 0x56, 0xf7, 0xb9, 0x1c,
+		    0x53, 0x4f, 0xfa, 0xd3, 0x74, 0xad, 0xd9, 0x68,
+		    0xf1, 0x1b, 0xdf, 0x61, 0xc6, 0x5e, 0xa8, 0x48,
+		    0xfc, 0xd4, 0x4a, 0x4c, 0x3c, 0x32, 0xf7, 0x1c,
+		    0x96, 0x21, 0x9b, 0xf9, 0xa3, 0xcc, 0x5a, 0xce,
+		    0xd5, 0xd7, 0x08, 0x24, 0xf6, 0x1c, 0xfd, 0xdd,
+		    0x38, 0xc2, 0x32, 0xe9, 0xb8, 0xe7, 0xb6, 0xfa,
+		    0x9d, 0x45, 0x13, 0x2c, 0x83, 0xfd, 0x4a, 0x69,
+		    0x82, 0xcd, 0xdc, 0xb3, 0x76, 0x0c, 0x9e, 0xd8,
+		    0xf4, 0x1b, 0x45, 0x15, 0xb4, 0x97, 0xe7, 0x58,
+		    0x34, 0xe2, 0x03, 0x29, 0x5a, 0xbf, 0xb6, 0xe0,
+		    0x5d, 0x13, 0xd9, 0x2b, 0xb4, 0x80, 0xb2, 0x45,
+		    0x81, 0x6a, 0x2e, 0x6c, 0x89, 0x7d, 0xee, 0xbb,
+		    0x52, 0xdd, 0x1f, 0x18, 0xe7, 0x13, 0x6b, 0x33,
+		    0x0e, 0xea, 0x36, 0x92, 0x77, 0x7b, 0x6d, 0x9c,
+		    0x5a, 0x5f, 0x45, 0x7b, 0x7b, 0x35, 0x62, 0x23,
+		    0xd1, 0xbf, 0x0f, 0xd0, 0x08, 0x1b, 0x2b, 0x80,
+		    0x6b, 0x7e, 0xf1, 0x21, 0x47, 0xb0, 0x57, 0xd1,
+		    0x98, 0x72, 0x90, 0x34, 0x1c, 0x20, 0x04, 0xff,
+		    0x3d, 0x5c, 0xee, 0x0e, 0x57, 0x5f, 0x6f, 0x24,
+		    0x4e, 0x3c, 0xea, 0xfc, 0xa5, 0xa9, 0x83, 0xc9,
+		    0x61, 0xb4, 0x51, 0x24, 0xf8, 0x27, 0x5e, 0x46,
+		    0x8c, 0xb1, 0x53, 0x02, 0x96, 0x35, 0xba, 0xb8,
+		    0x4c, 0x71, 0xd3, 0x15, 0x59, 0x35, 0x22, 0x20,
+		    0xad, 0x03, 0x9f, 0x66, 0x44, 0x3b, 0x9c, 0x35,
+		    0x37, 0x1f, 0x9b, 0xbb, 0xf3, 0xdb, 0x35, 0x63,
+		    0x30, 0x64, 0xaa, 0xa2, 0x06, 0xa8, 0x5d, 0xbb,
+		    0xe1, 0x9f, 0x70, 0xec, 0x82, 0x11, 0x06, 0x36,
+		    0xec, 0x8b, 0x69, 0x66, 0x24, 0x44, 0xc9, 0x4a,
+		    0x57, 0xbb, 0x9b, 0x78, 0x13, 0xce, 0x9c, 0x0c,
+		    0xba, 0x92, 0x93, 0x63, 0xb8, 0xe2, 0x95, 0x0f,
+		    0x0f, 0x16, 0x39, 0x52, 0xfd, 0x3a, 0x6d, 0x02,
+		    0x4b, 0xdf, 0x13, 0xd3, 0x2a, 0x22, 0xb4, 0x03,
+		    0x7c, 0x54, 0x49, 0x96, 0x68, 0x54, 0x10, 0xfa,
+		    0xef, 0xaa, 0x6c, 0xe8, 0x22, 0xdc, 0x71, 0x16,
+		    0x13, 0x1a, 0xf6, 0x28, 0xe5, 0x6d, 0x77, 0x3d,
+		    0xcd, 0x30, 0x63, 0xb1, 0x70, 0x52, 0xa1, 0xc5,
+		    0x94, 0x5f, 0xcf, 0xe8, 0xb8, 0x26, 0x98, 0xf7,
+		    0x06, 0xa0, 0x0a, 0x70, 0xfa, 0x03, 0x80, 0xac,
+		    0xc1, 0xec, 0xd6, 0x4c, 0x54, 0xd7, 0xfe, 0x47,
+		    0xb6, 0x88, 0x4a, 0xf7, 0x71, 0x24, 0xee, 0xf3,
+		    0xd2, 0xc2, 0x4a, 0x7f, 0xfe, 0x61, 0xc7, 0x35,
+		    0xc9, 0x37, 0x67, 0xcb, 0x24, 0x35, 0xda, 0x7e,
+		    0xca, 0x5f, 0xf3, 0x8d, 0xd4, 0x13, 0x8e, 0xd6,
+		    0xcb, 0x4d, 0x53, 0x8f, 0x53, 0x1f, 0xc0, 0x74,
+		    0xf7, 0x53, 0xb9, 0x5e, 0x23, 0x37, 0xba, 0x6e,
+		    0xe3, 0x9d, 0x07, 0x55, 0x25, 0x7b, 0xe6, 0x2a,
+		    0x64, 0xd1, 0x32, 0xdd, 0x54, 0x1b, 0x4b, 0xc0,
+		    0xe1, 0xd7, 0x69, 0x58, 0xf8, 0x93, 0x29, 0xc4,
+		    0xdd, 0x23, 0x2f, 0xa5, 0xfc, 0x9d, 0x7e, 0xf8,
+		    0xd4, 0x90, 0xcd, 0x82, 0x55, 0xdc, 0x16, 0x16,
+		    0x9f, 0x07, 0x52, 0x9b, 0x9d, 0x25, 0xed, 0x32,
+		    0xc5, 0x7b, 0xdf, 0xf6, 0x83, 0x46, 0x3d, 0x65,
+		    0xb7, 0xef, 0x87, 0x7a, 0x12, 0x69, 0x8f, 0x06,
+		    0x7c, 0x51, 0x15, 0x4a, 0x08, 0xe8, 0xac, 0x9a,
+		    0x0c, 0x24, 0xa7, 0x27, 0xd8, 0x46, 0x2f, 0xe7,
+		    0x01, 0x0e, 0x1c, 0xc6, 0x91, 0xb0, 0x6e, 0x85,
+		    0x65, 0xf0, 0x29, 0x0d, 0x2e, 0x6b, 0x3b, 0xfb,
+		    0x4b, 0xdf, 0xe4, 0x80, 0x93, 0x03, 0x66, 0x46,
+		    0x3e, 0x8a, 0x6e, 0xf3, 0x5e, 0x4d, 0x62, 0x0e,
+		    0x49, 0x05, 0xaf, 0xd4, 0xf8, 0x21, 0x20, 0x61,
+		    0x1d, 0x39, 0x17, 0xf4, 0x61, 0x47, 0x95, 0xfb,
+		    0x15, 0x2e, 0xb3, 0x4f, 0xd0, 0x5d, 0xf5, 0x7d,
+		    0x40, 0xda, 0x90, 0x3c, 0x6b, 0xcb, 0x17, 0x00,
+		    0x13, 0x3b, 0x64, 0x34, 0x1b, 0xf0, 0xf2, 0xe5,
+		    0x3b, 0xb2, 0xc7, 0xd3, 0x5f, 0x3a, 0x44, 0xa6,
+		    0x9b, 0xb7, 0x78, 0x0e, 0x42, 0x5d, 0x4c, 0xc1,
+		    0xe9, 0xd2, 0xcb, 0xb7, 0x78, 0xd1, 0xfe, 0x9a,
+		    0xb5, 0x07, 0xe9, 0xe0, 0xbe, 0xe2, 0x8a, 0xa7,
+		    0x01, 0x83, 0x00, 0x8c, 0x5c, 0x08, 0xe6, 0x63,
+		    0x12, 0x92, 0xb7, 0xb7, 0xa6, 0x19, 0x7d, 0x38,
+		    0x13, 0x38, 0x92, 0x87, 0x24, 0xf9, 0x48, 0xb3,
+		    0x5e, 0x87, 0x6a, 0x40, 0x39, 0x5c, 0x3f, 0xed,
+		    0x8f, 0xee, 0xdb, 0x15, 0x82, 0x06, 0xda, 0x49,
+		    0x21, 0x2b, 0xb5, 0xbf, 0x32, 0x7c, 0x9f, 0x42,
+		    0x28, 0x63, 0xcf, 0xaf, 0x1e, 0xf8, 0xc6, 0xa0,
+		    0xd1, 0x02, 0x43, 0x57, 0x62, 0xec, 0x9b, 0x0f,
+		    0x01, 0x9e, 0x71, 0xd8, 0x87, 0x9d, 0x01, 0xc1,
+		    0x58, 0x77, 0xd9, 0xaf, 0xb1, 0x10, 0x7e, 0xdd,
+		    0xa6, 0x50, 0x96, 0xe5, 0xf0, 0x72, 0x00, 0x6d,
+		    0x4b, 0xf8, 0x2a, 0x8f, 0x19, 0xf3, 0x22, 0x88,
+		    0x11, 0x4a, 0x8b, 0x7c, 0xfd, 0xb7, 0xed, 0xe1,
+		    0xf6, 0x40, 0x39, 0xe0, 0xe9, 0xf6, 0x3d, 0x25,
+		    0xe6, 0x74, 0x3c, 0x58, 0x57, 0x7f, 0xe1, 0x22,
+		    0x96, 0x47, 0x31, 0x91, 0xba, 0x70, 0x85, 0x28,
+		    0x6b, 0x9f, 0x6e, 0x25, 0xac, 0x23, 0x66, 0x2f,
+		    0x29, 0x88, 0x28, 0xce, 0x8c, 0x5c, 0x88, 0x53,
+		    0xd1, 0x3b, 0xcc, 0x6a, 0x51, 0xb2, 0xe1, 0x28,
+		    0x3f, 0x91, 0xb4, 0x0d, 0x00, 0x3a, 0xe3, 0xf8,
+		    0xc3, 0x8f, 0xd7, 0x96, 0x62, 0x0e, 0x2e, 0xfc,
+		    0xc8, 0x6c, 0x77, 0xa6, 0x1d, 0x22, 0xc1, 0xb8,
+		    0xe6, 0x61, 0xd7, 0x67, 0x36, 0x13, 0x7b, 0xbb,
+		    0x9b, 0x59, 0x09, 0xa6, 0xdf, 0xf7, 0x6b, 0xa3,
+		    0x40, 0x1a, 0xf5, 0x4f, 0xb4, 0xda, 0xd3, 0xf3,
+		    0x81, 0x93, 0xc6, 0x18, 0xd9, 0x26, 0xee, 0xac,
+		    0xf0, 0xaa, 0xdf, 0xc5, 0x9c, 0xca, 0xc2, 0xa2,
+		    0xcc, 0x7b, 0x5c, 0x24, 0xb0, 0xbc, 0xd0, 0x6a,
+		    0x4d, 0x89, 0x09, 0xb8, 0x07, 0xfe, 0x87, 0xad,
+		    0x0a, 0xea, 0xb8, 0x42, 0xf9, 0x5e, 0xb3, 0x3e,
+		    0x36, 0x4c, 0xaf, 0x75, 0x9e, 0x1c, 0xeb, 0xbd,
+		    0xbc, 0xbb, 0x80, 0x40, 0xa7, 0x3a, 0x30, 0xbf,
+		    0xa8, 0x44, 0xf4, 0xeb, 0x38, 0xad, 0x29, 0xba,
+		    0x23, 0xed, 0x41, 0x0c, 0xea, 0xd2, 0xbb, 0x41,
+		    0x18, 0xd6, 0xb9, 0xba, 0x65, 0x2b, 0xa3, 0x91,
+		    0x6d, 0x1f, 0xa9, 0xf4, 0xd1, 0x25, 0x8d, 0x4d,
+		    0x38, 0xff, 0x64, 0xa0, 0xec, 0xde, 0xa6, 0xb6,
+		    0x79, 0xab, 0x8e, 0x33, 0x6c, 0x47, 0xde, 0xaf,
+		    0x94, 0xa4, 0xa5, 0x86, 0x77, 0x55, 0x09, 0x92,
+		    0x81, 0x31, 0x76, 0xc7, 0x34, 0x22, 0x89, 0x8e,
+		    0x3d, 0x26, 0x26, 0xd7, 0xfc, 0x1e, 0x16, 0x72,
+		    0x13, 0x33, 0x63, 0xd5, 0x22, 0xbe, 0xb8, 0x04,
+		    0x34, 0x84, 0x41, 0xbb, 0x80, 0xd0, 0x9f, 0x46,
+		    0x48, 0x07, 0xa7, 0xfc, 0x2b, 0x3a, 0x75, 0x55,
+		    0x8c, 0xc7, 0x6a, 0xbd, 0x7e, 0x46, 0x08, 0x84,
+		    0x0f, 0xd5, 0x74, 0xc0, 0x82, 0x8e, 0xaa, 0x61,
+		    0x05, 0x01, 0xb2, 0x47, 0x6e, 0x20, 0x6a, 0x2d,
+		    0x58, 0x70, 0x48, 0x32, 0xa7, 0x37, 0xd2, 0xb8,
+		    0x82, 0x1a, 0x51, 0xb9, 0x61, 0xdd, 0xfd, 0x9d,
+		    0x6b, 0x0e, 0x18, 0x97, 0xf8, 0x45, 0x5f, 0x87,
+		    0x10, 0xcf, 0x34, 0x72, 0x45, 0x26, 0x49, 0x70,
+		    0xe7, 0xa3, 0x78, 0xe0, 0x52, 0x89, 0x84, 0x94,
+		    0x83, 0x82, 0xc2, 0x69, 0x8f, 0xe3, 0xe1, 0x3f,
+		    0x60, 0x74, 0x88, 0xc4, 0xf7, 0x75, 0x2c, 0xfb,
+		    0xbd, 0xb6, 0xc4, 0x7e, 0x10, 0x0a, 0x6c, 0x90,
+		    0x04, 0x9e, 0xc3, 0x3f, 0x59, 0x7c, 0xce, 0x31,
+		    0x18, 0x60, 0x57, 0x73, 0x46, 0x94, 0x7d, 0x06,
+		    0xa0, 0x6d, 0x44, 0xec, 0xa2, 0x0a, 0x9e, 0x05,
+		    0x15, 0xef, 0xca, 0x5c, 0xbf, 0x00, 0xeb, 0xf7,
+		    0x3d, 0x32, 0xd4, 0xa5, 0xef, 0x49, 0x89, 0x5e,
+		    0x46, 0xb0, 0xa6, 0x63, 0x5b, 0x8a, 0x73, 0xae,
+		    0x6f, 0xd5, 0x9d, 0xf8, 0x4f, 0x40, 0xb5, 0xb2,
+		    0x6e, 0xd3, 0xb6, 0x01, 0xa9, 0x26, 0xa2, 0x21,
+		    0xcf, 0x33, 0x7a, 0x3a, 0xa4, 0x23, 0x13, 0xb0,
+		    0x69, 0x6a, 0xee, 0xce, 0xd8, 0x9d, 0x01, 0x1d,
+		    0x50, 0xc1, 0x30, 0x6c, 0xb1, 0xcd, 0xa0, 0xf0,
+		    0xf0, 0xa2, 0x64, 0x6f, 0xbb, 0xbf, 0x5e, 0xe6,
+		    0xab, 0x87, 0xb4, 0x0f, 0x4f, 0x15, 0xaf, 0xb5,
+		    0x25, 0xa1, 0xb2, 0xd0, 0x80, 0x2c, 0xfb, 0xf9,
+		    0xfe, 0xd2, 0x33, 0xbb, 0x76, 0xfe, 0x7c, 0xa8,
+		    0x66, 0xf7, 0xe7, 0x85, 0x9f, 0x1f, 0x85, 0x57,
+		    0x88, 0xe1, 0xe9, 0x63, 0xe4, 0xd8, 0x1c, 0xa1,
+		    0xfb, 0xda, 0x44, 0x05, 0x2e, 0x1d, 0x3a, 0x1c,
+		    0xff, 0xc8, 0x3b, 0xc0, 0xfe, 0xda, 0x22, 0x0b,
+		    0x43, 0xd6, 0x88, 0x39, 0x4c, 0x4a, 0xa6, 0x69,
+		    0x18, 0x93, 0x42, 0x4e, 0xb5, 0xcc, 0x66, 0x0d,
+		    0x09, 0xf8, 0x1e, 0x7c, 0xd3, 0x3c, 0x99, 0x0d,
+		    0x50, 0x1d, 0x62, 0xe9, 0x57, 0x06, 0xbf, 0x19,
+		    0x88, 0xdd, 0xad, 0x7b, 0x4f, 0xf9, 0xc7, 0x82,
+		    0x6d, 0x8d, 0xc8, 0xc4, 0xc5, 0x78, 0x17, 0x20,
+		    0x15, 0xc5, 0x52, 0x41, 0xcf, 0x5b, 0xd6, 0x7f,
+		    0x94, 0x02, 0x41, 0xe0, 0x40, 0x22, 0x03, 0x5e,
+		    0xd1, 0x53, 0xd4, 0x86, 0xd3, 0x2c, 0x9f, 0x0f,
+		    0x96, 0xe3, 0x6b, 0x9a, 0x76, 0x32, 0x06, 0x47,
+		    0x4b, 0x11, 0xb3, 0xdd, 0x03, 0x65, 0xbd, 0x9b,
+		    0x01, 0xda, 0x9c, 0xb9, 0x7e, 0x3f, 0x6a, 0xc4,
+		    0x7b, 0xea, 0xd4, 0x3c, 0xb9, 0xfb, 0x5c, 0x6b,
+		    0x64, 0x33, 0x52, 0xba, 0x64, 0x78, 0x8f, 0xa4,
+		    0xaf, 0x7a, 0x61, 0x8d, 0xbc, 0xc5, 0x73, 0xe9,
+		    0x6b, 0x58, 0x97, 0x4b, 0xbf, 0x63, 0x22, 0xd3,
+		    0x37, 0x02, 0x54, 0xc5, 0xb9, 0x16, 0x4a, 0xf0,
+		    0x19, 0xd8, 0x94, 0x57, 0xb8, 0x8a, 0xb3, 0x16,
+		    0x3b, 0xd0, 0x84, 0x8e, 0x67, 0xa6, 0xa3, 0x7d,
+		    0x78, 0xec, 0x00 },
+	.failure = true
+} };
+
+static const struct chacha20poly1305_testvec
+xchacha20poly1305_enc_vectors[] __initconst = { {
+	.key	= { 0x1c, 0x92, 0x40, 0xa5, 0xeb, 0x55, 0xd3, 0x8a,
+		    0xf3, 0x33, 0x88, 0x86, 0x04, 0xf6, 0xb5, 0xf0,
+		    0x47, 0x39, 0x17, 0xc1, 0x40, 0x2b, 0x80, 0x09,
+		    0x9d, 0xca, 0x5c, 0xbc, 0x20, 0x70, 0x75, 0xc0 },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+		    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17 },
+	.nlen	= 8,
+	.assoc	= { 0xf3, 0x33, 0x88, 0x86, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x4e, 0x91 },
+	.alen	= 12,
+	.input	= { 0x49, 0x6e, 0x74, 0x65, 0x72, 0x6e, 0x65, 0x74,
+		    0x2d, 0x44, 0x72, 0x61, 0x66, 0x74, 0x73, 0x20,
+		    0x61, 0x72, 0x65, 0x20, 0x64, 0x72, 0x61, 0x66,
+		    0x74, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+		    0x6e, 0x74, 0x73, 0x20, 0x76, 0x61, 0x6c, 0x69,
+		    0x64, 0x20, 0x66, 0x6f, 0x72, 0x20, 0x61, 0x20,
+		    0x6d, 0x61, 0x78, 0x69, 0x6d, 0x75, 0x6d, 0x20,
+		    0x6f, 0x66, 0x20, 0x73, 0x69, 0x78, 0x20, 0x6d,
+		    0x6f, 0x6e, 0x74, 0x68, 0x73, 0x20, 0x61, 0x6e,
+		    0x64, 0x20, 0x6d, 0x61, 0x79, 0x20, 0x62, 0x65,
+		    0x20, 0x75, 0x70, 0x64, 0x61, 0x74, 0x65, 0x64,
+		    0x2c, 0x20, 0x72, 0x65, 0x70, 0x6c, 0x61, 0x63,
+		    0x65, 0x64, 0x2c, 0x20, 0x6f, 0x72, 0x20, 0x6f,
+		    0x62, 0x73, 0x6f, 0x6c, 0x65, 0x74, 0x65, 0x64,
+		    0x20, 0x62, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65,
+		    0x72, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+		    0x6e, 0x74, 0x73, 0x20, 0x61, 0x74, 0x20, 0x61,
+		    0x6e, 0x79, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x2e,
+		    0x20, 0x49, 0x74, 0x20, 0x69, 0x73, 0x20, 0x69,
+		    0x6e, 0x61, 0x70, 0x70, 0x72, 0x6f, 0x70, 0x72,
+		    0x69, 0x61, 0x74, 0x65, 0x20, 0x74, 0x6f, 0x20,
+		    0x75, 0x73, 0x65, 0x20, 0x49, 0x6e, 0x74, 0x65,
+		    0x72, 0x6e, 0x65, 0x74, 0x2d, 0x44, 0x72, 0x61,
+		    0x66, 0x74, 0x73, 0x20, 0x61, 0x73, 0x20, 0x72,
+		    0x65, 0x66, 0x65, 0x72, 0x65, 0x6e, 0x63, 0x65,
+		    0x20, 0x6d, 0x61, 0x74, 0x65, 0x72, 0x69, 0x61,
+		    0x6c, 0x20, 0x6f, 0x72, 0x20, 0x74, 0x6f, 0x20,
+		    0x63, 0x69, 0x74, 0x65, 0x20, 0x74, 0x68, 0x65,
+		    0x6d, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 0x20,
+		    0x74, 0x68, 0x61, 0x6e, 0x20, 0x61, 0x73, 0x20,
+		    0x2f, 0xe2, 0x80, 0x9c, 0x77, 0x6f, 0x72, 0x6b,
+		    0x20, 0x69, 0x6e, 0x20, 0x70, 0x72, 0x6f, 0x67,
+		    0x72, 0x65, 0x73, 0x73, 0x2e, 0x2f, 0xe2, 0x80,
+		    0x9d },
+	.ilen	= 265,
+	.result	= { 0x1a, 0x6e, 0x3a, 0xd9, 0xfd, 0x41, 0x3f, 0x77,
+		    0x54, 0x72, 0x0a, 0x70, 0x9a, 0xa0, 0x29, 0x92,
+		    0x2e, 0xed, 0x93, 0xcf, 0x0f, 0x71, 0x88, 0x18,
+		    0x7a, 0x9d, 0x2d, 0x24, 0xe0, 0xf5, 0xea, 0x3d,
+		    0x55, 0x64, 0xd7, 0xad, 0x2a, 0x1a, 0x1f, 0x7e,
+		    0x86, 0x6d, 0xb0, 0xce, 0x80, 0x41, 0x72, 0x86,
+		    0x26, 0xee, 0x84, 0xd7, 0xef, 0x82, 0x9e, 0xe2,
+		    0x60, 0x9d, 0x5a, 0xfc, 0xf0, 0xe4, 0x19, 0x85,
+		    0xea, 0x09, 0xc6, 0xfb, 0xb3, 0xa9, 0x50, 0x09,
+		    0xec, 0x5e, 0x11, 0x90, 0xa1, 0xc5, 0x4e, 0x49,
+		    0xef, 0x50, 0xd8, 0x8f, 0xe0, 0x78, 0xd7, 0xfd,
+		    0xb9, 0x3b, 0xc9, 0xf2, 0x91, 0xc8, 0x25, 0xc8,
+		    0xa7, 0x63, 0x60, 0xce, 0x10, 0xcd, 0xc6, 0x7f,
+		    0xf8, 0x16, 0xf8, 0xe1, 0x0a, 0xd9, 0xde, 0x79,
+		    0x50, 0x33, 0xf2, 0x16, 0x0f, 0x17, 0xba, 0xb8,
+		    0x5d, 0xd8, 0xdf, 0x4e, 0x51, 0xa8, 0x39, 0xd0,
+		    0x85, 0xca, 0x46, 0x6a, 0x10, 0xa7, 0xa3, 0x88,
+		    0xef, 0x79, 0xb9, 0xf8, 0x24, 0xf3, 0xe0, 0x71,
+		    0x7b, 0x76, 0x28, 0x46, 0x3a, 0x3a, 0x1b, 0x91,
+		    0xb6, 0xd4, 0x3e, 0x23, 0xe5, 0x44, 0x15, 0xbf,
+		    0x60, 0x43, 0x9d, 0xa4, 0xbb, 0xd5, 0x5f, 0x89,
+		    0xeb, 0xef, 0x8e, 0xfd, 0xdd, 0xb4, 0x0d, 0x46,
+		    0xf0, 0x69, 0x23, 0x63, 0xae, 0x94, 0xf5, 0x5e,
+		    0xa5, 0xad, 0x13, 0x1c, 0x41, 0x76, 0xe6, 0x90,
+		    0xd6, 0x6d, 0xa2, 0x8f, 0x97, 0x4c, 0xa8, 0x0b,
+		    0xcf, 0x8d, 0x43, 0x2b, 0x9c, 0x9b, 0xc5, 0x58,
+		    0xa5, 0xb6, 0x95, 0x9a, 0xbf, 0x81, 0xc6, 0x54,
+		    0xc9, 0x66, 0x0c, 0xe5, 0x4f, 0x6a, 0x53, 0xa1,
+		    0xe5, 0x0c, 0xba, 0x31, 0xde, 0x34, 0x64, 0x73,
+		    0x8a, 0x3b, 0xbd, 0x92, 0x01, 0xdb, 0x71, 0x69,
+		    0xf3, 0x58, 0x99, 0xbc, 0xd1, 0xcb, 0x4a, 0x05,
+		    0xe2, 0x58, 0x9c, 0x25, 0x17, 0xcd, 0xdc, 0x83,
+		    0xb7, 0xff, 0xfb, 0x09, 0x61, 0xad, 0xbf, 0x13,
+		    0x5b, 0x5e, 0xed, 0x46, 0x82, 0x6f, 0x22, 0xd8,
+		    0x93, 0xa6, 0x85, 0x5b, 0x40, 0x39, 0x5c, 0xc5,
+		    0x9c }
+} };
+static const struct chacha20poly1305_testvec
+xchacha20poly1305_dec_vectors[] __initconst = {	{
+	.key	= { 0x1c, 0x92, 0x40, 0xa5, 0xeb, 0x55, 0xd3, 0x8a,
+		    0xf3, 0x33, 0x88, 0x86, 0x04, 0xf6, 0xb5, 0xf0,
+		    0x47, 0x39, 0x17, 0xc1, 0x40, 0x2b, 0x80, 0x09,
+		    0x9d, 0xca, 0x5c, 0xbc, 0x20, 0x70, 0x75, 0xc0 },
+	.nonce	= { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+		    0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
+		    0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17 },
+	.nlen	= 8,
+	.assoc	= { 0xf3, 0x33, 0x88, 0x86, 0x00, 0x00, 0x00, 0x00,
+		    0x00, 0x00, 0x4e, 0x91 },
+	.alen	= 12,
+	.input	= { 0x1a, 0x6e, 0x3a, 0xd9, 0xfd, 0x41, 0x3f, 0x77,
+		    0x54, 0x72, 0x0a, 0x70, 0x9a, 0xa0, 0x29, 0x92,
+		    0x2e, 0xed, 0x93, 0xcf, 0x0f, 0x71, 0x88, 0x18,
+		    0x7a, 0x9d, 0x2d, 0x24, 0xe0, 0xf5, 0xea, 0x3d,
+		    0x55, 0x64, 0xd7, 0xad, 0x2a, 0x1a, 0x1f, 0x7e,
+		    0x86, 0x6d, 0xb0, 0xce, 0x80, 0x41, 0x72, 0x86,
+		    0x26, 0xee, 0x84, 0xd7, 0xef, 0x82, 0x9e, 0xe2,
+		    0x60, 0x9d, 0x5a, 0xfc, 0xf0, 0xe4, 0x19, 0x85,
+		    0xea, 0x09, 0xc6, 0xfb, 0xb3, 0xa9, 0x50, 0x09,
+		    0xec, 0x5e, 0x11, 0x90, 0xa1, 0xc5, 0x4e, 0x49,
+		    0xef, 0x50, 0xd8, 0x8f, 0xe0, 0x78, 0xd7, 0xfd,
+		    0xb9, 0x3b, 0xc9, 0xf2, 0x91, 0xc8, 0x25, 0xc8,
+		    0xa7, 0x63, 0x60, 0xce, 0x10, 0xcd, 0xc6, 0x7f,
+		    0xf8, 0x16, 0xf8, 0xe1, 0x0a, 0xd9, 0xde, 0x79,
+		    0x50, 0x33, 0xf2, 0x16, 0x0f, 0x17, 0xba, 0xb8,
+		    0x5d, 0xd8, 0xdf, 0x4e, 0x51, 0xa8, 0x39, 0xd0,
+		    0x85, 0xca, 0x46, 0x6a, 0x10, 0xa7, 0xa3, 0x88,
+		    0xef, 0x79, 0xb9, 0xf8, 0x24, 0xf3, 0xe0, 0x71,
+		    0x7b, 0x76, 0x28, 0x46, 0x3a, 0x3a, 0x1b, 0x91,
+		    0xb6, 0xd4, 0x3e, 0x23, 0xe5, 0x44, 0x15, 0xbf,
+		    0x60, 0x43, 0x9d, 0xa4, 0xbb, 0xd5, 0x5f, 0x89,
+		    0xeb, 0xef, 0x8e, 0xfd, 0xdd, 0xb4, 0x0d, 0x46,
+		    0xf0, 0x69, 0x23, 0x63, 0xae, 0x94, 0xf5, 0x5e,
+		    0xa5, 0xad, 0x13, 0x1c, 0x41, 0x76, 0xe6, 0x90,
+		    0xd6, 0x6d, 0xa2, 0x8f, 0x97, 0x4c, 0xa8, 0x0b,
+		    0xcf, 0x8d, 0x43, 0x2b, 0x9c, 0x9b, 0xc5, 0x58,
+		    0xa5, 0xb6, 0x95, 0x9a, 0xbf, 0x81, 0xc6, 0x54,
+		    0xc9, 0x66, 0x0c, 0xe5, 0x4f, 0x6a, 0x53, 0xa1,
+		    0xe5, 0x0c, 0xba, 0x31, 0xde, 0x34, 0x64, 0x73,
+		    0x8a, 0x3b, 0xbd, 0x92, 0x01, 0xdb, 0x71, 0x69,
+		    0xf3, 0x58, 0x99, 0xbc, 0xd1, 0xcb, 0x4a, 0x05,
+		    0xe2, 0x58, 0x9c, 0x25, 0x17, 0xcd, 0xdc, 0x83,
+		    0xb7, 0xff, 0xfb, 0x09, 0x61, 0xad, 0xbf, 0x13,
+		    0x5b, 0x5e, 0xed, 0x46, 0x82, 0x6f, 0x22, 0xd8,
+		    0x93, 0xa6, 0x85, 0x5b, 0x40, 0x39, 0x5c, 0xc5,
+		    0x9c },
+	.ilen	= 281,
+	.result	= { 0x49, 0x6e, 0x74, 0x65, 0x72, 0x6e, 0x65, 0x74,
+		    0x2d, 0x44, 0x72, 0x61, 0x66, 0x74, 0x73, 0x20,
+		    0x61, 0x72, 0x65, 0x20, 0x64, 0x72, 0x61, 0x66,
+		    0x74, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+		    0x6e, 0x74, 0x73, 0x20, 0x76, 0x61, 0x6c, 0x69,
+		    0x64, 0x20, 0x66, 0x6f, 0x72, 0x20, 0x61, 0x20,
+		    0x6d, 0x61, 0x78, 0x69, 0x6d, 0x75, 0x6d, 0x20,
+		    0x6f, 0x66, 0x20, 0x73, 0x69, 0x78, 0x20, 0x6d,
+		    0x6f, 0x6e, 0x74, 0x68, 0x73, 0x20, 0x61, 0x6e,
+		    0x64, 0x20, 0x6d, 0x61, 0x79, 0x20, 0x62, 0x65,
+		    0x20, 0x75, 0x70, 0x64, 0x61, 0x74, 0x65, 0x64,
+		    0x2c, 0x20, 0x72, 0x65, 0x70, 0x6c, 0x61, 0x63,
+		    0x65, 0x64, 0x2c, 0x20, 0x6f, 0x72, 0x20, 0x6f,
+		    0x62, 0x73, 0x6f, 0x6c, 0x65, 0x74, 0x65, 0x64,
+		    0x20, 0x62, 0x79, 0x20, 0x6f, 0x74, 0x68, 0x65,
+		    0x72, 0x20, 0x64, 0x6f, 0x63, 0x75, 0x6d, 0x65,
+		    0x6e, 0x74, 0x73, 0x20, 0x61, 0x74, 0x20, 0x61,
+		    0x6e, 0x79, 0x20, 0x74, 0x69, 0x6d, 0x65, 0x2e,
+		    0x20, 0x49, 0x74, 0x20, 0x69, 0x73, 0x20, 0x69,
+		    0x6e, 0x61, 0x70, 0x70, 0x72, 0x6f, 0x70, 0x72,
+		    0x69, 0x61, 0x74, 0x65, 0x20, 0x74, 0x6f, 0x20,
+		    0x75, 0x73, 0x65, 0x20, 0x49, 0x6e, 0x74, 0x65,
+		    0x72, 0x6e, 0x65, 0x74, 0x2d, 0x44, 0x72, 0x61,
+		    0x66, 0x74, 0x73, 0x20, 0x61, 0x73, 0x20, 0x72,
+		    0x65, 0x66, 0x65, 0x72, 0x65, 0x6e, 0x63, 0x65,
+		    0x20, 0x6d, 0x61, 0x74, 0x65, 0x72, 0x69, 0x61,
+		    0x6c, 0x20, 0x6f, 0x72, 0x20, 0x74, 0x6f, 0x20,
+		    0x63, 0x69, 0x74, 0x65, 0x20, 0x74, 0x68, 0x65,
+		    0x6d, 0x20, 0x6f, 0x74, 0x68, 0x65, 0x72, 0x20,
+		    0x74, 0x68, 0x61, 0x6e, 0x20, 0x61, 0x73, 0x20,
+		    0x2f, 0xe2, 0x80, 0x9c, 0x77, 0x6f, 0x72, 0x6b,
+		    0x20, 0x69, 0x6e, 0x20, 0x70, 0x72, 0x6f, 0x67,
+		    0x72, 0x65, 0x73, 0x73, 0x2e, 0x2f, 0xe2, 0x80,
+		    0x9d }
+} };
+
+static inline void
+chacha20poly1305_selftest_encrypt_bignonce(u8 *dst, const u8 *src,
+					   const size_t src_len, const u8 *ad,
+					   const size_t ad_len,
+					   const u8 nonce[12],
+					   const u8 key[CHACHA20POLY1305_KEYLEN])
+{
+	simd_context_t simd_context = simd_get();
+	struct poly1305_ctx poly1305_state;
+	struct chacha20_ctx chacha20_state;
+	union {
+		u8 block0[POLY1305_KEY_SIZE];
+		__le64 lens[2];
+	} b = {{ 0 }};
+
+	chacha20_init(&chacha20_state, key, 0);
+	chacha20_state.counter[1] = get_unaligned_le32(nonce + 0);
+	chacha20_state.counter[2] = get_unaligned_le32(nonce + 4);
+	chacha20_state.counter[3] = get_unaligned_le32(nonce + 8);
+	chacha20(&chacha20_state, b.block0, b.block0, sizeof(b.block0),
+		 simd_context);
+	poly1305_init(&poly1305_state, b.block0, simd_context);
+	poly1305_update(&poly1305_state, ad, ad_len, simd_context);
+	poly1305_update(&poly1305_state, pad0, (0x10 - ad_len) & 0xf,
+			simd_context);
+	chacha20(&chacha20_state, dst, src, src_len, simd_context);
+	poly1305_update(&poly1305_state, dst, src_len, simd_context);
+	poly1305_update(&poly1305_state, pad0, (0x10 - src_len) & 0xf,
+			simd_context);
+	b.lens[0] = cpu_to_le64(ad_len);
+	b.lens[1] = cpu_to_le64(src_len);
+	poly1305_update(&poly1305_state, (u8 *)b.lens, sizeof(b.lens),
+			simd_context);
+	poly1305_final(&poly1305_state, dst + src_len, simd_context);
+	simd_put(simd_context);
+	memzero_explicit(&chacha20_state, sizeof(chacha20_state));
+	memzero_explicit(&b, sizeof(b));
+}
+
+static inline void
+chacha20poly1305_selftest_encrypt(u8 *dst, const u8 *src, const size_t src_len,
+				  const u8 *ad, const size_t ad_len,
+				  const u8 *nonce, const size_t nonce_len,
+				  const u8 key[CHACHA20POLY1305_KEYLEN])
+{
+	if (nonce_len == 8)
+		chacha20poly1305_encrypt(dst, src, src_len, ad, ad_len,
+					 get_unaligned_le64(nonce), key);
+	else if (nonce_len == 12)
+		chacha20poly1305_selftest_encrypt_bignonce(dst, src, src_len,
+							   ad, ad_len, nonce,
+							   key);
+	else
+		BUG();
+}
+
+static inline bool
+decryption_success(bool func_ret, bool expect_failure, int memcmp_result)
+{
+	if (expect_failure)
+		return !func_ret;
+	return func_ret && !memcmp_result;
+}
+
+enum { MAXIMUM_TEST_BUFFER_LEN = 3000 };
+
+bool __init chacha20poly1305_selftest(void)
+{
+	size_t i;
+	u8 computed_result[MAXIMUM_TEST_BUFFER_LEN], *heap_src, *heap_dst;
+	bool success = true, ret, simd_context;
+	struct scatterlist sg_src, sg_dst;
+
+	heap_src = kmalloc(MAXIMUM_TEST_BUFFER_LEN, GFP_KERNEL);
+	heap_dst = kmalloc(MAXIMUM_TEST_BUFFER_LEN, GFP_KERNEL);
+	if (!heap_src || !heap_dst) {
+		kfree(heap_src);
+		kfree(heap_dst);
+		pr_info("chacha20poly1305 self-test malloc: FAIL\n");
+		return false;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(chacha20poly1305_enc_vectors); ++i) {
+		memset(computed_result, 0, sizeof(computed_result));
+		chacha20poly1305_selftest_encrypt(computed_result,
+					chacha20poly1305_enc_vectors[i].input,
+					chacha20poly1305_enc_vectors[i].ilen,
+					chacha20poly1305_enc_vectors[i].assoc,
+					chacha20poly1305_enc_vectors[i].alen,
+					chacha20poly1305_enc_vectors[i].nonce,
+					chacha20poly1305_enc_vectors[i].nlen,
+					chacha20poly1305_enc_vectors[i].key);
+		if (memcmp(computed_result,
+			   chacha20poly1305_enc_vectors[i].result,
+			   chacha20poly1305_enc_vectors[i].ilen +
+							POLY1305_MAC_SIZE)) {
+			pr_info("chacha20poly1305 encryption self-test %zu: FAIL\n",
+				i + 1);
+			success = false;
+		}
+	}
+	simd_context = simd_get();
+	for (i = 0; i < ARRAY_SIZE(chacha20poly1305_enc_vectors); ++i) {
+		if (chacha20poly1305_enc_vectors[i].nlen != 8)
+			continue;
+		memset(heap_dst, 0, MAXIMUM_TEST_BUFFER_LEN);
+		memcpy(heap_src, chacha20poly1305_enc_vectors[i].input,
+		       chacha20poly1305_enc_vectors[i].ilen);
+		sg_init_one(&sg_src, heap_src,
+			    chacha20poly1305_enc_vectors[i].ilen);
+		sg_init_one(&sg_dst, heap_dst,
+			    chacha20poly1305_enc_vectors[i].ilen +
+				POLY1305_MAC_SIZE);
+		ret = chacha20poly1305_encrypt_sg(&sg_dst, &sg_src,
+			chacha20poly1305_enc_vectors[i].ilen,
+			chacha20poly1305_enc_vectors[i].assoc,
+			chacha20poly1305_enc_vectors[i].alen,
+			get_unaligned_le64(chacha20poly1305_enc_vectors[i].nonce),
+			chacha20poly1305_enc_vectors[i].key,
+			simd_context);
+		if (!ret || memcmp(heap_dst,
+				   chacha20poly1305_enc_vectors[i].result,
+				   chacha20poly1305_enc_vectors[i].ilen +
+							POLY1305_MAC_SIZE)) {
+			pr_info("chacha20poly1305 sg encryption self-test %zu: FAIL\n",
+				i + 1);
+			success = false;
+		}
+	}
+	simd_put(simd_context);
+	for (i = 0; i < ARRAY_SIZE(chacha20poly1305_dec_vectors); ++i) {
+		memset(computed_result, 0, sizeof(computed_result));
+		ret = chacha20poly1305_decrypt(computed_result,
+			chacha20poly1305_dec_vectors[i].input,
+			chacha20poly1305_dec_vectors[i].ilen,
+			chacha20poly1305_dec_vectors[i].assoc,
+			chacha20poly1305_dec_vectors[i].alen,
+			get_unaligned_le64(chacha20poly1305_dec_vectors[i].nonce),
+			chacha20poly1305_dec_vectors[i].key);
+		if (!decryption_success(ret,
+				chacha20poly1305_dec_vectors[i].failure,
+				memcmp(computed_result,
+				       chacha20poly1305_dec_vectors[i].result,
+				       chacha20poly1305_dec_vectors[i].ilen -
+							POLY1305_MAC_SIZE))) {
+			pr_info("chacha20poly1305 decryption self-test %zu: FAIL\n",
+				i + 1);
+			success = false;
+		}
+	}
+	simd_context = simd_get();
+	for (i = 0; i < ARRAY_SIZE(chacha20poly1305_dec_vectors); ++i) {
+		memset(heap_dst, 0, MAXIMUM_TEST_BUFFER_LEN);
+		memcpy(heap_src, chacha20poly1305_dec_vectors[i].input,
+		       chacha20poly1305_dec_vectors[i].ilen);
+		sg_init_one(&sg_src, heap_src,
+			    chacha20poly1305_dec_vectors[i].ilen);
+		sg_init_one(&sg_dst, heap_dst,
+			    chacha20poly1305_dec_vectors[i].ilen -
+							POLY1305_MAC_SIZE);
+		ret = chacha20poly1305_decrypt_sg(&sg_dst, &sg_src,
+			chacha20poly1305_dec_vectors[i].ilen,
+			chacha20poly1305_dec_vectors[i].assoc,
+			chacha20poly1305_dec_vectors[i].alen,
+			get_unaligned_le64(chacha20poly1305_dec_vectors[i].nonce),
+			chacha20poly1305_dec_vectors[i].key, simd_context);
+		if (!decryption_success(ret,
+			chacha20poly1305_dec_vectors[i].failure,
+			memcmp(heap_dst, chacha20poly1305_dec_vectors[i].result,
+			       chacha20poly1305_dec_vectors[i].ilen -
+							POLY1305_MAC_SIZE))) {
+			pr_info("chacha20poly1305 sg decryption self-test %zu: FAIL\n",
+				i + 1);
+			success = false;
+		}
+	}
+	simd_put(simd_context);
+	for (i = 0; i < ARRAY_SIZE(xchacha20poly1305_enc_vectors); ++i) {
+		memset(computed_result, 0, sizeof(computed_result));
+		xchacha20poly1305_encrypt(computed_result,
+					xchacha20poly1305_enc_vectors[i].input,
+					xchacha20poly1305_enc_vectors[i].ilen,
+					xchacha20poly1305_enc_vectors[i].assoc,
+					xchacha20poly1305_enc_vectors[i].alen,
+					xchacha20poly1305_enc_vectors[i].nonce,
+					xchacha20poly1305_enc_vectors[i].key);
+		if (memcmp(computed_result,
+			   xchacha20poly1305_enc_vectors[i].result,
+			   xchacha20poly1305_enc_vectors[i].ilen +
+							POLY1305_MAC_SIZE)) {
+			pr_info("xchacha20poly1305 encryption self-test %zu: FAIL\n",
+				i + 1);
+			success = false;
+		}
+	}
+	for (i = 0; i < ARRAY_SIZE(xchacha20poly1305_dec_vectors); ++i) {
+		memset(computed_result, 0, sizeof(computed_result));
+		ret = xchacha20poly1305_decrypt(computed_result,
+					xchacha20poly1305_dec_vectors[i].input,
+					xchacha20poly1305_dec_vectors[i].ilen,
+					xchacha20poly1305_dec_vectors[i].assoc,
+					xchacha20poly1305_dec_vectors[i].alen,
+					xchacha20poly1305_dec_vectors[i].nonce,
+					xchacha20poly1305_dec_vectors[i].key);
+		if (!decryption_success(ret,
+				xchacha20poly1305_dec_vectors[i].failure,
+				memcmp(computed_result,
+				       xchacha20poly1305_dec_vectors[i].result,
+				       xchacha20poly1305_dec_vectors[i].ilen -
+							POLY1305_MAC_SIZE))) {
+			pr_info("xchacha20poly1305 decryption self-test %zu: FAIL\n",
+				i + 1);
+			success = false;
+		}
+	}
+	if (success)
+		pr_info("chacha20poly1305 self-tests: pass\n");
+	kfree(heap_src);
+	kfree(heap_dst);
+	return success;
+}
+#endif
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 12/20] zinc: BLAKE2s generic C implementation and selftest
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (10 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 11/20] zinc: ChaCha20Poly1305 construction and selftest Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 13/20] zinc: BLAKE2s x86_64 implementation Jason A. Donenfeld
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson

The C implementation was originally based on Samuel Neves' public
domain reference implementation but has since been heavily modified
for the kernel. We're able to do compile-time optimizations by moving
some scaffolding around the final function into the header file.

Information: https://blake2.net/

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
---
 include/zinc/blake2s.h      |  101 ++
 lib/zinc/Kconfig            |    4 +
 lib/zinc/Makefile           |    4 +
 lib/zinc/blake2s/blake2s.c  |  274 +++++
 lib/zinc/main.c             |    5 +
 lib/zinc/selftest/blake2s.h | 2095 +++++++++++++++++++++++++++++++++++
 6 files changed, 2483 insertions(+)
 create mode 100644 include/zinc/blake2s.h
 create mode 100644 lib/zinc/blake2s/blake2s.c
 create mode 100644 lib/zinc/selftest/blake2s.h

diff --git a/include/zinc/blake2s.h b/include/zinc/blake2s.h
new file mode 100644
index 000000000000..5e32d576e86a
--- /dev/null
+++ b/include/zinc/blake2s.h
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_BLAKE2S_H
+#define _ZINC_BLAKE2S_H
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <crypto/algapi.h>
+
+enum blake2s_lengths {
+	BLAKE2S_BLOCKBYTES = 64,
+	BLAKE2S_OUTBYTES = 32,
+	BLAKE2S_KEYBYTES = 32
+};
+
+struct blake2s_state {
+	u32 h[8];
+	u32 t[2];
+	u32 f[2];
+	u8 buf[BLAKE2S_BLOCKBYTES];
+	size_t buflen;
+	u8 last_node;
+};
+
+void blake2s_init(struct blake2s_state *state, const size_t outlen);
+void blake2s_init_key(struct blake2s_state *state, const size_t outlen,
+		      const void *key, const size_t keylen);
+void blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen);
+void __blake2s_final(struct blake2s_state *state);
+static inline void blake2s_final(struct blake2s_state *state, u8 *out,
+				 const size_t outlen)
+{
+	int i;
+
+#ifdef DEBUG
+	BUG_ON(!out || !outlen || outlen > BLAKE2S_OUTBYTES);
+#endif
+	__blake2s_final(state);
+
+	if (__builtin_constant_p(outlen) && !(outlen % sizeof(u32))) {
+		if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) ||
+		    IS_ALIGNED((unsigned long)out, __alignof__(u32))) {
+			__le32 *outwords = (__le32 *)out;
+
+			for (i = 0; i < outlen / sizeof(u32); ++i)
+				outwords[i] = cpu_to_le32(state->h[i]);
+		} else {
+			__le32 buffer[BLAKE2S_OUTBYTES];
+
+			for (i = 0; i < outlen / sizeof(u32); ++i)
+				buffer[i] = cpu_to_le32(state->h[i]);
+			memcpy(out, buffer, outlen);
+			memzero_explicit(buffer, sizeof(buffer));
+		}
+	} else {
+		u8 buffer[BLAKE2S_OUTBYTES] __aligned(__alignof__(u32));
+		__le32 *outwords = (__le32 *)buffer;
+
+		for (i = 0; i < 8; ++i)
+			outwords[i] = cpu_to_le32(state->h[i]);
+		memcpy(out, buffer, outlen);
+		memzero_explicit(buffer, sizeof(buffer));
+	}
+
+	memzero_explicit(state, sizeof(*state));
+}
+
+static inline void blake2s(u8 *out, const u8 *in, const u8 *key,
+			   const size_t outlen, const size_t inlen,
+			   const size_t keylen)
+{
+	struct blake2s_state state;
+
+#ifdef DEBUG
+	BUG_ON((!in && inlen > 0) || !out || !outlen ||
+	       outlen > BLAKE2S_OUTBYTES || keylen > BLAKE2S_KEYBYTES ||
+	       (!key && keylen));
+#endif
+
+	if (keylen)
+		blake2s_init_key(&state, outlen, key, keylen);
+	else
+		blake2s_init(&state, outlen);
+
+	blake2s_update(&state, in, inlen);
+	blake2s_final(&state, out, outlen);
+}
+
+void blake2s_hmac(u8 *out, const u8 *in, const u8 *key, const size_t outlen,
+		  const size_t inlen, const size_t keylen);
+
+void blake2s_fpu_init(void);
+
+#ifdef DEBUG
+bool blake2s_selftest(void);
+#endif
+
+#endif /* _ZINC_BLAKE2S_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index 98d095ed2418..4db30012c2d6 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -17,6 +17,10 @@ config ZINC_CHACHA20POLY1305
 	select ZINC_POLY1305
 	select CRYPTO_BLKCIPHER
 
+config ZINC_BLAKE2S
+	bool
+	select ZINC
+
 config ZINC_DEBUG
 	bool "Zinc cryptography library debugging and self-tests"
 	depends on ZINC
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 1664871aaf71..45817ec5539c 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -51,6 +51,10 @@ ifeq ($(CONFIG_ZINC_CHACHA20POLY1305),y)
 zinc-y += chacha20poly1305.o
 endif
 
+ifeq ($(CONFIG_ZINC_BLAKE2S),y)
+zinc-y += blake2s/blake2s.o
+endif
+
 zinc-y += main.o
 
 obj-$(CONFIG_ZINC) := zinc.o
diff --git a/lib/zinc/blake2s/blake2s.c b/lib/zinc/blake2s/blake2s.c
new file mode 100644
index 000000000000..13765d327589
--- /dev/null
+++ b/lib/zinc/blake2s/blake2s.c
@@ -0,0 +1,274 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2012 Samuel Neves <sneves@dei.uc.pt>. All Rights Reserved.
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is an implementation of the BLAKE2s hash and PRF functions.
+ *
+ * Information: https://blake2.net/
+ *
+ */
+
+#include <zinc/blake2s.h>
+
+#include <linux/types.h>
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <linux/bug.h>
+#include <asm/unaligned.h>
+
+typedef union {
+	struct {
+		u8 digest_length;
+		u8 key_length;
+		u8 fanout;
+		u8 depth;
+		u32 leaf_length;
+		u32 node_offset;
+		u16 xof_length;
+		u8 node_depth;
+		u8 inner_length;
+		u8 salt[8];
+		u8 personal[8];
+	};
+	__le32 words[8];
+} __packed blake2s_param;
+
+static const u32 blake2s_iv[8] = {
+	0x6A09E667UL, 0xBB67AE85UL, 0x3C6EF372UL, 0xA54FF53AUL,
+	0x510E527FUL, 0x9B05688CUL, 0x1F83D9ABUL, 0x5BE0CD19UL
+};
+
+static const u8 blake2s_sigma[10][16] = {
+	{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
+	{ 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 },
+	{ 11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4 },
+	{ 7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8 },
+	{ 9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13 },
+	{ 2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9 },
+	{ 12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11 },
+	{ 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 },
+	{ 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 },
+	{ 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0 },
+};
+
+static inline void blake2s_set_lastblock(struct blake2s_state *state)
+{
+	if (state->last_node)
+		state->f[1] = -1;
+	state->f[0] = -1;
+}
+
+static inline void blake2s_increment_counter(struct blake2s_state *state,
+					     const u32 inc)
+{
+	state->t[0] += inc;
+	state->t[1] += (state->t[0] < inc);
+}
+
+static inline void blake2s_init_param(struct blake2s_state *state,
+				      const blake2s_param *param)
+{
+	int i;
+
+	memset(state, 0, sizeof(*state));
+	for (i = 0; i < 8; ++i)
+		state->h[i] = blake2s_iv[i] ^ le32_to_cpu(param->words[i]);
+}
+
+void blake2s_init(struct blake2s_state *state, const size_t outlen)
+{
+	blake2s_param param __aligned(__alignof__(u32)) = {
+		.digest_length = outlen,
+		.fanout = 1,
+		.depth = 1
+	};
+
+#ifdef DEBUG
+	BUG_ON(!outlen || outlen > BLAKE2S_OUTBYTES);
+#endif
+	blake2s_init_param(state, &param);
+}
+EXPORT_SYMBOL(blake2s_init);
+
+void blake2s_init_key(struct blake2s_state *state, const size_t outlen,
+		      const void *key, const size_t keylen)
+{
+	blake2s_param param = { .digest_length = outlen,
+				.key_length = keylen,
+				.fanout = 1,
+				.depth = 1 };
+	u8 block[BLAKE2S_BLOCKBYTES] = { 0 };
+
+#ifdef DEBUG
+	BUG_ON(!outlen || outlen > BLAKE2S_OUTBYTES || !key || !keylen ||
+	       keylen > BLAKE2S_KEYBYTES);
+#endif
+	blake2s_init_param(state, &param);
+	memcpy(block, key, keylen);
+	blake2s_update(state, block, BLAKE2S_BLOCKBYTES);
+	memzero_explicit(block, BLAKE2S_BLOCKBYTES);
+}
+EXPORT_SYMBOL(blake2s_init_key);
+
+#ifndef HAVE_BLAKE2S_ARCH_IMPLEMENTATION
+void __init blake2s_fpu_init(void)
+{
+}
+static inline bool blake2s_arch(struct blake2s_state *state, const u8 *block,
+				const size_t nblocks, const u32 inc)
+{
+	return false;
+}
+#endif
+
+static inline void blake2s_compress(struct blake2s_state *state,
+				    const u8 *block, size_t nblocks,
+				    const u32 inc)
+{
+	u32 m[16];
+	u32 v[16];
+	int i;
+
+#ifdef DEBUG
+	BUG_ON(nblocks > 1 && inc != BLAKE2S_BLOCKBYTES);
+#endif
+
+	if (blake2s_arch(state, block, nblocks, inc))
+		return;
+
+	while (nblocks > 0) {
+		blake2s_increment_counter(state, inc);
+
+#ifdef __LITTLE_ENDIAN
+		memcpy(m, block, BLAKE2S_BLOCKBYTES);
+#else
+		for (i = 0; i < 16; ++i)
+			m[i] = get_unaligned_le32(block + i * sizeof(m[i]));
+#endif
+		memcpy(v, state->h, 32);
+		v[ 8] = blake2s_iv[0];
+		v[ 9] = blake2s_iv[1];
+		v[10] = blake2s_iv[2];
+		v[11] = blake2s_iv[3];
+		v[12] = blake2s_iv[4] ^ state->t[0];
+		v[13] = blake2s_iv[5] ^ state->t[1];
+		v[14] = blake2s_iv[6] ^ state->f[0];
+		v[15] = blake2s_iv[7] ^ state->f[1];
+
+#define G(r, i, a, b, c, d) do { \
+	a += b + m[blake2s_sigma[r][2 * i + 0]]; \
+	d = ror32(d ^ a, 16); \
+	c += d; \
+	b = ror32(b ^ c, 12); \
+	a += b + m[blake2s_sigma[r][2 * i + 1]]; \
+	d = ror32(d ^ a, 8); \
+	c += d; \
+	b = ror32(b ^ c, 7); \
+} while (0)
+
+#define ROUND(r) do { \
+	G(r, 0, v[0], v[ 4], v[ 8], v[12]); \
+	G(r, 1, v[1], v[ 5], v[ 9], v[13]); \
+	G(r, 2, v[2], v[ 6], v[10], v[14]); \
+	G(r, 3, v[3], v[ 7], v[11], v[15]); \
+	G(r, 4, v[0], v[ 5], v[10], v[15]); \
+	G(r, 5, v[1], v[ 6], v[11], v[12]); \
+	G(r, 6, v[2], v[ 7], v[ 8], v[13]); \
+	G(r, 7, v[3], v[ 4], v[ 9], v[14]); \
+} while (0)
+		ROUND(0);
+		ROUND(1);
+		ROUND(2);
+		ROUND(3);
+		ROUND(4);
+		ROUND(5);
+		ROUND(6);
+		ROUND(7);
+		ROUND(8);
+		ROUND(9);
+
+#undef G
+#undef ROUND
+
+		for (i = 0; i < 8; ++i)
+			state->h[i] ^= v[i] ^ v[i + 8];
+
+		block += BLAKE2S_BLOCKBYTES;
+		--nblocks;
+	}
+}
+
+void blake2s_update(struct blake2s_state *state, const u8 *in, size_t inlen)
+{
+	const size_t fill = BLAKE2S_BLOCKBYTES - state->buflen;
+
+	if (unlikely(!inlen))
+		return;
+	if (inlen > fill) {
+		memcpy(state->buf + state->buflen, in, fill);
+		blake2s_compress(state, state->buf, 1, BLAKE2S_BLOCKBYTES);
+		state->buflen = 0;
+		in += fill;
+		inlen -= fill;
+	}
+	if (inlen > BLAKE2S_BLOCKBYTES) {
+		const size_t nblocks =
+			(inlen + BLAKE2S_BLOCKBYTES - 1) / BLAKE2S_BLOCKBYTES;
+		/* Hash one less (full) block than strictly possible */
+		blake2s_compress(state, in, nblocks - 1, BLAKE2S_BLOCKBYTES);
+		in += BLAKE2S_BLOCKBYTES * (nblocks - 1);
+		inlen -= BLAKE2S_BLOCKBYTES * (nblocks - 1);
+	}
+	memcpy(state->buf + state->buflen, in, inlen);
+	state->buflen += inlen;
+}
+EXPORT_SYMBOL(blake2s_update);
+
+void __blake2s_final(struct blake2s_state *state)
+{
+	blake2s_set_lastblock(state);
+	memset(state->buf + state->buflen, 0,
+	       BLAKE2S_BLOCKBYTES - state->buflen); /* Padding */
+	blake2s_compress(state, state->buf, 1, state->buflen);
+}
+EXPORT_SYMBOL(__blake2s_final);
+
+void blake2s_hmac(u8 *out, const u8 *in, const u8 *key, const size_t outlen,
+		  const size_t inlen, const size_t keylen)
+{
+	struct blake2s_state state;
+	u8 x_key[BLAKE2S_BLOCKBYTES] __aligned(__alignof__(u32)) = { 0 };
+	u8 i_hash[BLAKE2S_OUTBYTES] __aligned(__alignof__(u32));
+	int i;
+
+	if (keylen > BLAKE2S_BLOCKBYTES) {
+		blake2s_init(&state, BLAKE2S_OUTBYTES);
+		blake2s_update(&state, key, keylen);
+		blake2s_final(&state, x_key, BLAKE2S_OUTBYTES);
+	} else
+		memcpy(x_key, key, keylen);
+
+	for (i = 0; i < BLAKE2S_BLOCKBYTES; ++i)
+		x_key[i] ^= 0x36;
+
+	blake2s_init(&state, BLAKE2S_OUTBYTES);
+	blake2s_update(&state, x_key, BLAKE2S_BLOCKBYTES);
+	blake2s_update(&state, in, inlen);
+	blake2s_final(&state, i_hash, BLAKE2S_OUTBYTES);
+
+	for (i = 0; i < BLAKE2S_BLOCKBYTES; ++i)
+		x_key[i] ^= 0x5c ^ 0x36;
+
+	blake2s_init(&state, BLAKE2S_OUTBYTES);
+	blake2s_update(&state, x_key, BLAKE2S_BLOCKBYTES);
+	blake2s_update(&state, i_hash, BLAKE2S_OUTBYTES);
+	blake2s_final(&state, i_hash, BLAKE2S_OUTBYTES);
+
+	memcpy(out, i_hash, outlen);
+	memzero_explicit(x_key, BLAKE2S_BLOCKBYTES);
+	memzero_explicit(i_hash, BLAKE2S_OUTBYTES);
+}
+EXPORT_SYMBOL(blake2s_hmac);
+
+#include "../selftest/blake2s.h"
diff --git a/lib/zinc/main.c b/lib/zinc/main.c
index a15b27fd0e4c..b1d650d82edb 100644
--- a/lib/zinc/main.c
+++ b/lib/zinc/main.c
@@ -6,6 +6,7 @@
 #include <zinc/chacha20poly1305.h>
 #include <zinc/chacha20.h>
 #include <zinc/poly1305.h>
+#include <zinc/blake2s.h>
 
 #include <linux/init.h>
 #include <linux/module.h>
@@ -30,6 +31,10 @@ static int __init mod_init(void)
 #endif
 #ifdef CONFIG_ZINC_CHACHA20POLY1305
 	selftest(chacha20poly1305);
+#endif
+#ifdef CONFIG_ZINC_BLAKE2S
+	blake2s_fpu_init();
+	selftest(blake2s);
 #endif
 	return 0;
 }
diff --git a/lib/zinc/selftest/blake2s.h b/lib/zinc/selftest/blake2s.h
new file mode 100644
index 000000000000..ddd6a862b344
--- /dev/null
+++ b/lib/zinc/selftest/blake2s.h
@@ -0,0 +1,2095 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+static const u8 blake2s_testvecs[][BLAKE2S_OUTBYTES] __initconst = {
+	{ 0x69, 0x21, 0x7a, 0x30, 0x79, 0x90, 0x80, 0x94,
+	  0xe1, 0x11, 0x21, 0xd0, 0x42, 0x35, 0x4a, 0x7c,
+	  0x1f, 0x55, 0xb6, 0x48, 0x2c, 0xa1, 0xa5, 0x1e,
+	  0x1b, 0x25, 0x0d, 0xfd, 0x1e, 0xd0, 0xee, 0xf9 },
+	{ 0xe3, 0x4d, 0x74, 0xdb, 0xaf, 0x4f, 0xf4, 0xc6,
+	  0xab, 0xd8, 0x71, 0xcc, 0x22, 0x04, 0x51, 0xd2,
+	  0xea, 0x26, 0x48, 0x84, 0x6c, 0x77, 0x57, 0xfb,
+	  0xaa, 0xc8, 0x2f, 0xe5, 0x1a, 0xd6, 0x4b, 0xea },
+	{ 0xdd, 0xad, 0x9a, 0xb1, 0x5d, 0xac, 0x45, 0x49,
+	  0xba, 0x42, 0xf4, 0x9d, 0x26, 0x24, 0x96, 0xbe,
+	  0xf6, 0xc0, 0xba, 0xe1, 0xdd, 0x34, 0x2a, 0x88,
+	  0x08, 0xf8, 0xea, 0x26, 0x7c, 0x6e, 0x21, 0x0c },
+	{ 0xe8, 0xf9, 0x1c, 0x6e, 0xf2, 0x32, 0xa0, 0x41,
+	  0x45, 0x2a, 0xb0, 0xe1, 0x49, 0x07, 0x0c, 0xdd,
+	  0x7d, 0xd1, 0x76, 0x9e, 0x75, 0xb3, 0xa5, 0x92,
+	  0x1b, 0xe3, 0x78, 0x76, 0xc4, 0x5c, 0x99, 0x00 },
+	{ 0x0c, 0xc7, 0x0e, 0x00, 0x34, 0x8b, 0x86, 0xba,
+	  0x29, 0x44, 0xd0, 0xc3, 0x20, 0x38, 0xb2, 0x5c,
+	  0x55, 0x58, 0x4f, 0x90, 0xdf, 0x23, 0x04, 0xf5,
+	  0x5f, 0xa3, 0x32, 0xaf, 0x5f, 0xb0, 0x1e, 0x20 },
+	{ 0xec, 0x19, 0x64, 0x19, 0x10, 0x87, 0xa4, 0xfe,
+	  0x9d, 0xf1, 0xc7, 0x95, 0x34, 0x2a, 0x02, 0xff,
+	  0xc1, 0x91, 0xa5, 0xb2, 0x51, 0x76, 0x48, 0x56,
+	  0xae, 0x5b, 0x8b, 0x57, 0x69, 0xf0, 0xc6, 0xcd },
+	{ 0xe1, 0xfa, 0x51, 0x61, 0x8d, 0x7d, 0xf4, 0xeb,
+	  0x70, 0xcf, 0x0d, 0x5a, 0x9e, 0x90, 0x6f, 0x80,
+	  0x6e, 0x9d, 0x19, 0xf7, 0xf4, 0xf0, 0x1e, 0x3b,
+	  0x62, 0x12, 0x88, 0xe4, 0x12, 0x04, 0x05, 0xd6 },
+	{ 0x59, 0x80, 0x01, 0xfa, 0xfb, 0xe8, 0xf9, 0x4e,
+	  0xc6, 0x6d, 0xc8, 0x27, 0xd0, 0x12, 0xcf, 0xcb,
+	  0xba, 0x22, 0x28, 0x56, 0x9f, 0x44, 0x8e, 0x89,
+	  0xea, 0x22, 0x08, 0xc8, 0xbf, 0x76, 0x92, 0x93 },
+	{ 0xc7, 0xe8, 0x87, 0xb5, 0x46, 0x62, 0x36, 0x35,
+	  0xe9, 0x3e, 0x04, 0x95, 0x59, 0x8f, 0x17, 0x26,
+	  0x82, 0x19, 0x96, 0xc2, 0x37, 0x77, 0x05, 0xb9,
+	  0x3a, 0x1f, 0x63, 0x6f, 0x87, 0x2b, 0xfa, 0x2d },
+	{ 0xc3, 0x15, 0xa4, 0x37, 0xdd, 0x28, 0x06, 0x2a,
+	  0x77, 0x0d, 0x48, 0x19, 0x67, 0x13, 0x6b, 0x1b,
+	  0x5e, 0xb8, 0x8b, 0x21, 0xee, 0x53, 0xd0, 0x32,
+	  0x9c, 0x58, 0x97, 0x12, 0x6e, 0x9d, 0xb0, 0x2c },
+	{ 0xbb, 0x47, 0x3d, 0xed, 0xdc, 0x05, 0x5f, 0xea,
+	  0x62, 0x28, 0xf2, 0x07, 0xda, 0x57, 0x53, 0x47,
+	  0xbb, 0x00, 0x40, 0x4c, 0xd3, 0x49, 0xd3, 0x8c,
+	  0x18, 0x02, 0x63, 0x07, 0xa2, 0x24, 0xcb, 0xff },
+	{ 0x68, 0x7e, 0x18, 0x73, 0xa8, 0x27, 0x75, 0x91,
+	  0xbb, 0x33, 0xd9, 0xad, 0xf9, 0xa1, 0x39, 0x12,
+	  0xef, 0xef, 0xe5, 0x57, 0xca, 0xfc, 0x39, 0xa7,
+	  0x95, 0x26, 0x23, 0xe4, 0x72, 0x55, 0xf1, 0x6d },
+	{ 0x1a, 0xc7, 0xba, 0x75, 0x4d, 0x6e, 0x2f, 0x94,
+	  0xe0, 0xe8, 0x6c, 0x46, 0xbf, 0xb2, 0x62, 0xab,
+	  0xbb, 0x74, 0xf4, 0x50, 0xef, 0x45, 0x6d, 0x6b,
+	  0x4d, 0x97, 0xaa, 0x80, 0xce, 0x6d, 0xa7, 0x67 },
+	{ 0x01, 0x2c, 0x97, 0x80, 0x96, 0x14, 0x81, 0x6b,
+	  0x5d, 0x94, 0x94, 0x47, 0x7d, 0x4b, 0x68, 0x7d,
+	  0x15, 0xb9, 0x6e, 0xb6, 0x9c, 0x0e, 0x80, 0x74,
+	  0xa8, 0x51, 0x6f, 0x31, 0x22, 0x4b, 0x5c, 0x98 },
+	{ 0x91, 0xff, 0xd2, 0x6c, 0xfa, 0x4d, 0xa5, 0x13,
+	  0x4c, 0x7e, 0xa2, 0x62, 0xf7, 0x88, 0x9c, 0x32,
+	  0x9f, 0x61, 0xf6, 0xa6, 0x57, 0x22, 0x5c, 0xc2,
+	  0x12, 0xf4, 0x00, 0x56, 0xd9, 0x86, 0xb3, 0xf4 },
+	{ 0xd9, 0x7c, 0x82, 0x8d, 0x81, 0x82, 0xa7, 0x21,
+	  0x80, 0xa0, 0x6a, 0x78, 0x26, 0x83, 0x30, 0x67,
+	  0x3f, 0x7c, 0x4e, 0x06, 0x35, 0x94, 0x7c, 0x04,
+	  0xc0, 0x23, 0x23, 0xfd, 0x45, 0xc0, 0xa5, 0x2d },
+	{ 0xef, 0xc0, 0x4c, 0xdc, 0x39, 0x1c, 0x7e, 0x91,
+	  0x19, 0xbd, 0x38, 0x66, 0x8a, 0x53, 0x4e, 0x65,
+	  0xfe, 0x31, 0x03, 0x6d, 0x6a, 0x62, 0x11, 0x2e,
+	  0x44, 0xeb, 0xeb, 0x11, 0xf9, 0xc5, 0x70, 0x80 },
+	{ 0x99, 0x2c, 0xf5, 0xc0, 0x53, 0x44, 0x2a, 0x5f,
+	  0xbc, 0x4f, 0xaf, 0x58, 0x3e, 0x04, 0xe5, 0x0b,
+	  0xb7, 0x0d, 0x2f, 0x39, 0xfb, 0xb6, 0xa5, 0x03,
+	  0xf8, 0x9e, 0x56, 0xa6, 0x3e, 0x18, 0x57, 0x8a },
+	{ 0x38, 0x64, 0x0e, 0x9f, 0x21, 0x98, 0x3e, 0x67,
+	  0xb5, 0x39, 0xca, 0xcc, 0xae, 0x5e, 0xcf, 0x61,
+	  0x5a, 0xe2, 0x76, 0x4f, 0x75, 0xa0, 0x9c, 0x9c,
+	  0x59, 0xb7, 0x64, 0x83, 0xc1, 0xfb, 0xc7, 0x35 },
+	{ 0x21, 0x3d, 0xd3, 0x4c, 0x7e, 0xfe, 0x4f, 0xb2,
+	  0x7a, 0x6b, 0x35, 0xf6, 0xb4, 0x00, 0x0d, 0x1f,
+	  0xe0, 0x32, 0x81, 0xaf, 0x3c, 0x72, 0x3e, 0x5c,
+	  0x9f, 0x94, 0x74, 0x7a, 0x5f, 0x31, 0xcd, 0x3b },
+	{ 0xec, 0x24, 0x6e, 0xee, 0xb9, 0xce, 0xd3, 0xf7,
+	  0xad, 0x33, 0xed, 0x28, 0x66, 0x0d, 0xd9, 0xbb,
+	  0x07, 0x32, 0x51, 0x3d, 0xb4, 0xe2, 0xfa, 0x27,
+	  0x8b, 0x60, 0xcd, 0xe3, 0x68, 0x2a, 0x4c, 0xcd },
+	{ 0xac, 0x9b, 0x61, 0xd4, 0x46, 0x64, 0x8c, 0x30,
+	  0x05, 0xd7, 0x89, 0x2b, 0xf3, 0xa8, 0x71, 0x9f,
+	  0x4c, 0x81, 0x81, 0xcf, 0xdc, 0xbc, 0x2b, 0x79,
+	  0xfe, 0xf1, 0x0a, 0x27, 0x9b, 0x91, 0x10, 0x95 },
+	{ 0x7b, 0xf8, 0xb2, 0x29, 0x59, 0xe3, 0x4e, 0x3a,
+	  0x43, 0xf7, 0x07, 0x92, 0x23, 0xe8, 0x3a, 0x97,
+	  0x54, 0x61, 0x7d, 0x39, 0x1e, 0x21, 0x3d, 0xfd,
+	  0x80, 0x8e, 0x41, 0xb9, 0xbe, 0xad, 0x4c, 0xe7 },
+	{ 0x68, 0xd4, 0xb5, 0xd4, 0xfa, 0x0e, 0x30, 0x2b,
+	  0x64, 0xcc, 0xc5, 0xaf, 0x79, 0x29, 0x13, 0xac,
+	  0x4c, 0x88, 0xec, 0x95, 0xc0, 0x7d, 0xdf, 0x40,
+	  0x69, 0x42, 0x56, 0xeb, 0x88, 0xce, 0x9f, 0x3d },
+	{ 0xb2, 0xc2, 0x42, 0x0f, 0x05, 0xf9, 0xab, 0xe3,
+	  0x63, 0x15, 0x91, 0x93, 0x36, 0xb3, 0x7e, 0x4e,
+	  0x0f, 0xa3, 0x3f, 0xf7, 0xe7, 0x6a, 0x49, 0x27,
+	  0x67, 0x00, 0x6f, 0xdb, 0x5d, 0x93, 0x54, 0x62 },
+	{ 0x13, 0x4f, 0x61, 0xbb, 0xd0, 0xbb, 0xb6, 0x9a,
+	  0xed, 0x53, 0x43, 0x90, 0x45, 0x51, 0xa3, 0xe6,
+	  0xc1, 0xaa, 0x7d, 0xcd, 0xd7, 0x7e, 0x90, 0x3e,
+	  0x70, 0x23, 0xeb, 0x7c, 0x60, 0x32, 0x0a, 0xa7 },
+	{ 0x46, 0x93, 0xf9, 0xbf, 0xf7, 0xd4, 0xf3, 0x98,
+	  0x6a, 0x7d, 0x17, 0x6e, 0x6e, 0x06, 0xf7, 0x2a,
+	  0xd1, 0x49, 0x0d, 0x80, 0x5c, 0x99, 0xe2, 0x53,
+	  0x47, 0xb8, 0xde, 0x77, 0xb4, 0xdb, 0x6d, 0x9b },
+	{ 0x85, 0x3e, 0x26, 0xf7, 0x41, 0x95, 0x3b, 0x0f,
+	  0xd5, 0xbd, 0xb4, 0x24, 0xe8, 0xab, 0x9e, 0x8b,
+	  0x37, 0x50, 0xea, 0xa8, 0xef, 0x61, 0xe4, 0x79,
+	  0x02, 0xc9, 0x1e, 0x55, 0x4e, 0x9c, 0x73, 0xb9 },
+	{ 0xf7, 0xde, 0x53, 0x63, 0x61, 0xab, 0xaa, 0x0e,
+	  0x15, 0x81, 0x56, 0xcf, 0x0e, 0xa4, 0xf6, 0x3a,
+	  0x99, 0xb5, 0xe4, 0x05, 0x4f, 0x8f, 0xa4, 0xc9,
+	  0xd4, 0x5f, 0x62, 0x85, 0xca, 0xd5, 0x56, 0x94 },
+	{ 0x4c, 0x23, 0x06, 0x08, 0x86, 0x0a, 0x99, 0xae,
+	  0x8d, 0x7b, 0xd5, 0xc2, 0xcc, 0x17, 0xfa, 0x52,
+	  0x09, 0x6b, 0x9a, 0x61, 0xbe, 0xdb, 0x17, 0xcb,
+	  0x76, 0x17, 0x86, 0x4a, 0xd2, 0x9c, 0xa7, 0xa6 },
+	{ 0xae, 0xb9, 0x20, 0xea, 0x87, 0x95, 0x2d, 0xad,
+	  0xb1, 0xfb, 0x75, 0x92, 0x91, 0xe3, 0x38, 0x81,
+	  0x39, 0xa8, 0x72, 0x86, 0x50, 0x01, 0x88, 0x6e,
+	  0xd8, 0x47, 0x52, 0xe9, 0x3c, 0x25, 0x0c, 0x2a },
+	{ 0xab, 0xa4, 0xad, 0x9b, 0x48, 0x0b, 0x9d, 0xf3,
+	  0xd0, 0x8c, 0xa5, 0xe8, 0x7b, 0x0c, 0x24, 0x40,
+	  0xd4, 0xe4, 0xea, 0x21, 0x22, 0x4c, 0x2e, 0xb4,
+	  0x2c, 0xba, 0xe4, 0x69, 0xd0, 0x89, 0xb9, 0x31 },
+	{ 0x05, 0x82, 0x56, 0x07, 0xd7, 0xfd, 0xf2, 0xd8,
+	  0x2e, 0xf4, 0xc3, 0xc8, 0xc2, 0xae, 0xa9, 0x61,
+	  0xad, 0x98, 0xd6, 0x0e, 0xdf, 0xf7, 0xd0, 0x18,
+	  0x98, 0x3e, 0x21, 0x20, 0x4c, 0x0d, 0x93, 0xd1 },
+	{ 0xa7, 0x42, 0xf8, 0xb6, 0xaf, 0x82, 0xd8, 0xa6,
+	  0xca, 0x23, 0x57, 0xc5, 0xf1, 0xcf, 0x91, 0xde,
+	  0xfb, 0xd0, 0x66, 0x26, 0x7d, 0x75, 0xc0, 0x48,
+	  0xb3, 0x52, 0x36, 0x65, 0x85, 0x02, 0x59, 0x62 },
+	{ 0x2b, 0xca, 0xc8, 0x95, 0x99, 0x00, 0x0b, 0x42,
+	  0xc9, 0x5a, 0xe2, 0x38, 0x35, 0xa7, 0x13, 0x70,
+	  0x4e, 0xd7, 0x97, 0x89, 0xc8, 0x4f, 0xef, 0x14,
+	  0x9a, 0x87, 0x4f, 0xf7, 0x33, 0xf0, 0x17, 0xa2 },
+	{ 0xac, 0x1e, 0xd0, 0x7d, 0x04, 0x8f, 0x10, 0x5a,
+	  0x9e, 0x5b, 0x7a, 0xb8, 0x5b, 0x09, 0xa4, 0x92,
+	  0xd5, 0xba, 0xff, 0x14, 0xb8, 0xbf, 0xb0, 0xe9,
+	  0xfd, 0x78, 0x94, 0x86, 0xee, 0xa2, 0xb9, 0x74 },
+	{ 0xe4, 0x8d, 0x0e, 0xcf, 0xaf, 0x49, 0x7d, 0x5b,
+	  0x27, 0xc2, 0x5d, 0x99, 0xe1, 0x56, 0xcb, 0x05,
+	  0x79, 0xd4, 0x40, 0xd6, 0xe3, 0x1f, 0xb6, 0x24,
+	  0x73, 0x69, 0x6d, 0xbf, 0x95, 0xe0, 0x10, 0xe4 },
+	{ 0x12, 0xa9, 0x1f, 0xad, 0xf8, 0xb2, 0x16, 0x44,
+	  0xfd, 0x0f, 0x93, 0x4f, 0x3c, 0x4a, 0x8f, 0x62,
+	  0xba, 0x86, 0x2f, 0xfd, 0x20, 0xe8, 0xe9, 0x61,
+	  0x15, 0x4c, 0x15, 0xc1, 0x38, 0x84, 0xed, 0x3d },
+	{ 0x7c, 0xbe, 0xe9, 0x6e, 0x13, 0x98, 0x97, 0xdc,
+	  0x98, 0xfb, 0xef, 0x3b, 0xe8, 0x1a, 0xd4, 0xd9,
+	  0x64, 0xd2, 0x35, 0xcb, 0x12, 0x14, 0x1f, 0xb6,
+	  0x67, 0x27, 0xe6, 0xe5, 0xdf, 0x73, 0xa8, 0x78 },
+	{ 0xeb, 0xf6, 0x6a, 0xbb, 0x59, 0x7a, 0xe5, 0x72,
+	  0xa7, 0x29, 0x7c, 0xb0, 0x87, 0x1e, 0x35, 0x5a,
+	  0xcc, 0xaf, 0xad, 0x83, 0x77, 0xb8, 0xe7, 0x8b,
+	  0xf1, 0x64, 0xce, 0x2a, 0x18, 0xde, 0x4b, 0xaf },
+	{ 0x71, 0xb9, 0x33, 0xb0, 0x7e, 0x4f, 0xf7, 0x81,
+	  0x8c, 0xe0, 0x59, 0xd0, 0x08, 0x82, 0x9e, 0x45,
+	  0x3c, 0x6f, 0xf0, 0x2e, 0xc0, 0xa7, 0xdb, 0x39,
+	  0x3f, 0xc2, 0xd8, 0x70, 0xf3, 0x7a, 0x72, 0x86 },
+	{ 0x7c, 0xf7, 0xc5, 0x13, 0x31, 0x22, 0x0b, 0x8d,
+	  0x3e, 0xba, 0xed, 0x9c, 0x29, 0x39, 0x8a, 0x16,
+	  0xd9, 0x81, 0x56, 0xe2, 0x61, 0x3c, 0xb0, 0x88,
+	  0xf2, 0xb0, 0xe0, 0x8a, 0x1b, 0xe4, 0xcf, 0x4f },
+	{ 0x3e, 0x41, 0xa1, 0x08, 0xe0, 0xf6, 0x4a, 0xd2,
+	  0x76, 0xb9, 0x79, 0xe1, 0xce, 0x06, 0x82, 0x79,
+	  0xe1, 0x6f, 0x7b, 0xc7, 0xe4, 0xaa, 0x1d, 0x21,
+	  0x1e, 0x17, 0xb8, 0x11, 0x61, 0xdf, 0x16, 0x02 },
+	{ 0x88, 0x65, 0x02, 0xa8, 0x2a, 0xb4, 0x7b, 0xa8,
+	  0xd8, 0x67, 0x10, 0xaa, 0x9d, 0xe3, 0xd4, 0x6e,
+	  0xa6, 0x5c, 0x47, 0xaf, 0x6e, 0xe8, 0xde, 0x45,
+	  0x0c, 0xce, 0xb8, 0xb1, 0x1b, 0x04, 0x5f, 0x50 },
+	{ 0xc0, 0x21, 0xbc, 0x5f, 0x09, 0x54, 0xfe, 0xe9,
+	  0x4f, 0x46, 0xea, 0x09, 0x48, 0x7e, 0x10, 0xa8,
+	  0x48, 0x40, 0xd0, 0x2f, 0x64, 0x81, 0x0b, 0xc0,
+	  0x8d, 0x9e, 0x55, 0x1f, 0x7d, 0x41, 0x68, 0x14 },
+	{ 0x20, 0x30, 0x51, 0x6e, 0x8a, 0x5f, 0xe1, 0x9a,
+	  0xe7, 0x9c, 0x33, 0x6f, 0xce, 0x26, 0x38, 0x2a,
+	  0x74, 0x9d, 0x3f, 0xd0, 0xec, 0x91, 0xe5, 0x37,
+	  0xd4, 0xbd, 0x23, 0x58, 0xc1, 0x2d, 0xfb, 0x22 },
+	{ 0x55, 0x66, 0x98, 0xda, 0xc8, 0x31, 0x7f, 0xd3,
+	  0x6d, 0xfb, 0xdf, 0x25, 0xa7, 0x9c, 0xb1, 0x12,
+	  0xd5, 0x42, 0x58, 0x60, 0x60, 0x5c, 0xba, 0xf5,
+	  0x07, 0xf2, 0x3b, 0xf7, 0xe9, 0xf4, 0x2a, 0xfe },
+	{ 0x2f, 0x86, 0x7b, 0xa6, 0x77, 0x73, 0xfd, 0xc3,
+	  0xe9, 0x2f, 0xce, 0xd9, 0x9a, 0x64, 0x09, 0xad,
+	  0x39, 0xd0, 0xb8, 0x80, 0xfd, 0xe8, 0xf1, 0x09,
+	  0xa8, 0x17, 0x30, 0xc4, 0x45, 0x1d, 0x01, 0x78 },
+	{ 0x17, 0x2e, 0xc2, 0x18, 0xf1, 0x19, 0xdf, 0xae,
+	  0x98, 0x89, 0x6d, 0xff, 0x29, 0xdd, 0x98, 0x76,
+	  0xc9, 0x4a, 0xf8, 0x74, 0x17, 0xf9, 0xae, 0x4c,
+	  0x70, 0x14, 0xbb, 0x4e, 0x4b, 0x96, 0xaf, 0xc7 },
+	{ 0x3f, 0x85, 0x81, 0x4a, 0x18, 0x19, 0x5f, 0x87,
+	  0x9a, 0xa9, 0x62, 0xf9, 0x5d, 0x26, 0xbd, 0x82,
+	  0xa2, 0x78, 0xf2, 0xb8, 0x23, 0x20, 0x21, 0x8f,
+	  0x6b, 0x3b, 0xd6, 0xf7, 0xf6, 0x67, 0xa6, 0xd9 },
+	{ 0x1b, 0x61, 0x8f, 0xba, 0xa5, 0x66, 0xb3, 0xd4,
+	  0x98, 0xc1, 0x2e, 0x98, 0x2c, 0x9e, 0xc5, 0x2e,
+	  0x4d, 0xa8, 0x5a, 0x8c, 0x54, 0xf3, 0x8f, 0x34,
+	  0xc0, 0x90, 0x39, 0x4f, 0x23, 0xc1, 0x84, 0xc1 },
+	{ 0x0c, 0x75, 0x8f, 0xb5, 0x69, 0x2f, 0xfd, 0x41,
+	  0xa3, 0x57, 0x5d, 0x0a, 0xf0, 0x0c, 0xc7, 0xfb,
+	  0xf2, 0xcb, 0xe5, 0x90, 0x5a, 0x58, 0x32, 0x3a,
+	  0x88, 0xae, 0x42, 0x44, 0xf6, 0xe4, 0xc9, 0x93 },
+	{ 0xa9, 0x31, 0x36, 0x0c, 0xad, 0x62, 0x8c, 0x7f,
+	  0x12, 0xa6, 0xc1, 0xc4, 0xb7, 0x53, 0xb0, 0xf4,
+	  0x06, 0x2a, 0xef, 0x3c, 0xe6, 0x5a, 0x1a, 0xe3,
+	  0xf1, 0x93, 0x69, 0xda, 0xdf, 0x3a, 0xe2, 0x3d },
+	{ 0xcb, 0xac, 0x7d, 0x77, 0x3b, 0x1e, 0x3b, 0x3c,
+	  0x66, 0x91, 0xd7, 0xab, 0xb7, 0xe9, 0xdf, 0x04,
+	  0x5c, 0x8b, 0xa1, 0x92, 0x68, 0xde, 0xd1, 0x53,
+	  0x20, 0x7f, 0x5e, 0x80, 0x43, 0x52, 0xec, 0x5d },
+	{ 0x23, 0xa1, 0x96, 0xd3, 0x80, 0x2e, 0xd3, 0xc1,
+	  0xb3, 0x84, 0x01, 0x9a, 0x82, 0x32, 0x58, 0x40,
+	  0xd3, 0x2f, 0x71, 0x95, 0x0c, 0x45, 0x80, 0xb0,
+	  0x34, 0x45, 0xe0, 0x89, 0x8e, 0x14, 0x05, 0x3c },
+	{ 0xf4, 0x49, 0x54, 0x70, 0xf2, 0x26, 0xc8, 0xc2,
+	  0x14, 0xbe, 0x08, 0xfd, 0xfa, 0xd4, 0xbc, 0x4a,
+	  0x2a, 0x9d, 0xbe, 0xa9, 0x13, 0x6a, 0x21, 0x0d,
+	  0xf0, 0xd4, 0xb6, 0x49, 0x29, 0xe6, 0xfc, 0x14 },
+	{ 0xe2, 0x90, 0xdd, 0x27, 0x0b, 0x46, 0x7f, 0x34,
+	  0xab, 0x1c, 0x00, 0x2d, 0x34, 0x0f, 0xa0, 0x16,
+	  0x25, 0x7f, 0xf1, 0x9e, 0x58, 0x33, 0xfd, 0xbb,
+	  0xf2, 0xcb, 0x40, 0x1c, 0x3b, 0x28, 0x17, 0xde },
+	{ 0x9f, 0xc7, 0xb5, 0xde, 0xd3, 0xc1, 0x50, 0x42,
+	  0xb2, 0xa6, 0x58, 0x2d, 0xc3, 0x9b, 0xe0, 0x16,
+	  0xd2, 0x4a, 0x68, 0x2d, 0x5e, 0x61, 0xad, 0x1e,
+	  0xff, 0x9c, 0x63, 0x30, 0x98, 0x48, 0xf7, 0x06 },
+	{ 0x8c, 0xca, 0x67, 0xa3, 0x6d, 0x17, 0xd5, 0xe6,
+	  0x34, 0x1c, 0xb5, 0x92, 0xfd, 0x7b, 0xef, 0x99,
+	  0x26, 0xc9, 0xe3, 0xaa, 0x10, 0x27, 0xea, 0x11,
+	  0xa7, 0xd8, 0xbd, 0x26, 0x0b, 0x57, 0x6e, 0x04 },
+	{ 0x40, 0x93, 0x92, 0xf5, 0x60, 0xf8, 0x68, 0x31,
+	  0xda, 0x43, 0x73, 0xee, 0x5e, 0x00, 0x74, 0x26,
+	  0x05, 0x95, 0xd7, 0xbc, 0x24, 0x18, 0x3b, 0x60,
+	  0xed, 0x70, 0x0d, 0x45, 0x83, 0xd3, 0xf6, 0xf0 },
+	{ 0x28, 0x02, 0x16, 0x5d, 0xe0, 0x90, 0x91, 0x55,
+	  0x46, 0xf3, 0x39, 0x8c, 0xd8, 0x49, 0x16, 0x4a,
+	  0x19, 0xf9, 0x2a, 0xdb, 0xc3, 0x61, 0xad, 0xc9,
+	  0x9b, 0x0f, 0x20, 0xc8, 0xea, 0x07, 0x10, 0x54 },
+	{ 0xad, 0x83, 0x91, 0x68, 0xd9, 0xf8, 0xa4, 0xbe,
+	  0x95, 0xba, 0x9e, 0xf9, 0xa6, 0x92, 0xf0, 0x72,
+	  0x56, 0xae, 0x43, 0xfe, 0x6f, 0x98, 0x64, 0xe2,
+	  0x90, 0x69, 0x1b, 0x02, 0x56, 0xce, 0x50, 0xa9 },
+	{ 0x75, 0xfd, 0xaa, 0x50, 0x38, 0xc2, 0x84, 0xb8,
+	  0x6d, 0x6e, 0x8a, 0xff, 0xe8, 0xb2, 0x80, 0x7e,
+	  0x46, 0x7b, 0x86, 0x60, 0x0e, 0x79, 0xaf, 0x36,
+	  0x89, 0xfb, 0xc0, 0x63, 0x28, 0xcb, 0xf8, 0x94 },
+	{ 0xe5, 0x7c, 0xb7, 0x94, 0x87, 0xdd, 0x57, 0x90,
+	  0x24, 0x32, 0xb2, 0x50, 0x73, 0x38, 0x13, 0xbd,
+	  0x96, 0xa8, 0x4e, 0xfc, 0xe5, 0x9f, 0x65, 0x0f,
+	  0xac, 0x26, 0xe6, 0x69, 0x6a, 0xef, 0xaf, 0xc3 },
+	{ 0x56, 0xf3, 0x4e, 0x8b, 0x96, 0x55, 0x7e, 0x90,
+	  0xc1, 0xf2, 0x4b, 0x52, 0xd0, 0xc8, 0x9d, 0x51,
+	  0x08, 0x6a, 0xcf, 0x1b, 0x00, 0xf6, 0x34, 0xcf,
+	  0x1d, 0xde, 0x92, 0x33, 0xb8, 0xea, 0xaa, 0x3e },
+	{ 0x1b, 0x53, 0xee, 0x94, 0xaa, 0xf3, 0x4e, 0x4b,
+	  0x15, 0x9d, 0x48, 0xde, 0x35, 0x2c, 0x7f, 0x06,
+	  0x61, 0xd0, 0xa4, 0x0e, 0xdf, 0xf9, 0x5a, 0x0b,
+	  0x16, 0x39, 0xb4, 0x09, 0x0e, 0x97, 0x44, 0x72 },
+	{ 0x05, 0x70, 0x5e, 0x2a, 0x81, 0x75, 0x7c, 0x14,
+	  0xbd, 0x38, 0x3e, 0xa9, 0x8d, 0xda, 0x54, 0x4e,
+	  0xb1, 0x0e, 0x6b, 0xc0, 0x7b, 0xae, 0x43, 0x5e,
+	  0x25, 0x18, 0xdb, 0xe1, 0x33, 0x52, 0x53, 0x75 },
+	{ 0xd8, 0xb2, 0x86, 0x6e, 0x8a, 0x30, 0x9d, 0xb5,
+	  0x3e, 0x52, 0x9e, 0xc3, 0x29, 0x11, 0xd8, 0x2f,
+	  0x5c, 0xa1, 0x6c, 0xff, 0x76, 0x21, 0x68, 0x91,
+	  0xa9, 0x67, 0x6a, 0xa3, 0x1a, 0xaa, 0x6c, 0x42 },
+	{ 0xf5, 0x04, 0x1c, 0x24, 0x12, 0x70, 0xeb, 0x04,
+	  0xc7, 0x1e, 0xc2, 0xc9, 0x5d, 0x4c, 0x38, 0xd8,
+	  0x03, 0xb1, 0x23, 0x7b, 0x0f, 0x29, 0xfd, 0x4d,
+	  0xb3, 0xeb, 0x39, 0x76, 0x69, 0xe8, 0x86, 0x99 },
+	{ 0x9a, 0x4c, 0xe0, 0x77, 0xc3, 0x49, 0x32, 0x2f,
+	  0x59, 0x5e, 0x0e, 0xe7, 0x9e, 0xd0, 0xda, 0x5f,
+	  0xab, 0x66, 0x75, 0x2c, 0xbf, 0xef, 0x8f, 0x87,
+	  0xd0, 0xe9, 0xd0, 0x72, 0x3c, 0x75, 0x30, 0xdd },
+	{ 0x65, 0x7b, 0x09, 0xf3, 0xd0, 0xf5, 0x2b, 0x5b,
+	  0x8f, 0x2f, 0x97, 0x16, 0x3a, 0x0e, 0xdf, 0x0c,
+	  0x04, 0xf0, 0x75, 0x40, 0x8a, 0x07, 0xbb, 0xeb,
+	  0x3a, 0x41, 0x01, 0xa8, 0x91, 0x99, 0x0d, 0x62 },
+	{ 0x1e, 0x3f, 0x7b, 0xd5, 0xa5, 0x8f, 0xa5, 0x33,
+	  0x34, 0x4a, 0xa8, 0xed, 0x3a, 0xc1, 0x22, 0xbb,
+	  0x9e, 0x70, 0xd4, 0xef, 0x50, 0xd0, 0x04, 0x53,
+	  0x08, 0x21, 0x94, 0x8f, 0x5f, 0xe6, 0x31, 0x5a },
+	{ 0x80, 0xdc, 0xcf, 0x3f, 0xd8, 0x3d, 0xfd, 0x0d,
+	  0x35, 0xaa, 0x28, 0x58, 0x59, 0x22, 0xab, 0x89,
+	  0xd5, 0x31, 0x39, 0x97, 0x67, 0x3e, 0xaf, 0x90,
+	  0x5c, 0xea, 0x9c, 0x0b, 0x22, 0x5c, 0x7b, 0x5f },
+	{ 0x8a, 0x0d, 0x0f, 0xbf, 0x63, 0x77, 0xd8, 0x3b,
+	  0xb0, 0x8b, 0x51, 0x4b, 0x4b, 0x1c, 0x43, 0xac,
+	  0xc9, 0x5d, 0x75, 0x17, 0x14, 0xf8, 0x92, 0x56,
+	  0x45, 0xcb, 0x6b, 0xc8, 0x56, 0xca, 0x15, 0x0a },
+	{ 0x9f, 0xa5, 0xb4, 0x87, 0x73, 0x8a, 0xd2, 0x84,
+	  0x4c, 0xc6, 0x34, 0x8a, 0x90, 0x19, 0x18, 0xf6,
+	  0x59, 0xa3, 0xb8, 0x9e, 0x9c, 0x0d, 0xfe, 0xea,
+	  0xd3, 0x0d, 0xd9, 0x4b, 0xcf, 0x42, 0xef, 0x8e },
+	{ 0x80, 0x83, 0x2c, 0x4a, 0x16, 0x77, 0xf5, 0xea,
+	  0x25, 0x60, 0xf6, 0x68, 0xe9, 0x35, 0x4d, 0xd3,
+	  0x69, 0x97, 0xf0, 0x37, 0x28, 0xcf, 0xa5, 0x5e,
+	  0x1b, 0x38, 0x33, 0x7c, 0x0c, 0x9e, 0xf8, 0x18 },
+	{ 0xab, 0x37, 0xdd, 0xb6, 0x83, 0x13, 0x7e, 0x74,
+	  0x08, 0x0d, 0x02, 0x6b, 0x59, 0x0b, 0x96, 0xae,
+	  0x9b, 0xb4, 0x47, 0x72, 0x2f, 0x30, 0x5a, 0x5a,
+	  0xc5, 0x70, 0xec, 0x1d, 0xf9, 0xb1, 0x74, 0x3c },
+	{ 0x3e, 0xe7, 0x35, 0xa6, 0x94, 0xc2, 0x55, 0x9b,
+	  0x69, 0x3a, 0xa6, 0x86, 0x29, 0x36, 0x1e, 0x15,
+	  0xd1, 0x22, 0x65, 0xad, 0x6a, 0x3d, 0xed, 0xf4,
+	  0x88, 0xb0, 0xb0, 0x0f, 0xac, 0x97, 0x54, 0xba },
+	{ 0xd6, 0xfc, 0xd2, 0x32, 0x19, 0xb6, 0x47, 0xe4,
+	  0xcb, 0xd5, 0xeb, 0x2d, 0x0a, 0xd0, 0x1e, 0xc8,
+	  0x83, 0x8a, 0x4b, 0x29, 0x01, 0xfc, 0x32, 0x5c,
+	  0xc3, 0x70, 0x19, 0x81, 0xca, 0x6c, 0x88, 0x8b },
+	{ 0x05, 0x20, 0xec, 0x2f, 0x5b, 0xf7, 0xa7, 0x55,
+	  0xda, 0xcb, 0x50, 0xc6, 0xbf, 0x23, 0x3e, 0x35,
+	  0x15, 0x43, 0x47, 0x63, 0xdb, 0x01, 0x39, 0xcc,
+	  0xd9, 0xfa, 0xef, 0xbb, 0x82, 0x07, 0x61, 0x2d },
+	{ 0xaf, 0xf3, 0xb7, 0x5f, 0x3f, 0x58, 0x12, 0x64,
+	  0xd7, 0x66, 0x16, 0x62, 0xb9, 0x2f, 0x5a, 0xd3,
+	  0x7c, 0x1d, 0x32, 0xbd, 0x45, 0xff, 0x81, 0xa4,
+	  0xed, 0x8a, 0xdc, 0x9e, 0xf3, 0x0d, 0xd9, 0x89 },
+	{ 0xd0, 0xdd, 0x65, 0x0b, 0xef, 0xd3, 0xba, 0x63,
+	  0xdc, 0x25, 0x10, 0x2c, 0x62, 0x7c, 0x92, 0x1b,
+	  0x9c, 0xbe, 0xb0, 0xb1, 0x30, 0x68, 0x69, 0x35,
+	  0xb5, 0xc9, 0x27, 0xcb, 0x7c, 0xcd, 0x5e, 0x3b },
+	{ 0xe1, 0x14, 0x98, 0x16, 0xb1, 0x0a, 0x85, 0x14,
+	  0xfb, 0x3e, 0x2c, 0xab, 0x2c, 0x08, 0xbe, 0xe9,
+	  0xf7, 0x3c, 0xe7, 0x62, 0x21, 0x70, 0x12, 0x46,
+	  0xa5, 0x89, 0xbb, 0xb6, 0x73, 0x02, 0xd8, 0xa9 },
+	{ 0x7d, 0xa3, 0xf4, 0x41, 0xde, 0x90, 0x54, 0x31,
+	  0x7e, 0x72, 0xb5, 0xdb, 0xf9, 0x79, 0xda, 0x01,
+	  0xe6, 0xbc, 0xee, 0xbb, 0x84, 0x78, 0xea, 0xe6,
+	  0xa2, 0x28, 0x49, 0xd9, 0x02, 0x92, 0x63, 0x5c },
+	{ 0x12, 0x30, 0xb1, 0xfc, 0x8a, 0x7d, 0x92, 0x15,
+	  0xed, 0xc2, 0xd4, 0xa2, 0xde, 0xcb, 0xdd, 0x0a,
+	  0x6e, 0x21, 0x6c, 0x92, 0x42, 0x78, 0xc9, 0x1f,
+	  0xc5, 0xd1, 0x0e, 0x7d, 0x60, 0x19, 0x2d, 0x94 },
+	{ 0x57, 0x50, 0xd7, 0x16, 0xb4, 0x80, 0x8f, 0x75,
+	  0x1f, 0xeb, 0xc3, 0x88, 0x06, 0xba, 0x17, 0x0b,
+	  0xf6, 0xd5, 0x19, 0x9a, 0x78, 0x16, 0xbe, 0x51,
+	  0x4e, 0x3f, 0x93, 0x2f, 0xbe, 0x0c, 0xb8, 0x71 },
+	{ 0x6f, 0xc5, 0x9b, 0x2f, 0x10, 0xfe, 0xba, 0x95,
+	  0x4a, 0xa6, 0x82, 0x0b, 0x3c, 0xa9, 0x87, 0xee,
+	  0x81, 0xd5, 0xcc, 0x1d, 0xa3, 0xc6, 0x3c, 0xe8,
+	  0x27, 0x30, 0x1c, 0x56, 0x9d, 0xfb, 0x39, 0xce },
+	{ 0xc7, 0xc3, 0xfe, 0x1e, 0xeb, 0xdc, 0x7b, 0x5a,
+	  0x93, 0x93, 0x26, 0xe8, 0xdd, 0xb8, 0x3e, 0x8b,
+	  0xf2, 0xb7, 0x80, 0xb6, 0x56, 0x78, 0xcb, 0x62,
+	  0xf2, 0x08, 0xb0, 0x40, 0xab, 0xdd, 0x35, 0xe2 },
+	{ 0x0c, 0x75, 0xc1, 0xa1, 0x5c, 0xf3, 0x4a, 0x31,
+	  0x4e, 0xe4, 0x78, 0xf4, 0xa5, 0xce, 0x0b, 0x8a,
+	  0x6b, 0x36, 0x52, 0x8e, 0xf7, 0xa8, 0x20, 0x69,
+	  0x6c, 0x3e, 0x42, 0x46, 0xc5, 0xa1, 0x58, 0x64 },
+	{ 0x21, 0x6d, 0xc1, 0x2a, 0x10, 0x85, 0x69, 0xa3,
+	  0xc7, 0xcd, 0xde, 0x4a, 0xed, 0x43, 0xa6, 0xc3,
+	  0x30, 0x13, 0x9d, 0xda, 0x3c, 0xcc, 0x4a, 0x10,
+	  0x89, 0x05, 0xdb, 0x38, 0x61, 0x89, 0x90, 0x50 },
+	{ 0xa5, 0x7b, 0xe6, 0xae, 0x67, 0x56, 0xf2, 0x8b,
+	  0x02, 0xf5, 0x9d, 0xad, 0xf7, 0xe0, 0xd7, 0xd8,
+	  0x80, 0x7f, 0x10, 0xfa, 0x15, 0xce, 0xd1, 0xad,
+	  0x35, 0x85, 0x52, 0x1a, 0x1d, 0x99, 0x5a, 0x89 },
+	{ 0x81, 0x6a, 0xef, 0x87, 0x59, 0x53, 0x71, 0x6c,
+	  0xd7, 0xa5, 0x81, 0xf7, 0x32, 0xf5, 0x3d, 0xd4,
+	  0x35, 0xda, 0xb6, 0x6d, 0x09, 0xc3, 0x61, 0xd2,
+	  0xd6, 0x59, 0x2d, 0xe1, 0x77, 0x55, 0xd8, 0xa8 },
+	{ 0x9a, 0x76, 0x89, 0x32, 0x26, 0x69, 0x3b, 0x6e,
+	  0xa9, 0x7e, 0x6a, 0x73, 0x8f, 0x9d, 0x10, 0xfb,
+	  0x3d, 0x0b, 0x43, 0xae, 0x0e, 0x8b, 0x7d, 0x81,
+	  0x23, 0xea, 0x76, 0xce, 0x97, 0x98, 0x9c, 0x7e },
+	{ 0x8d, 0xae, 0xdb, 0x9a, 0x27, 0x15, 0x29, 0xdb,
+	  0xb7, 0xdc, 0x3b, 0x60, 0x7f, 0xe5, 0xeb, 0x2d,
+	  0x32, 0x11, 0x77, 0x07, 0x58, 0xdd, 0x3b, 0x0a,
+	  0x35, 0x93, 0xd2, 0xd7, 0x95, 0x4e, 0x2d, 0x5b },
+	{ 0x16, 0xdb, 0xc0, 0xaa, 0x5d, 0xd2, 0xc7, 0x74,
+	  0xf5, 0x05, 0x10, 0x0f, 0x73, 0x37, 0x86, 0xd8,
+	  0xa1, 0x75, 0xfc, 0xbb, 0xb5, 0x9c, 0x43, 0xe1,
+	  0xfb, 0xff, 0x3e, 0x1e, 0xaf, 0x31, 0xcb, 0x4a },
+	{ 0x86, 0x06, 0xcb, 0x89, 0x9c, 0x6a, 0xea, 0xf5,
+	  0x1b, 0x9d, 0xb0, 0xfe, 0x49, 0x24, 0xa9, 0xfd,
+	  0x5d, 0xab, 0xc1, 0x9f, 0x88, 0x26, 0xf2, 0xbc,
+	  0x1c, 0x1d, 0x7d, 0xa1, 0x4d, 0x2c, 0x2c, 0x99 },
+	{ 0x84, 0x79, 0x73, 0x1a, 0xed, 0xa5, 0x7b, 0xd3,
+	  0x7e, 0xad, 0xb5, 0x1a, 0x50, 0x7e, 0x30, 0x7f,
+	  0x3b, 0xd9, 0x5e, 0x69, 0xdb, 0xca, 0x94, 0xf3,
+	  0xbc, 0x21, 0x72, 0x60, 0x66, 0xad, 0x6d, 0xfd },
+	{ 0x58, 0x47, 0x3a, 0x9e, 0xa8, 0x2e, 0xfa, 0x3f,
+	  0x3b, 0x3d, 0x8f, 0xc8, 0x3e, 0xd8, 0x86, 0x31,
+	  0x27, 0xb3, 0x3a, 0xe8, 0xde, 0xae, 0x63, 0x07,
+	  0x20, 0x1e, 0xdb, 0x6d, 0xde, 0x61, 0xde, 0x29 },
+	{ 0x9a, 0x92, 0x55, 0xd5, 0x3a, 0xf1, 0x16, 0xde,
+	  0x8b, 0xa2, 0x7c, 0xe3, 0x5b, 0x4c, 0x7e, 0x15,
+	  0x64, 0x06, 0x57, 0xa0, 0xfc, 0xb8, 0x88, 0xc7,
+	  0x0d, 0x95, 0x43, 0x1d, 0xac, 0xd8, 0xf8, 0x30 },
+	{ 0x9e, 0xb0, 0x5f, 0xfb, 0xa3, 0x9f, 0xd8, 0x59,
+	  0x6a, 0x45, 0x49, 0x3e, 0x18, 0xd2, 0x51, 0x0b,
+	  0xf3, 0xef, 0x06, 0x5c, 0x51, 0xd6, 0xe1, 0x3a,
+	  0xbe, 0x66, 0xaa, 0x57, 0xe0, 0x5c, 0xfd, 0xb7 },
+	{ 0x81, 0xdc, 0xc3, 0xa5, 0x05, 0xea, 0xce, 0x3f,
+	  0x87, 0x9d, 0x8f, 0x70, 0x27, 0x76, 0x77, 0x0f,
+	  0x9d, 0xf5, 0x0e, 0x52, 0x1d, 0x14, 0x28, 0xa8,
+	  0x5d, 0xaf, 0x04, 0xf9, 0xad, 0x21, 0x50, 0xe0 },
+	{ 0xe3, 0xe3, 0xc4, 0xaa, 0x3a, 0xcb, 0xbc, 0x85,
+	  0x33, 0x2a, 0xf9, 0xd5, 0x64, 0xbc, 0x24, 0x16,
+	  0x5e, 0x16, 0x87, 0xf6, 0xb1, 0xad, 0xcb, 0xfa,
+	  0xe7, 0x7a, 0x8f, 0x03, 0xc7, 0x2a, 0xc2, 0x8c },
+	{ 0x67, 0x46, 0xc8, 0x0b, 0x4e, 0xb5, 0x6a, 0xea,
+	  0x45, 0xe6, 0x4e, 0x72, 0x89, 0xbb, 0xa3, 0xed,
+	  0xbf, 0x45, 0xec, 0xf8, 0x20, 0x64, 0x81, 0xff,
+	  0x63, 0x02, 0x12, 0x29, 0x84, 0xcd, 0x52, 0x6a },
+	{ 0x2b, 0x62, 0x8e, 0x52, 0x76, 0x4d, 0x7d, 0x62,
+	  0xc0, 0x86, 0x8b, 0x21, 0x23, 0x57, 0xcd, 0xd1,
+	  0x2d, 0x91, 0x49, 0x82, 0x2f, 0x4e, 0x98, 0x45,
+	  0xd9, 0x18, 0xa0, 0x8d, 0x1a, 0xe9, 0x90, 0xc0 },
+	{ 0xe4, 0xbf, 0xe8, 0x0d, 0x58, 0xc9, 0x19, 0x94,
+	  0x61, 0x39, 0x09, 0xdc, 0x4b, 0x1a, 0x12, 0x49,
+	  0x68, 0x96, 0xc0, 0x04, 0xaf, 0x7b, 0x57, 0x01,
+	  0x48, 0x3d, 0xe4, 0x5d, 0x28, 0x23, 0xd7, 0x8e },
+	{ 0xeb, 0xb4, 0xba, 0x15, 0x0c, 0xef, 0x27, 0x34,
+	  0x34, 0x5b, 0x5d, 0x64, 0x1b, 0xbe, 0xd0, 0x3a,
+	  0x21, 0xea, 0xfa, 0xe9, 0x33, 0xc9, 0x9e, 0x00,
+	  0x92, 0x12, 0xef, 0x04, 0x57, 0x4a, 0x85, 0x30 },
+	{ 0x39, 0x66, 0xec, 0x73, 0xb1, 0x54, 0xac, 0xc6,
+	  0x97, 0xac, 0x5c, 0xf5, 0xb2, 0x4b, 0x40, 0xbd,
+	  0xb0, 0xdb, 0x9e, 0x39, 0x88, 0x36, 0xd7, 0x6d,
+	  0x4b, 0x88, 0x0e, 0x3b, 0x2a, 0xf1, 0xaa, 0x27 },
+	{ 0xef, 0x7e, 0x48, 0x31, 0xb3, 0xa8, 0x46, 0x36,
+	  0x51, 0x8d, 0x6e, 0x4b, 0xfc, 0xe6, 0x4a, 0x43,
+	  0xdb, 0x2a, 0x5d, 0xda, 0x9c, 0xca, 0x2b, 0x44,
+	  0xf3, 0x90, 0x33, 0xbd, 0xc4, 0x0d, 0x62, 0x43 },
+	{ 0x7a, 0xbf, 0x6a, 0xcf, 0x5c, 0x8e, 0x54, 0x9d,
+	  0xdb, 0xb1, 0x5a, 0xe8, 0xd8, 0xb3, 0x88, 0xc1,
+	  0xc1, 0x97, 0xe6, 0x98, 0x73, 0x7c, 0x97, 0x85,
+	  0x50, 0x1e, 0xd1, 0xf9, 0x49, 0x30, 0xb7, 0xd9 },
+	{ 0x88, 0x01, 0x8d, 0xed, 0x66, 0x81, 0x3f, 0x0c,
+	  0xa9, 0x5d, 0xef, 0x47, 0x4c, 0x63, 0x06, 0x92,
+	  0x01, 0x99, 0x67, 0xb9, 0xe3, 0x68, 0x88, 0xda,
+	  0xdd, 0x94, 0x12, 0x47, 0x19, 0xb6, 0x82, 0xf6 },
+	{ 0x39, 0x30, 0x87, 0x6b, 0x9f, 0xc7, 0x52, 0x90,
+	  0x36, 0xb0, 0x08, 0xb1, 0xb8, 0xbb, 0x99, 0x75,
+	  0x22, 0xa4, 0x41, 0x63, 0x5a, 0x0c, 0x25, 0xec,
+	  0x02, 0xfb, 0x6d, 0x90, 0x26, 0xe5, 0x5a, 0x97 },
+	{ 0x0a, 0x40, 0x49, 0xd5, 0x7e, 0x83, 0x3b, 0x56,
+	  0x95, 0xfa, 0xc9, 0x3d, 0xd1, 0xfb, 0xef, 0x31,
+	  0x66, 0xb4, 0x4b, 0x12, 0xad, 0x11, 0x24, 0x86,
+	  0x62, 0x38, 0x3a, 0xe0, 0x51, 0xe1, 0x58, 0x27 },
+	{ 0x81, 0xdc, 0xc0, 0x67, 0x8b, 0xb6, 0xa7, 0x65,
+	  0xe4, 0x8c, 0x32, 0x09, 0x65, 0x4f, 0xe9, 0x00,
+	  0x89, 0xce, 0x44, 0xff, 0x56, 0x18, 0x47, 0x7e,
+	  0x39, 0xab, 0x28, 0x64, 0x76, 0xdf, 0x05, 0x2b },
+	{ 0xe6, 0x9b, 0x3a, 0x36, 0xa4, 0x46, 0x19, 0x12,
+	  0xdc, 0x08, 0x34, 0x6b, 0x11, 0xdd, 0xcb, 0x9d,
+	  0xb7, 0x96, 0xf8, 0x85, 0xfd, 0x01, 0x93, 0x6e,
+	  0x66, 0x2f, 0xe2, 0x92, 0x97, 0xb0, 0x99, 0xa4 },
+	{ 0x5a, 0xc6, 0x50, 0x3b, 0x0d, 0x8d, 0xa6, 0x91,
+	  0x76, 0x46, 0xe6, 0xdc, 0xc8, 0x7e, 0xdc, 0x58,
+	  0xe9, 0x42, 0x45, 0x32, 0x4c, 0xc2, 0x04, 0xf4,
+	  0xdd, 0x4a, 0xf0, 0x15, 0x63, 0xac, 0xd4, 0x27 },
+	{ 0xdf, 0x6d, 0xda, 0x21, 0x35, 0x9a, 0x30, 0xbc,
+	  0x27, 0x17, 0x80, 0x97, 0x1c, 0x1a, 0xbd, 0x56,
+	  0xa6, 0xef, 0x16, 0x7e, 0x48, 0x08, 0x87, 0x88,
+	  0x8e, 0x73, 0xa8, 0x6d, 0x3b, 0xf6, 0x05, 0xe9 },
+	{ 0xe8, 0xe6, 0xe4, 0x70, 0x71, 0xe7, 0xb7, 0xdf,
+	  0x25, 0x80, 0xf2, 0x25, 0xcf, 0xbb, 0xed, 0xf8,
+	  0x4c, 0xe6, 0x77, 0x46, 0x62, 0x66, 0x28, 0xd3,
+	  0x30, 0x97, 0xe4, 0xb7, 0xdc, 0x57, 0x11, 0x07 },
+	{ 0x53, 0xe4, 0x0e, 0xad, 0x62, 0x05, 0x1e, 0x19,
+	  0xcb, 0x9b, 0xa8, 0x13, 0x3e, 0x3e, 0x5c, 0x1c,
+	  0xe0, 0x0d, 0xdc, 0xad, 0x8a, 0xcf, 0x34, 0x2a,
+	  0x22, 0x43, 0x60, 0xb0, 0xac, 0xc1, 0x47, 0x77 },
+	{ 0x9c, 0xcd, 0x53, 0xfe, 0x80, 0xbe, 0x78, 0x6a,
+	  0xa9, 0x84, 0x63, 0x84, 0x62, 0xfb, 0x28, 0xaf,
+	  0xdf, 0x12, 0x2b, 0x34, 0xd7, 0x8f, 0x46, 0x87,
+	  0xec, 0x63, 0x2b, 0xb1, 0x9d, 0xe2, 0x37, 0x1a },
+	{ 0xcb, 0xd4, 0x80, 0x52, 0xc4, 0x8d, 0x78, 0x84,
+	  0x66, 0xa3, 0xe8, 0x11, 0x8c, 0x56, 0xc9, 0x7f,
+	  0xe1, 0x46, 0xe5, 0x54, 0x6f, 0xaa, 0xf9, 0x3e,
+	  0x2b, 0xc3, 0xc4, 0x7e, 0x45, 0x93, 0x97, 0x53 },
+	{ 0x25, 0x68, 0x83, 0xb1, 0x4e, 0x2a, 0xf4, 0x4d,
+	  0xad, 0xb2, 0x8e, 0x1b, 0x34, 0xb2, 0xac, 0x0f,
+	  0x0f, 0x4c, 0x91, 0xc3, 0x4e, 0xc9, 0x16, 0x9e,
+	  0x29, 0x03, 0x61, 0x58, 0xac, 0xaa, 0x95, 0xb9 },
+	{ 0x44, 0x71, 0xb9, 0x1a, 0xb4, 0x2d, 0xb7, 0xc4,
+	  0xdd, 0x84, 0x90, 0xab, 0x95, 0xa2, 0xee, 0x8d,
+	  0x04, 0xe3, 0xef, 0x5c, 0x3d, 0x6f, 0xc7, 0x1a,
+	  0xc7, 0x4b, 0x2b, 0x26, 0x91, 0x4d, 0x16, 0x41 },
+	{ 0xa5, 0xeb, 0x08, 0x03, 0x8f, 0x8f, 0x11, 0x55,
+	  0xed, 0x86, 0xe6, 0x31, 0x90, 0x6f, 0xc1, 0x30,
+	  0x95, 0xf6, 0xbb, 0xa4, 0x1d, 0xe5, 0xd4, 0xe7,
+	  0x95, 0x75, 0x8e, 0xc8, 0xc8, 0xdf, 0x8a, 0xf1 },
+	{ 0xdc, 0x1d, 0xb6, 0x4e, 0xd8, 0xb4, 0x8a, 0x91,
+	  0x0e, 0x06, 0x0a, 0x6b, 0x86, 0x63, 0x74, 0xc5,
+	  0x78, 0x78, 0x4e, 0x9a, 0xc4, 0x9a, 0xb2, 0x77,
+	  0x40, 0x92, 0xac, 0x71, 0x50, 0x19, 0x34, 0xac },
+	{ 0x28, 0x54, 0x13, 0xb2, 0xf2, 0xee, 0x87, 0x3d,
+	  0x34, 0x31, 0x9e, 0xe0, 0xbb, 0xfb, 0xb9, 0x0f,
+	  0x32, 0xda, 0x43, 0x4c, 0xc8, 0x7e, 0x3d, 0xb5,
+	  0xed, 0x12, 0x1b, 0xb3, 0x98, 0xed, 0x96, 0x4b },
+	{ 0x02, 0x16, 0xe0, 0xf8, 0x1f, 0x75, 0x0f, 0x26,
+	  0xf1, 0x99, 0x8b, 0xc3, 0x93, 0x4e, 0x3e, 0x12,
+	  0x4c, 0x99, 0x45, 0xe6, 0x85, 0xa6, 0x0b, 0x25,
+	  0xe8, 0xfb, 0xd9, 0x62, 0x5a, 0xb6, 0xb5, 0x99 },
+	{ 0x38, 0xc4, 0x10, 0xf5, 0xb9, 0xd4, 0x07, 0x20,
+	  0x50, 0x75, 0x5b, 0x31, 0xdc, 0xa8, 0x9f, 0xd5,
+	  0x39, 0x5c, 0x67, 0x85, 0xee, 0xb3, 0xd7, 0x90,
+	  0xf3, 0x20, 0xff, 0x94, 0x1c, 0x5a, 0x93, 0xbf },
+	{ 0xf1, 0x84, 0x17, 0xb3, 0x9d, 0x61, 0x7a, 0xb1,
+	  0xc1, 0x8f, 0xdf, 0x91, 0xeb, 0xd0, 0xfc, 0x6d,
+	  0x55, 0x16, 0xbb, 0x34, 0xcf, 0x39, 0x36, 0x40,
+	  0x37, 0xbc, 0xe8, 0x1f, 0xa0, 0x4c, 0xec, 0xb1 },
+	{ 0x1f, 0xa8, 0x77, 0xde, 0x67, 0x25, 0x9d, 0x19,
+	  0x86, 0x3a, 0x2a, 0x34, 0xbc, 0xc6, 0x96, 0x2a,
+	  0x2b, 0x25, 0xfc, 0xbf, 0x5c, 0xbe, 0xcd, 0x7e,
+	  0xde, 0x8f, 0x1f, 0xa3, 0x66, 0x88, 0xa7, 0x96 },
+	{ 0x5b, 0xd1, 0x69, 0xe6, 0x7c, 0x82, 0xc2, 0xc2,
+	  0xe9, 0x8e, 0xf7, 0x00, 0x8b, 0xdf, 0x26, 0x1f,
+	  0x2d, 0xdf, 0x30, 0xb1, 0xc0, 0x0f, 0x9e, 0x7f,
+	  0x27, 0x5b, 0xb3, 0xe8, 0xa2, 0x8d, 0xc9, 0xa2 },
+	{ 0xc8, 0x0a, 0xbe, 0xeb, 0xb6, 0x69, 0xad, 0x5d,
+	  0xee, 0xb5, 0xf5, 0xec, 0x8e, 0xa6, 0xb7, 0xa0,
+	  0x5d, 0xdf, 0x7d, 0x31, 0xec, 0x4c, 0x0a, 0x2e,
+	  0xe2, 0x0b, 0x0b, 0x98, 0xca, 0xec, 0x67, 0x46 },
+	{ 0xe7, 0x6d, 0x3f, 0xbd, 0xa5, 0xba, 0x37, 0x4e,
+	  0x6b, 0xf8, 0xe5, 0x0f, 0xad, 0xc3, 0xbb, 0xb9,
+	  0xba, 0x5c, 0x20, 0x6e, 0xbd, 0xec, 0x89, 0xa3,
+	  0xa5, 0x4c, 0xf3, 0xdd, 0x84, 0xa0, 0x70, 0x16 },
+	{ 0x7b, 0xba, 0x9d, 0xc5, 0xb5, 0xdb, 0x20, 0x71,
+	  0xd1, 0x77, 0x52, 0xb1, 0x04, 0x4c, 0x1e, 0xce,
+	  0xd9, 0x6a, 0xaf, 0x2d, 0xd4, 0x6e, 0x9b, 0x43,
+	  0x37, 0x50, 0xe8, 0xea, 0x0d, 0xcc, 0x18, 0x70 },
+	{ 0xf2, 0x9b, 0x1b, 0x1a, 0xb9, 0xba, 0xb1, 0x63,
+	  0x01, 0x8e, 0xe3, 0xda, 0x15, 0x23, 0x2c, 0xca,
+	  0x78, 0xec, 0x52, 0xdb, 0xc3, 0x4e, 0xda, 0x5b,
+	  0x82, 0x2e, 0xc1, 0xd8, 0x0f, 0xc2, 0x1b, 0xd0 },
+	{ 0x9e, 0xe3, 0xe3, 0xe7, 0xe9, 0x00, 0xf1, 0xe1,
+	  0x1d, 0x30, 0x8c, 0x4b, 0x2b, 0x30, 0x76, 0xd2,
+	  0x72, 0xcf, 0x70, 0x12, 0x4f, 0x9f, 0x51, 0xe1,
+	  0xda, 0x60, 0xf3, 0x78, 0x46, 0xcd, 0xd2, 0xf4 },
+	{ 0x70, 0xea, 0x3b, 0x01, 0x76, 0x92, 0x7d, 0x90,
+	  0x96, 0xa1, 0x85, 0x08, 0xcd, 0x12, 0x3a, 0x29,
+	  0x03, 0x25, 0x92, 0x0a, 0x9d, 0x00, 0xa8, 0x9b,
+	  0x5d, 0xe0, 0x42, 0x73, 0xfb, 0xc7, 0x6b, 0x85 },
+	{ 0x67, 0xde, 0x25, 0xc0, 0x2a, 0x4a, 0xab, 0xa2,
+	  0x3b, 0xdc, 0x97, 0x3c, 0x8b, 0xb0, 0xb5, 0x79,
+	  0x6d, 0x47, 0xcc, 0x06, 0x59, 0xd4, 0x3d, 0xff,
+	  0x1f, 0x97, 0xde, 0x17, 0x49, 0x63, 0xb6, 0x8e },
+	{ 0xb2, 0x16, 0x8e, 0x4e, 0x0f, 0x18, 0xb0, 0xe6,
+	  0x41, 0x00, 0xb5, 0x17, 0xed, 0x95, 0x25, 0x7d,
+	  0x73, 0xf0, 0x62, 0x0d, 0xf8, 0x85, 0xc1, 0x3d,
+	  0x2e, 0xcf, 0x79, 0x36, 0x7b, 0x38, 0x4c, 0xee },
+	{ 0x2e, 0x7d, 0xec, 0x24, 0x28, 0x85, 0x3b, 0x2c,
+	  0x71, 0x76, 0x07, 0x45, 0x54, 0x1f, 0x7a, 0xfe,
+	  0x98, 0x25, 0xb5, 0xdd, 0x77, 0xdf, 0x06, 0x51,
+	  0x1d, 0x84, 0x41, 0xa9, 0x4b, 0xac, 0xc9, 0x27 },
+	{ 0xca, 0x9f, 0xfa, 0xc4, 0xc4, 0x3f, 0x0b, 0x48,
+	  0x46, 0x1d, 0xc5, 0xc2, 0x63, 0xbe, 0xa3, 0xf6,
+	  0xf0, 0x06, 0x11, 0xce, 0xac, 0xab, 0xf6, 0xf8,
+	  0x95, 0xba, 0x2b, 0x01, 0x01, 0xdb, 0xb6, 0x8d },
+	{ 0x74, 0x10, 0xd4, 0x2d, 0x8f, 0xd1, 0xd5, 0xe9,
+	  0xd2, 0xf5, 0x81, 0x5c, 0xb9, 0x34, 0x17, 0x99,
+	  0x88, 0x28, 0xef, 0x3c, 0x42, 0x30, 0xbf, 0xbd,
+	  0x41, 0x2d, 0xf0, 0xa4, 0xa7, 0xa2, 0x50, 0x7a },
+	{ 0x50, 0x10, 0xf6, 0x84, 0x51, 0x6d, 0xcc, 0xd0,
+	  0xb6, 0xee, 0x08, 0x52, 0xc2, 0x51, 0x2b, 0x4d,
+	  0xc0, 0x06, 0x6c, 0xf0, 0xd5, 0x6f, 0x35, 0x30,
+	  0x29, 0x78, 0xdb, 0x8a, 0xe3, 0x2c, 0x6a, 0x81 },
+	{ 0xac, 0xaa, 0xb5, 0x85, 0xf7, 0xb7, 0x9b, 0x71,
+	  0x99, 0x35, 0xce, 0xb8, 0x95, 0x23, 0xdd, 0xc5,
+	  0x48, 0x27, 0xf7, 0x5c, 0x56, 0x88, 0x38, 0x56,
+	  0x15, 0x4a, 0x56, 0xcd, 0xcd, 0x5e, 0xe9, 0x88 },
+	{ 0x66, 0x6d, 0xe5, 0xd1, 0x44, 0x0f, 0xee, 0x73,
+	  0x31, 0xaa, 0xf0, 0x12, 0x3a, 0x62, 0xef, 0x2d,
+	  0x8b, 0xa5, 0x74, 0x53, 0xa0, 0x76, 0x96, 0x35,
+	  0xac, 0x6c, 0xd0, 0x1e, 0x63, 0x3f, 0x77, 0x12 },
+	{ 0xa6, 0xf9, 0x86, 0x58, 0xf6, 0xea, 0xba, 0xf9,
+	  0x02, 0xd8, 0xb3, 0x87, 0x1a, 0x4b, 0x10, 0x1d,
+	  0x16, 0x19, 0x6e, 0x8a, 0x4b, 0x24, 0x1e, 0x15,
+	  0x58, 0xfe, 0x29, 0x96, 0x6e, 0x10, 0x3e, 0x8d },
+	{ 0x89, 0x15, 0x46, 0xa8, 0xb2, 0x9f, 0x30, 0x47,
+	  0xdd, 0xcf, 0xe5, 0xb0, 0x0e, 0x45, 0xfd, 0x55,
+	  0x75, 0x63, 0x73, 0x10, 0x5e, 0xa8, 0x63, 0x7d,
+	  0xfc, 0xff, 0x54, 0x7b, 0x6e, 0xa9, 0x53, 0x5f },
+	{ 0x18, 0xdf, 0xbc, 0x1a, 0xc5, 0xd2, 0x5b, 0x07,
+	  0x61, 0x13, 0x7d, 0xbd, 0x22, 0xc1, 0x7c, 0x82,
+	  0x9d, 0x0f, 0x0e, 0xf1, 0xd8, 0x23, 0x44, 0xe9,
+	  0xc8, 0x9c, 0x28, 0x66, 0x94, 0xda, 0x24, 0xe8 },
+	{ 0xb5, 0x4b, 0x9b, 0x67, 0xf8, 0xfe, 0xd5, 0x4b,
+	  0xbf, 0x5a, 0x26, 0x66, 0xdb, 0xdf, 0x4b, 0x23,
+	  0xcf, 0xf1, 0xd1, 0xb6, 0xf4, 0xaf, 0xc9, 0x85,
+	  0xb2, 0xe6, 0xd3, 0x30, 0x5a, 0x9f, 0xf8, 0x0f },
+	{ 0x7d, 0xb4, 0x42, 0xe1, 0x32, 0xba, 0x59, 0xbc,
+	  0x12, 0x89, 0xaa, 0x98, 0xb0, 0xd3, 0xe8, 0x06,
+	  0x00, 0x4f, 0x8e, 0xc1, 0x28, 0x11, 0xaf, 0x1e,
+	  0x2e, 0x33, 0xc6, 0x9b, 0xfd, 0xe7, 0x29, 0xe1 },
+	{ 0x25, 0x0f, 0x37, 0xcd, 0xc1, 0x5e, 0x81, 0x7d,
+	  0x2f, 0x16, 0x0d, 0x99, 0x56, 0xc7, 0x1f, 0xe3,
+	  0xeb, 0x5d, 0xb7, 0x45, 0x56, 0xe4, 0xad, 0xf9,
+	  0xa4, 0xff, 0xaf, 0xba, 0x74, 0x01, 0x03, 0x96 },
+	{ 0x4a, 0xb8, 0xa3, 0xdd, 0x1d, 0xdf, 0x8a, 0xd4,
+	  0x3d, 0xab, 0x13, 0xa2, 0x7f, 0x66, 0xa6, 0x54,
+	  0x4f, 0x29, 0x05, 0x97, 0xfa, 0x96, 0x04, 0x0e,
+	  0x0e, 0x1d, 0xb9, 0x26, 0x3a, 0xa4, 0x79, 0xf8 },
+	{ 0xee, 0x61, 0x72, 0x7a, 0x07, 0x66, 0xdf, 0x93,
+	  0x9c, 0xcd, 0xc8, 0x60, 0x33, 0x40, 0x44, 0xc7,
+	  0x9a, 0x3c, 0x9b, 0x15, 0x62, 0x00, 0xbc, 0x3a,
+	  0xa3, 0x29, 0x73, 0x48, 0x3d, 0x83, 0x41, 0xae },
+	{ 0x3f, 0x68, 0xc7, 0xec, 0x63, 0xac, 0x11, 0xeb,
+	  0xb9, 0x8f, 0x94, 0xb3, 0x39, 0xb0, 0x5c, 0x10,
+	  0x49, 0x84, 0xfd, 0xa5, 0x01, 0x03, 0x06, 0x01,
+	  0x44, 0xe5, 0xa2, 0xbf, 0xcc, 0xc9, 0xda, 0x95 },
+	{ 0x05, 0x6f, 0x29, 0x81, 0x6b, 0x8a, 0xf8, 0xf5,
+	  0x66, 0x82, 0xbc, 0x4d, 0x7c, 0xf0, 0x94, 0x11,
+	  0x1d, 0xa7, 0x73, 0x3e, 0x72, 0x6c, 0xd1, 0x3d,
+	  0x6b, 0x3e, 0x8e, 0xa0, 0x3e, 0x92, 0xa0, 0xd5 },
+	{ 0xf5, 0xec, 0x43, 0xa2, 0x8a, 0xcb, 0xef, 0xf1,
+	  0xf3, 0x31, 0x8a, 0x5b, 0xca, 0xc7, 0xc6, 0x6d,
+	  0xdb, 0x52, 0x30, 0xb7, 0x9d, 0xb2, 0xd1, 0x05,
+	  0xbc, 0xbe, 0x15, 0xf3, 0xc1, 0x14, 0x8d, 0x69 },
+	{ 0x2a, 0x69, 0x60, 0xad, 0x1d, 0x8d, 0xd5, 0x47,
+	  0x55, 0x5c, 0xfb, 0xd5, 0xe4, 0x60, 0x0f, 0x1e,
+	  0xaa, 0x1c, 0x8e, 0xda, 0x34, 0xde, 0x03, 0x74,
+	  0xec, 0x4a, 0x26, 0xea, 0xaa, 0xa3, 0x3b, 0x4e },
+	{ 0xdc, 0xc1, 0xea, 0x7b, 0xaa, 0xb9, 0x33, 0x84,
+	  0xf7, 0x6b, 0x79, 0x68, 0x66, 0x19, 0x97, 0x54,
+	  0x74, 0x2f, 0x7b, 0x96, 0xd6, 0xb4, 0xc1, 0x20,
+	  0x16, 0x5c, 0x04, 0xa6, 0xc4, 0xf5, 0xce, 0x10 },
+	{ 0x13, 0xd5, 0xdf, 0x17, 0x92, 0x21, 0x37, 0x9c,
+	  0x6a, 0x78, 0xc0, 0x7c, 0x79, 0x3f, 0xf5, 0x34,
+	  0x87, 0xca, 0xe6, 0xbf, 0x9f, 0xe8, 0x82, 0x54,
+	  0x1a, 0xb0, 0xe7, 0x35, 0xe3, 0xea, 0xda, 0x3b },
+	{ 0x8c, 0x59, 0xe4, 0x40, 0x76, 0x41, 0xa0, 0x1e,
+	  0x8f, 0xf9, 0x1f, 0x99, 0x80, 0xdc, 0x23, 0x6f,
+	  0x4e, 0xcd, 0x6f, 0xcf, 0x52, 0x58, 0x9a, 0x09,
+	  0x9a, 0x96, 0x16, 0x33, 0x96, 0x77, 0x14, 0xe1 },
+	{ 0x83, 0x3b, 0x1a, 0xc6, 0xa2, 0x51, 0xfd, 0x08,
+	  0xfd, 0x6d, 0x90, 0x8f, 0xea, 0x2a, 0x4e, 0xe1,
+	  0xe0, 0x40, 0xbc, 0xa9, 0x3f, 0xc1, 0xa3, 0x8e,
+	  0xc3, 0x82, 0x0e, 0x0c, 0x10, 0xbd, 0x82, 0xea },
+	{ 0xa2, 0x44, 0xf9, 0x27, 0xf3, 0xb4, 0x0b, 0x8f,
+	  0x6c, 0x39, 0x15, 0x70, 0xc7, 0x65, 0x41, 0x8f,
+	  0x2f, 0x6e, 0x70, 0x8e, 0xac, 0x90, 0x06, 0xc5,
+	  0x1a, 0x7f, 0xef, 0xf4, 0xaf, 0x3b, 0x2b, 0x9e },
+	{ 0x3d, 0x99, 0xed, 0x95, 0x50, 0xcf, 0x11, 0x96,
+	  0xe6, 0xc4, 0xd2, 0x0c, 0x25, 0x96, 0x20, 0xf8,
+	  0x58, 0xc3, 0xd7, 0x03, 0x37, 0x4c, 0x12, 0x8c,
+	  0xe7, 0xb5, 0x90, 0x31, 0x0c, 0x83, 0x04, 0x6d },
+	{ 0x2b, 0x35, 0xc4, 0x7d, 0x7b, 0x87, 0x76, 0x1f,
+	  0x0a, 0xe4, 0x3a, 0xc5, 0x6a, 0xc2, 0x7b, 0x9f,
+	  0x25, 0x83, 0x03, 0x67, 0xb5, 0x95, 0xbe, 0x8c,
+	  0x24, 0x0e, 0x94, 0x60, 0x0c, 0x6e, 0x33, 0x12 },
+	{ 0x5d, 0x11, 0xed, 0x37, 0xd2, 0x4d, 0xc7, 0x67,
+	  0x30, 0x5c, 0xb7, 0xe1, 0x46, 0x7d, 0x87, 0xc0,
+	  0x65, 0xac, 0x4b, 0xc8, 0xa4, 0x26, 0xde, 0x38,
+	  0x99, 0x1f, 0xf5, 0x9a, 0xa8, 0x73, 0x5d, 0x02 },
+	{ 0xb8, 0x36, 0x47, 0x8e, 0x1c, 0xa0, 0x64, 0x0d,
+	  0xce, 0x6f, 0xd9, 0x10, 0xa5, 0x09, 0x62, 0x72,
+	  0xc8, 0x33, 0x09, 0x90, 0xcd, 0x97, 0x86, 0x4a,
+	  0xc2, 0xbf, 0x14, 0xef, 0x6b, 0x23, 0x91, 0x4a },
+	{ 0x91, 0x00, 0xf9, 0x46, 0xd6, 0xcc, 0xde, 0x3a,
+	  0x59, 0x7f, 0x90, 0xd3, 0x9f, 0xc1, 0x21, 0x5b,
+	  0xad, 0xdc, 0x74, 0x13, 0x64, 0x3d, 0x85, 0xc2,
+	  0x1c, 0x3e, 0xee, 0x5d, 0x2d, 0xd3, 0x28, 0x94 },
+	{ 0xda, 0x70, 0xee, 0xdd, 0x23, 0xe6, 0x63, 0xaa,
+	  0x1a, 0x74, 0xb9, 0x76, 0x69, 0x35, 0xb4, 0x79,
+	  0x22, 0x2a, 0x72, 0xaf, 0xba, 0x5c, 0x79, 0x51,
+	  0x58, 0xda, 0xd4, 0x1a, 0x3b, 0xd7, 0x7e, 0x40 },
+	{ 0xf0, 0x67, 0xed, 0x6a, 0x0d, 0xbd, 0x43, 0xaa,
+	  0x0a, 0x92, 0x54, 0xe6, 0x9f, 0xd6, 0x6b, 0xdd,
+	  0x8a, 0xcb, 0x87, 0xde, 0x93, 0x6c, 0x25, 0x8c,
+	  0xfb, 0x02, 0x28, 0x5f, 0x2c, 0x11, 0xfa, 0x79 },
+	{ 0x71, 0x5c, 0x99, 0xc7, 0xd5, 0x75, 0x80, 0xcf,
+	  0x97, 0x53, 0xb4, 0xc1, 0xd7, 0x95, 0xe4, 0x5a,
+	  0x83, 0xfb, 0xb2, 0x28, 0xc0, 0xd3, 0x6f, 0xbe,
+	  0x20, 0xfa, 0xf3, 0x9b, 0xdd, 0x6d, 0x4e, 0x85 },
+	{ 0xe4, 0x57, 0xd6, 0xad, 0x1e, 0x67, 0xcb, 0x9b,
+	  0xbd, 0x17, 0xcb, 0xd6, 0x98, 0xfa, 0x6d, 0x7d,
+	  0xae, 0x0c, 0x9b, 0x7a, 0xd6, 0xcb, 0xd6, 0x53,
+	  0x96, 0x34, 0xe3, 0x2a, 0x71, 0x9c, 0x84, 0x92 },
+	{ 0xec, 0xe3, 0xea, 0x81, 0x03, 0xe0, 0x24, 0x83,
+	  0xc6, 0x4a, 0x70, 0xa4, 0xbd, 0xce, 0xe8, 0xce,
+	  0xb6, 0x27, 0x8f, 0x25, 0x33, 0xf3, 0xf4, 0x8d,
+	  0xbe, 0xed, 0xfb, 0xa9, 0x45, 0x31, 0xd4, 0xae },
+	{ 0x38, 0x8a, 0xa5, 0xd3, 0x66, 0x7a, 0x97, 0xc6,
+	  0x8d, 0x3d, 0x56, 0xf8, 0xf3, 0xee, 0x8d, 0x3d,
+	  0x36, 0x09, 0x1f, 0x17, 0xfe, 0x5d, 0x1b, 0x0d,
+	  0x5d, 0x84, 0xc9, 0x3b, 0x2f, 0xfe, 0x40, 0xbd },
+	{ 0x8b, 0x6b, 0x31, 0xb9, 0xad, 0x7c, 0x3d, 0x5c,
+	  0xd8, 0x4b, 0xf9, 0x89, 0x47, 0xb9, 0xcd, 0xb5,
+	  0x9d, 0xf8, 0xa2, 0x5f, 0xf7, 0x38, 0x10, 0x10,
+	  0x13, 0xbe, 0x4f, 0xd6, 0x5e, 0x1d, 0xd1, 0xa3 },
+	{ 0x06, 0x62, 0x91, 0xf6, 0xbb, 0xd2, 0x5f, 0x3c,
+	  0x85, 0x3d, 0xb7, 0xd8, 0xb9, 0x5c, 0x9a, 0x1c,
+	  0xfb, 0x9b, 0xf1, 0xc1, 0xc9, 0x9f, 0xb9, 0x5a,
+	  0x9b, 0x78, 0x69, 0xd9, 0x0f, 0x1c, 0x29, 0x03 },
+	{ 0xa7, 0x07, 0xef, 0xbc, 0xcd, 0xce, 0xed, 0x42,
+	  0x96, 0x7a, 0x66, 0xf5, 0x53, 0x9b, 0x93, 0xed,
+	  0x75, 0x60, 0xd4, 0x67, 0x30, 0x40, 0x16, 0xc4,
+	  0x78, 0x0d, 0x77, 0x55, 0xa5, 0x65, 0xd4, 0xc4 },
+	{ 0x38, 0xc5, 0x3d, 0xfb, 0x70, 0xbe, 0x7e, 0x79,
+	  0x2b, 0x07, 0xa6, 0xa3, 0x5b, 0x8a, 0x6a, 0x0a,
+	  0xba, 0x02, 0xc5, 0xc5, 0xf3, 0x8b, 0xaf, 0x5c,
+	  0x82, 0x3f, 0xdf, 0xd9, 0xe4, 0x2d, 0x65, 0x7e },
+	{ 0xf2, 0x91, 0x13, 0x86, 0x50, 0x1d, 0x9a, 0xb9,
+	  0xd7, 0x20, 0xcf, 0x8a, 0xd1, 0x05, 0x03, 0xd5,
+	  0x63, 0x4b, 0xf4, 0xb7, 0xd1, 0x2b, 0x56, 0xdf,
+	  0xb7, 0x4f, 0xec, 0xc6, 0xe4, 0x09, 0x3f, 0x68 },
+	{ 0xc6, 0xf2, 0xbd, 0xd5, 0x2b, 0x81, 0xe6, 0xe4,
+	  0xf6, 0x59, 0x5a, 0xbd, 0x4d, 0x7f, 0xb3, 0x1f,
+	  0x65, 0x11, 0x69, 0xd0, 0x0f, 0xf3, 0x26, 0x92,
+	  0x6b, 0x34, 0x94, 0x7b, 0x28, 0xa8, 0x39, 0x59 },
+	{ 0x29, 0x3d, 0x94, 0xb1, 0x8c, 0x98, 0xbb, 0x32,
+	  0x23, 0x36, 0x6b, 0x8c, 0xe7, 0x4c, 0x28, 0xfb,
+	  0xdf, 0x28, 0xe1, 0xf8, 0x4a, 0x33, 0x50, 0xb0,
+	  0xeb, 0x2d, 0x18, 0x04, 0xa5, 0x77, 0x57, 0x9b },
+	{ 0x2c, 0x2f, 0xa5, 0xc0, 0xb5, 0x15, 0x33, 0x16,
+	  0x5b, 0xc3, 0x75, 0xc2, 0x2e, 0x27, 0x81, 0x76,
+	  0x82, 0x70, 0xa3, 0x83, 0x98, 0x5d, 0x13, 0xbd,
+	  0x6b, 0x67, 0xb6, 0xfd, 0x67, 0xf8, 0x89, 0xeb },
+	{ 0xca, 0xa0, 0x9b, 0x82, 0xb7, 0x25, 0x62, 0xe4,
+	  0x3f, 0x4b, 0x22, 0x75, 0xc0, 0x91, 0x91, 0x8e,
+	  0x62, 0x4d, 0x91, 0x16, 0x61, 0xcc, 0x81, 0x1b,
+	  0xb5, 0xfa, 0xec, 0x51, 0xf6, 0x08, 0x8e, 0xf7 },
+	{ 0x24, 0x76, 0x1e, 0x45, 0xe6, 0x74, 0x39, 0x53,
+	  0x79, 0xfb, 0x17, 0x72, 0x9c, 0x78, 0xcb, 0x93,
+	  0x9e, 0x6f, 0x74, 0xc5, 0xdf, 0xfb, 0x9c, 0x96,
+	  0x1f, 0x49, 0x59, 0x82, 0xc3, 0xed, 0x1f, 0xe3 },
+	{ 0x55, 0xb7, 0x0a, 0x82, 0x13, 0x1e, 0xc9, 0x48,
+	  0x88, 0xd7, 0xab, 0x54, 0xa7, 0xc5, 0x15, 0x25,
+	  0x5c, 0x39, 0x38, 0xbb, 0x10, 0xbc, 0x78, 0x4d,
+	  0xc9, 0xb6, 0x7f, 0x07, 0x6e, 0x34, 0x1a, 0x73 },
+	{ 0x6a, 0xb9, 0x05, 0x7b, 0x97, 0x7e, 0xbc, 0x3c,
+	  0xa4, 0xd4, 0xce, 0x74, 0x50, 0x6c, 0x25, 0xcc,
+	  0xcd, 0xc5, 0x66, 0x49, 0x7c, 0x45, 0x0b, 0x54,
+	  0x15, 0xa3, 0x94, 0x86, 0xf8, 0x65, 0x7a, 0x03 },
+	{ 0x24, 0x06, 0x6d, 0xee, 0xe0, 0xec, 0xee, 0x15,
+	  0xa4, 0x5f, 0x0a, 0x32, 0x6d, 0x0f, 0x8d, 0xbc,
+	  0x79, 0x76, 0x1e, 0xbb, 0x93, 0xcf, 0x8c, 0x03,
+	  0x77, 0xaf, 0x44, 0x09, 0x78, 0xfc, 0xf9, 0x94 },
+	{ 0x20, 0x00, 0x0d, 0x3f, 0x66, 0xba, 0x76, 0x86,
+	  0x0d, 0x5a, 0x95, 0x06, 0x88, 0xb9, 0xaa, 0x0d,
+	  0x76, 0xcf, 0xea, 0x59, 0xb0, 0x05, 0xd8, 0x59,
+	  0x91, 0x4b, 0x1a, 0x46, 0x65, 0x3a, 0x93, 0x9b },
+	{ 0xb9, 0x2d, 0xaa, 0x79, 0x60, 0x3e, 0x3b, 0xdb,
+	  0xc3, 0xbf, 0xe0, 0xf4, 0x19, 0xe4, 0x09, 0xb2,
+	  0xea, 0x10, 0xdc, 0x43, 0x5b, 0xee, 0xfe, 0x29,
+	  0x59, 0xda, 0x16, 0x89, 0x5d, 0x5d, 0xca, 0x1c },
+	{ 0xe9, 0x47, 0x94, 0x87, 0x05, 0xb2, 0x06, 0xd5,
+	  0x72, 0xb0, 0xe8, 0xf6, 0x2f, 0x66, 0xa6, 0x55,
+	  0x1c, 0xbd, 0x6b, 0xc3, 0x05, 0xd2, 0x6c, 0xe7,
+	  0x53, 0x9a, 0x12, 0xf9, 0xaa, 0xdf, 0x75, 0x71 },
+	{ 0x3d, 0x67, 0xc1, 0xb3, 0xf9, 0xb2, 0x39, 0x10,
+	  0xe3, 0xd3, 0x5e, 0x6b, 0x0f, 0x2c, 0xcf, 0x44,
+	  0xa0, 0xb5, 0x40, 0xa4, 0x5c, 0x18, 0xba, 0x3c,
+	  0x36, 0x26, 0x4d, 0xd4, 0x8e, 0x96, 0xaf, 0x6a },
+	{ 0xc7, 0x55, 0x8b, 0xab, 0xda, 0x04, 0xbc, 0xcb,
+	  0x76, 0x4d, 0x0b, 0xbf, 0x33, 0x58, 0x42, 0x51,
+	  0x41, 0x90, 0x2d, 0x22, 0x39, 0x1d, 0x9f, 0x8c,
+	  0x59, 0x15, 0x9f, 0xec, 0x9e, 0x49, 0xb1, 0x51 },
+	{ 0x0b, 0x73, 0x2b, 0xb0, 0x35, 0x67, 0x5a, 0x50,
+	  0xff, 0x58, 0xf2, 0xc2, 0x42, 0xe4, 0x71, 0x0a,
+	  0xec, 0xe6, 0x46, 0x70, 0x07, 0x9c, 0x13, 0x04,
+	  0x4c, 0x79, 0xc9, 0xb7, 0x49, 0x1f, 0x70, 0x00 },
+	{ 0xd1, 0x20, 0xb5, 0xef, 0x6d, 0x57, 0xeb, 0xf0,
+	  0x6e, 0xaf, 0x96, 0xbc, 0x93, 0x3c, 0x96, 0x7b,
+	  0x16, 0xcb, 0xe6, 0xe2, 0xbf, 0x00, 0x74, 0x1c,
+	  0x30, 0xaa, 0x1c, 0x54, 0xba, 0x64, 0x80, 0x1f },
+	{ 0x58, 0xd2, 0x12, 0xad, 0x6f, 0x58, 0xae, 0xf0,
+	  0xf8, 0x01, 0x16, 0xb4, 0x41, 0xe5, 0x7f, 0x61,
+	  0x95, 0xbf, 0xef, 0x26, 0xb6, 0x14, 0x63, 0xed,
+	  0xec, 0x11, 0x83, 0xcd, 0xb0, 0x4f, 0xe7, 0x6d },
+	{ 0xb8, 0x83, 0x6f, 0x51, 0xd1, 0xe2, 0x9b, 0xdf,
+	  0xdb, 0xa3, 0x25, 0x56, 0x53, 0x60, 0x26, 0x8b,
+	  0x8f, 0xad, 0x62, 0x74, 0x73, 0xed, 0xec, 0xef,
+	  0x7e, 0xae, 0xfe, 0xe8, 0x37, 0xc7, 0x40, 0x03 },
+	{ 0xc5, 0x47, 0xa3, 0xc1, 0x24, 0xae, 0x56, 0x85,
+	  0xff, 0xa7, 0xb8, 0xed, 0xaf, 0x96, 0xec, 0x86,
+	  0xf8, 0xb2, 0xd0, 0xd5, 0x0c, 0xee, 0x8b, 0xe3,
+	  0xb1, 0xf0, 0xc7, 0x67, 0x63, 0x06, 0x9d, 0x9c },
+	{ 0x5d, 0x16, 0x8b, 0x76, 0x9a, 0x2f, 0x67, 0x85,
+	  0x3d, 0x62, 0x95, 0xf7, 0x56, 0x8b, 0xe4, 0x0b,
+	  0xb7, 0xa1, 0x6b, 0x8d, 0x65, 0xba, 0x87, 0x63,
+	  0x5d, 0x19, 0x78, 0xd2, 0xab, 0x11, 0xba, 0x2a },
+	{ 0xa2, 0xf6, 0x75, 0xdc, 0x73, 0x02, 0x63, 0x8c,
+	  0xb6, 0x02, 0x01, 0x06, 0x4c, 0xa5, 0x50, 0x77,
+	  0x71, 0x4d, 0x71, 0xfe, 0x09, 0x6a, 0x31, 0x5f,
+	  0x2f, 0xe7, 0x40, 0x12, 0x77, 0xca, 0xa5, 0xaf },
+	{ 0xc8, 0xaa, 0xb5, 0xcd, 0x01, 0x60, 0xae, 0x78,
+	  0xcd, 0x2e, 0x8a, 0xc5, 0xfb, 0x0e, 0x09, 0x3c,
+	  0xdb, 0x5c, 0x4b, 0x60, 0x52, 0xa0, 0xa9, 0x7b,
+	  0xb0, 0x42, 0x16, 0x82, 0x6f, 0xa7, 0xa4, 0x37 },
+	{ 0xff, 0x68, 0xca, 0x40, 0x35, 0xbf, 0xeb, 0x43,
+	  0xfb, 0xf1, 0x45, 0xfd, 0xdd, 0x5e, 0x43, 0xf1,
+	  0xce, 0xa5, 0x4f, 0x11, 0xf7, 0xbe, 0xe1, 0x30,
+	  0x58, 0xf0, 0x27, 0x32, 0x9a, 0x4a, 0x5f, 0xa4 },
+	{ 0x1d, 0x4e, 0x54, 0x87, 0xae, 0x3c, 0x74, 0x0f,
+	  0x2b, 0xa6, 0xe5, 0x41, 0xac, 0x91, 0xbc, 0x2b,
+	  0xfc, 0xd2, 0x99, 0x9c, 0x51, 0x8d, 0x80, 0x7b,
+	  0x42, 0x67, 0x48, 0x80, 0x3a, 0x35, 0x0f, 0xd4 },
+	{ 0x6d, 0x24, 0x4e, 0x1a, 0x06, 0xce, 0x4e, 0xf5,
+	  0x78, 0xdd, 0x0f, 0x63, 0xaf, 0xf0, 0x93, 0x67,
+	  0x06, 0x73, 0x51, 0x19, 0xca, 0x9c, 0x8d, 0x22,
+	  0xd8, 0x6c, 0x80, 0x14, 0x14, 0xab, 0x97, 0x41 },
+	{ 0xde, 0xcf, 0x73, 0x29, 0xdb, 0xcc, 0x82, 0x7b,
+	  0x8f, 0xc5, 0x24, 0xc9, 0x43, 0x1e, 0x89, 0x98,
+	  0x02, 0x9e, 0xce, 0x12, 0xce, 0x93, 0xb7, 0xb2,
+	  0xf3, 0xe7, 0x69, 0xa9, 0x41, 0xfb, 0x8c, 0xea },
+	{ 0x2f, 0xaf, 0xcc, 0x0f, 0x2e, 0x63, 0xcb, 0xd0,
+	  0x77, 0x55, 0xbe, 0x7b, 0x75, 0xec, 0xea, 0x0a,
+	  0xdf, 0xf9, 0xaa, 0x5e, 0xde, 0x2a, 0x52, 0xfd,
+	  0xab, 0x4d, 0xfd, 0x03, 0x74, 0xcd, 0x48, 0x3f },
+	{ 0xaa, 0x85, 0x01, 0x0d, 0xd4, 0x6a, 0x54, 0x6b,
+	  0x53, 0x5e, 0xf4, 0xcf, 0x5f, 0x07, 0xd6, 0x51,
+	  0x61, 0xe8, 0x98, 0x28, 0xf3, 0xa7, 0x7d, 0xb7,
+	  0xb9, 0xb5, 0x6f, 0x0d, 0xf5, 0x9a, 0xae, 0x45 },
+	{ 0x07, 0xe8, 0xe1, 0xee, 0x73, 0x2c, 0xb0, 0xd3,
+	  0x56, 0xc9, 0xc0, 0xd1, 0x06, 0x9c, 0x89, 0xd1,
+	  0x7a, 0xdf, 0x6a, 0x9a, 0x33, 0x4f, 0x74, 0x5e,
+	  0xc7, 0x86, 0x73, 0x32, 0x54, 0x8c, 0xa8, 0xe9 },
+	{ 0x0e, 0x01, 0xe8, 0x1c, 0xad, 0xa8, 0x16, 0x2b,
+	  0xfd, 0x5f, 0x8a, 0x8c, 0x81, 0x8a, 0x6c, 0x69,
+	  0xfe, 0xdf, 0x02, 0xce, 0xb5, 0x20, 0x85, 0x23,
+	  0xcb, 0xe5, 0x31, 0x3b, 0x89, 0xca, 0x10, 0x53 },
+	{ 0x6b, 0xb6, 0xc6, 0x47, 0x26, 0x55, 0x08, 0x43,
+	  0x99, 0x85, 0x2e, 0x00, 0x24, 0x9f, 0x8c, 0xb2,
+	  0x47, 0x89, 0x6d, 0x39, 0x2b, 0x02, 0xd7, 0x3b,
+	  0x7f, 0x0d, 0xd8, 0x18, 0xe1, 0xe2, 0x9b, 0x07 },
+	{ 0x42, 0xd4, 0x63, 0x6e, 0x20, 0x60, 0xf0, 0x8f,
+	  0x41, 0xc8, 0x82, 0xe7, 0x6b, 0x39, 0x6b, 0x11,
+	  0x2e, 0xf6, 0x27, 0xcc, 0x24, 0xc4, 0x3d, 0xd5,
+	  0xf8, 0x3a, 0x1d, 0x1a, 0x7e, 0xad, 0x71, 0x1a },
+	{ 0x48, 0x58, 0xc9, 0xa1, 0x88, 0xb0, 0x23, 0x4f,
+	  0xb9, 0xa8, 0xd4, 0x7d, 0x0b, 0x41, 0x33, 0x65,
+	  0x0a, 0x03, 0x0b, 0xd0, 0x61, 0x1b, 0x87, 0xc3,
+	  0x89, 0x2e, 0x94, 0x95, 0x1f, 0x8d, 0xf8, 0x52 },
+	{ 0x3f, 0xab, 0x3e, 0x36, 0x98, 0x8d, 0x44, 0x5a,
+	  0x51, 0xc8, 0x78, 0x3e, 0x53, 0x1b, 0xe3, 0xa0,
+	  0x2b, 0xe4, 0x0c, 0xd0, 0x47, 0x96, 0xcf, 0xb6,
+	  0x1d, 0x40, 0x34, 0x74, 0x42, 0xd3, 0xf7, 0x94 },
+	{ 0xeb, 0xab, 0xc4, 0x96, 0x36, 0xbd, 0x43, 0x3d,
+	  0x2e, 0xc8, 0xf0, 0xe5, 0x18, 0x73, 0x2e, 0xf8,
+	  0xfa, 0x21, 0xd4, 0xd0, 0x71, 0xcc, 0x3b, 0xc4,
+	  0x6c, 0xd7, 0x9f, 0xa3, 0x8a, 0x28, 0xb8, 0x10 },
+	{ 0xa1, 0xd0, 0x34, 0x35, 0x23, 0xb8, 0x93, 0xfc,
+	  0xa8, 0x4f, 0x47, 0xfe, 0xb4, 0xa6, 0x4d, 0x35,
+	  0x0a, 0x17, 0xd8, 0xee, 0xf5, 0x49, 0x7e, 0xce,
+	  0x69, 0x7d, 0x02, 0xd7, 0x91, 0x78, 0xb5, 0x91 },
+	{ 0x26, 0x2e, 0xbf, 0xd9, 0x13, 0x0b, 0x7d, 0x28,
+	  0x76, 0x0d, 0x08, 0xef, 0x8b, 0xfd, 0x3b, 0x86,
+	  0xcd, 0xd3, 0xb2, 0x11, 0x3d, 0x2c, 0xae, 0xf7,
+	  0xea, 0x95, 0x1a, 0x30, 0x3d, 0xfa, 0x38, 0x46 },
+	{ 0xf7, 0x61, 0x58, 0xed, 0xd5, 0x0a, 0x15, 0x4f,
+	  0xa7, 0x82, 0x03, 0xed, 0x23, 0x62, 0x93, 0x2f,
+	  0xcb, 0x82, 0x53, 0xaa, 0xe3, 0x78, 0x90, 0x3e,
+	  0xde, 0xd1, 0xe0, 0x3f, 0x70, 0x21, 0xa2, 0x57 },
+	{ 0x26, 0x17, 0x8e, 0x95, 0x0a, 0xc7, 0x22, 0xf6,
+	  0x7a, 0xe5, 0x6e, 0x57, 0x1b, 0x28, 0x4c, 0x02,
+	  0x07, 0x68, 0x4a, 0x63, 0x34, 0xa1, 0x77, 0x48,
+	  0xa9, 0x4d, 0x26, 0x0b, 0xc5, 0xf5, 0x52, 0x74 },
+	{ 0xc3, 0x78, 0xd1, 0xe4, 0x93, 0xb4, 0x0e, 0xf1,
+	  0x1f, 0xe6, 0xa1, 0x5d, 0x9c, 0x27, 0x37, 0xa3,
+	  0x78, 0x09, 0x63, 0x4c, 0x5a, 0xba, 0xd5, 0xb3,
+	  0x3d, 0x7e, 0x39, 0x3b, 0x4a, 0xe0, 0x5d, 0x03 },
+	{ 0x98, 0x4b, 0xd8, 0x37, 0x91, 0x01, 0xbe, 0x8f,
+	  0xd8, 0x06, 0x12, 0xd8, 0xea, 0x29, 0x59, 0xa7,
+	  0x86, 0x5e, 0xc9, 0x71, 0x85, 0x23, 0x55, 0x01,
+	  0x07, 0xae, 0x39, 0x38, 0xdf, 0x32, 0x01, 0x1b },
+	{ 0xc6, 0xf2, 0x5a, 0x81, 0x2a, 0x14, 0x48, 0x58,
+	  0xac, 0x5c, 0xed, 0x37, 0xa9, 0x3a, 0x9f, 0x47,
+	  0x59, 0xba, 0x0b, 0x1c, 0x0f, 0xdc, 0x43, 0x1d,
+	  0xce, 0x35, 0xf9, 0xec, 0x1f, 0x1f, 0x4a, 0x99 },
+	{ 0x92, 0x4c, 0x75, 0xc9, 0x44, 0x24, 0xff, 0x75,
+	  0xe7, 0x4b, 0x8b, 0x4e, 0x94, 0x35, 0x89, 0x58,
+	  0xb0, 0x27, 0xb1, 0x71, 0xdf, 0x5e, 0x57, 0x89,
+	  0x9a, 0xd0, 0xd4, 0xda, 0xc3, 0x73, 0x53, 0xb6 },
+	{ 0x0a, 0xf3, 0x58, 0x92, 0xa6, 0x3f, 0x45, 0x93,
+	  0x1f, 0x68, 0x46, 0xed, 0x19, 0x03, 0x61, 0xcd,
+	  0x07, 0x30, 0x89, 0xe0, 0x77, 0x16, 0x57, 0x14,
+	  0xb5, 0x0b, 0x81, 0xa2, 0xe3, 0xdd, 0x9b, 0xa1 },
+	{ 0xcc, 0x80, 0xce, 0xfb, 0x26, 0xc3, 0xb2, 0xb0,
+	  0xda, 0xef, 0x23, 0x3e, 0x60, 0x6d, 0x5f, 0xfc,
+	  0x80, 0xfa, 0x17, 0x42, 0x7d, 0x18, 0xe3, 0x04,
+	  0x89, 0x67, 0x3e, 0x06, 0xef, 0x4b, 0x87, 0xf7 },
+	{ 0xc2, 0xf8, 0xc8, 0x11, 0x74, 0x47, 0xf3, 0x97,
+	  0x8b, 0x08, 0x18, 0xdc, 0xf6, 0xf7, 0x01, 0x16,
+	  0xac, 0x56, 0xfd, 0x18, 0x4d, 0xd1, 0x27, 0x84,
+	  0x94, 0xe1, 0x03, 0xfc, 0x6d, 0x74, 0xa8, 0x87 },
+	{ 0xbd, 0xec, 0xf6, 0xbf, 0xc1, 0xba, 0x0d, 0xf6,
+	  0xe8, 0x62, 0xc8, 0x31, 0x99, 0x22, 0x07, 0x79,
+	  0x6a, 0xcc, 0x79, 0x79, 0x68, 0x35, 0x88, 0x28,
+	  0xc0, 0x6e, 0x7a, 0x51, 0xe0, 0x90, 0x09, 0x8f },
+	{ 0x24, 0xd1, 0xa2, 0x6e, 0x3d, 0xab, 0x02, 0xfe,
+	  0x45, 0x72, 0xd2, 0xaa, 0x7d, 0xbd, 0x3e, 0xc3,
+	  0x0f, 0x06, 0x93, 0xdb, 0x26, 0xf2, 0x73, 0xd0,
+	  0xab, 0x2c, 0xb0, 0xc1, 0x3b, 0x5e, 0x64, 0x51 },
+	{ 0xec, 0x56, 0xf5, 0x8b, 0x09, 0x29, 0x9a, 0x30,
+	  0x0b, 0x14, 0x05, 0x65, 0xd7, 0xd3, 0xe6, 0x87,
+	  0x82, 0xb6, 0xe2, 0xfb, 0xeb, 0x4b, 0x7e, 0xa9,
+	  0x7a, 0xc0, 0x57, 0x98, 0x90, 0x61, 0xdd, 0x3f },
+	{ 0x11, 0xa4, 0x37, 0xc1, 0xab, 0xa3, 0xc1, 0x19,
+	  0xdd, 0xfa, 0xb3, 0x1b, 0x3e, 0x8c, 0x84, 0x1d,
+	  0xee, 0xeb, 0x91, 0x3e, 0xf5, 0x7f, 0x7e, 0x48,
+	  0xf2, 0xc9, 0xcf, 0x5a, 0x28, 0xfa, 0x42, 0xbc },
+	{ 0x53, 0xc7, 0xe6, 0x11, 0x4b, 0x85, 0x0a, 0x2c,
+	  0xb4, 0x96, 0xc9, 0xb3, 0xc6, 0x9a, 0x62, 0x3e,
+	  0xae, 0xa2, 0xcb, 0x1d, 0x33, 0xdd, 0x81, 0x7e,
+	  0x47, 0x65, 0xed, 0xaa, 0x68, 0x23, 0xc2, 0x28 },
+	{ 0x15, 0x4c, 0x3e, 0x96, 0xfe, 0xe5, 0xdb, 0x14,
+	  0xf8, 0x77, 0x3e, 0x18, 0xaf, 0x14, 0x85, 0x79,
+	  0x13, 0x50, 0x9d, 0xa9, 0x99, 0xb4, 0x6c, 0xdd,
+	  0x3d, 0x4c, 0x16, 0x97, 0x60, 0xc8, 0x3a, 0xd2 },
+	{ 0x40, 0xb9, 0x91, 0x6f, 0x09, 0x3e, 0x02, 0x7a,
+	  0x87, 0x86, 0x64, 0x18, 0x18, 0x92, 0x06, 0x20,
+	  0x47, 0x2f, 0xbc, 0xf6, 0x8f, 0x70, 0x1d, 0x1b,
+	  0x68, 0x06, 0x32, 0xe6, 0x99, 0x6b, 0xde, 0xd3 },
+	{ 0x24, 0xc4, 0xcb, 0xba, 0x07, 0x11, 0x98, 0x31,
+	  0xa7, 0x26, 0xb0, 0x53, 0x05, 0xd9, 0x6d, 0xa0,
+	  0x2f, 0xf8, 0xb1, 0x48, 0xf0, 0xda, 0x44, 0x0f,
+	  0xe2, 0x33, 0xbc, 0xaa, 0x32, 0xc7, 0x2f, 0x6f },
+	{ 0x5d, 0x20, 0x15, 0x10, 0x25, 0x00, 0x20, 0xb7,
+	  0x83, 0x68, 0x96, 0x88, 0xab, 0xbf, 0x8e, 0xcf,
+	  0x25, 0x94, 0xa9, 0x6a, 0x08, 0xf2, 0xbf, 0xec,
+	  0x6c, 0xe0, 0x57, 0x44, 0x65, 0xdd, 0xed, 0x71 },
+	{ 0x04, 0x3b, 0x97, 0xe3, 0x36, 0xee, 0x6f, 0xdb,
+	  0xbe, 0x2b, 0x50, 0xf2, 0x2a, 0xf8, 0x32, 0x75,
+	  0xa4, 0x08, 0x48, 0x05, 0xd2, 0xd5, 0x64, 0x59,
+	  0x62, 0x45, 0x4b, 0x6c, 0x9b, 0x80, 0x53, 0xa0 },
+	{ 0x56, 0x48, 0x35, 0xcb, 0xae, 0xa7, 0x74, 0x94,
+	  0x85, 0x68, 0xbe, 0x36, 0xcf, 0x52, 0xfc, 0xdd,
+	  0x83, 0x93, 0x4e, 0xb0, 0xa2, 0x75, 0x12, 0xdb,
+	  0xe3, 0xe2, 0xdb, 0x47, 0xb9, 0xe6, 0x63, 0x5a },
+	{ 0xf2, 0x1c, 0x33, 0xf4, 0x7b, 0xde, 0x40, 0xa2,
+	  0xa1, 0x01, 0xc9, 0xcd, 0xe8, 0x02, 0x7a, 0xaf,
+	  0x61, 0xa3, 0x13, 0x7d, 0xe2, 0x42, 0x2b, 0x30,
+	  0x03, 0x5a, 0x04, 0xc2, 0x70, 0x89, 0x41, 0x83 },
+	{ 0x9d, 0xb0, 0xef, 0x74, 0xe6, 0x6c, 0xbb, 0x84,
+	  0x2e, 0xb0, 0xe0, 0x73, 0x43, 0xa0, 0x3c, 0x5c,
+	  0x56, 0x7e, 0x37, 0x2b, 0x3f, 0x23, 0xb9, 0x43,
+	  0xc7, 0x88, 0xa4, 0xf2, 0x50, 0xf6, 0x78, 0x91 },
+	{ 0xab, 0x8d, 0x08, 0x65, 0x5f, 0xf1, 0xd3, 0xfe,
+	  0x87, 0x58, 0xd5, 0x62, 0x23, 0x5f, 0xd2, 0x3e,
+	  0x7c, 0xf9, 0xdc, 0xaa, 0xd6, 0x58, 0x87, 0x2a,
+	  0x49, 0xe5, 0xd3, 0x18, 0x3b, 0x6c, 0xce, 0xbd },
+	{ 0x6f, 0x27, 0xf7, 0x7e, 0x7b, 0xcf, 0x46, 0xa1,
+	  0xe9, 0x63, 0xad, 0xe0, 0x30, 0x97, 0x33, 0x54,
+	  0x30, 0x31, 0xdc, 0xcd, 0xd4, 0x7c, 0xaa, 0xc1,
+	  0x74, 0xd7, 0xd2, 0x7c, 0xe8, 0x07, 0x7e, 0x8b },
+	{ 0xe3, 0xcd, 0x54, 0xda, 0x7e, 0x44, 0x4c, 0xaa,
+	  0x62, 0x07, 0x56, 0x95, 0x25, 0xa6, 0x70, 0xeb,
+	  0xae, 0x12, 0x78, 0xde, 0x4e, 0x3f, 0xe2, 0x68,
+	  0x4b, 0x3e, 0x33, 0xf5, 0xef, 0x90, 0xcc, 0x1b },
+	{ 0xb2, 0xc3, 0xe3, 0x3a, 0x51, 0xd2, 0x2c, 0x4c,
+	  0x08, 0xfc, 0x09, 0x89, 0xc8, 0x73, 0xc9, 0xcc,
+	  0x41, 0x50, 0x57, 0x9b, 0x1e, 0x61, 0x63, 0xfa,
+	  0x69, 0x4a, 0xd5, 0x1d, 0x53, 0xd7, 0x12, 0xdc },
+	{ 0xbe, 0x7f, 0xda, 0x98, 0x3e, 0x13, 0x18, 0x9b,
+	  0x4c, 0x77, 0xe0, 0xa8, 0x09, 0x20, 0xb6, 0xe0,
+	  0xe0, 0xea, 0x80, 0xc3, 0xb8, 0x4d, 0xbe, 0x7e,
+	  0x71, 0x17, 0xd2, 0x53, 0xf4, 0x81, 0x12, 0xf4 },
+	{ 0xb6, 0x00, 0x8c, 0x28, 0xfa, 0xe0, 0x8a, 0xa4,
+	  0x27, 0xe5, 0xbd, 0x3a, 0xad, 0x36, 0xf1, 0x00,
+	  0x21, 0xf1, 0x6c, 0x77, 0xcf, 0xea, 0xbe, 0xd0,
+	  0x7f, 0x97, 0xcc, 0x7d, 0xc1, 0xf1, 0x28, 0x4a },
+	{ 0x6e, 0x4e, 0x67, 0x60, 0xc5, 0x38, 0xf2, 0xe9,
+	  0x7b, 0x3a, 0xdb, 0xfb, 0xbc, 0xde, 0x57, 0xf8,
+	  0x96, 0x6b, 0x7e, 0xa8, 0xfc, 0xb5, 0xbf, 0x7e,
+	  0xfe, 0xc9, 0x13, 0xfd, 0x2a, 0x2b, 0x0c, 0x55 },
+	{ 0x4a, 0xe5, 0x1f, 0xd1, 0x83, 0x4a, 0xa5, 0xbd,
+	  0x9a, 0x6f, 0x7e, 0xc3, 0x9f, 0xc6, 0x63, 0x33,
+	  0x8d, 0xc5, 0xd2, 0xe2, 0x07, 0x61, 0x56, 0x6d,
+	  0x90, 0xcc, 0x68, 0xb1, 0xcb, 0x87, 0x5e, 0xd8 },
+	{ 0xb6, 0x73, 0xaa, 0xd7, 0x5a, 0xb1, 0xfd, 0xb5,
+	  0x40, 0x1a, 0xbf, 0xa1, 0xbf, 0x89, 0xf3, 0xad,
+	  0xd2, 0xeb, 0xc4, 0x68, 0xdf, 0x36, 0x24, 0xa4,
+	  0x78, 0xf4, 0xfe, 0x85, 0x9d, 0x8d, 0x55, 0xe2 },
+	{ 0x13, 0xc9, 0x47, 0x1a, 0x98, 0x55, 0x91, 0x35,
+	  0x39, 0x83, 0x66, 0x60, 0x39, 0x8d, 0xa0, 0xf3,
+	  0xf9, 0x9a, 0xda, 0x08, 0x47, 0x9c, 0x69, 0xd1,
+	  0xb7, 0xfc, 0xaa, 0x34, 0x61, 0xdd, 0x7e, 0x59 },
+	{ 0x2c, 0x11, 0xf4, 0xa7, 0xf9, 0x9a, 0x1d, 0x23,
+	  0xa5, 0x8b, 0xb6, 0x36, 0x35, 0x0f, 0xe8, 0x49,
+	  0xf2, 0x9c, 0xba, 0xc1, 0xb2, 0xa1, 0x11, 0x2d,
+	  0x9f, 0x1e, 0xd5, 0xbc, 0x5b, 0x31, 0x3c, 0xcd },
+	{ 0xc7, 0xd3, 0xc0, 0x70, 0x6b, 0x11, 0xae, 0x74,
+	  0x1c, 0x05, 0xa1, 0xef, 0x15, 0x0d, 0xd6, 0x5b,
+	  0x54, 0x94, 0xd6, 0xd5, 0x4c, 0x9a, 0x86, 0xe2,
+	  0x61, 0x78, 0x54, 0xe6, 0xae, 0xee, 0xbb, 0xd9 },
+	{ 0x19, 0x4e, 0x10, 0xc9, 0x38, 0x93, 0xaf, 0xa0,
+	  0x64, 0xc3, 0xac, 0x04, 0xc0, 0xdd, 0x80, 0x8d,
+	  0x79, 0x1c, 0x3d, 0x4b, 0x75, 0x56, 0xe8, 0x9d,
+	  0x8d, 0x9c, 0xb2, 0x25, 0xc4, 0xb3, 0x33, 0x39 },
+	{ 0x6f, 0xc4, 0x98, 0x8b, 0x8f, 0x78, 0x54, 0x6b,
+	  0x16, 0x88, 0x99, 0x18, 0x45, 0x90, 0x8f, 0x13,
+	  0x4b, 0x6a, 0x48, 0x2e, 0x69, 0x94, 0xb3, 0xd4,
+	  0x83, 0x17, 0xbf, 0x08, 0xdb, 0x29, 0x21, 0x85 },
+	{ 0x56, 0x65, 0xbe, 0xb8, 0xb0, 0x95, 0x55, 0x25,
+	  0x81, 0x3b, 0x59, 0x81, 0xcd, 0x14, 0x2e, 0xd4,
+	  0xd0, 0x3f, 0xba, 0x38, 0xa6, 0xf3, 0xe5, 0xad,
+	  0x26, 0x8e, 0x0c, 0xc2, 0x70, 0xd1, 0xcd, 0x11 },
+	{ 0xb8, 0x83, 0xd6, 0x8f, 0x5f, 0xe5, 0x19, 0x36,
+	  0x43, 0x1b, 0xa4, 0x25, 0x67, 0x38, 0x05, 0x3b,
+	  0x1d, 0x04, 0x26, 0xd4, 0xcb, 0x64, 0xb1, 0x6e,
+	  0x83, 0xba, 0xdc, 0x5e, 0x9f, 0xbe, 0x3b, 0x81 },
+	{ 0x53, 0xe7, 0xb2, 0x7e, 0xa5, 0x9c, 0x2f, 0x6d,
+	  0xbb, 0x50, 0x76, 0x9e, 0x43, 0x55, 0x4d, 0xf3,
+	  0x5a, 0xf8, 0x9f, 0x48, 0x22, 0xd0, 0x46, 0x6b,
+	  0x00, 0x7d, 0xd6, 0xf6, 0xde, 0xaf, 0xff, 0x02 },
+	{ 0x1f, 0x1a, 0x02, 0x29, 0xd4, 0x64, 0x0f, 0x01,
+	  0x90, 0x15, 0x88, 0xd9, 0xde, 0xc2, 0x2d, 0x13,
+	  0xfc, 0x3e, 0xb3, 0x4a, 0x61, 0xb3, 0x29, 0x38,
+	  0xef, 0xbf, 0x53, 0x34, 0xb2, 0x80, 0x0a, 0xfa },
+	{ 0xc2, 0xb4, 0x05, 0xaf, 0xa0, 0xfa, 0x66, 0x68,
+	  0x85, 0x2a, 0xee, 0x4d, 0x88, 0x04, 0x08, 0x53,
+	  0xfa, 0xb8, 0x00, 0xe7, 0x2b, 0x57, 0x58, 0x14,
+	  0x18, 0xe5, 0x50, 0x6f, 0x21, 0x4c, 0x7d, 0x1f },
+	{ 0xc0, 0x8a, 0xa1, 0xc2, 0x86, 0xd7, 0x09, 0xfd,
+	  0xc7, 0x47, 0x37, 0x44, 0x97, 0x71, 0x88, 0xc8,
+	  0x95, 0xba, 0x01, 0x10, 0x14, 0x24, 0x7e, 0x4e,
+	  0xfa, 0x8d, 0x07, 0xe7, 0x8f, 0xec, 0x69, 0x5c },
+	{ 0xf0, 0x3f, 0x57, 0x89, 0xd3, 0x33, 0x6b, 0x80,
+	  0xd0, 0x02, 0xd5, 0x9f, 0xdf, 0x91, 0x8b, 0xdb,
+	  0x77, 0x5b, 0x00, 0x95, 0x6e, 0xd5, 0x52, 0x8e,
+	  0x86, 0xaa, 0x99, 0x4a, 0xcb, 0x38, 0xfe, 0x2d }
+};
+
+static const u8 blake2s_keyed_testvecs[][BLAKE2S_OUTBYTES] __initconst = {
+	{ 0x48, 0xa8, 0x99, 0x7d, 0xa4, 0x07, 0x87, 0x6b,
+	  0x3d, 0x79, 0xc0, 0xd9, 0x23, 0x25, 0xad, 0x3b,
+	  0x89, 0xcb, 0xb7, 0x54, 0xd8, 0x6a, 0xb7, 0x1a,
+	  0xee, 0x04, 0x7a, 0xd3, 0x45, 0xfd, 0x2c, 0x49 },
+	{ 0x40, 0xd1, 0x5f, 0xee, 0x7c, 0x32, 0x88, 0x30,
+	  0x16, 0x6a, 0xc3, 0xf9, 0x18, 0x65, 0x0f, 0x80,
+	  0x7e, 0x7e, 0x01, 0xe1, 0x77, 0x25, 0x8c, 0xdc,
+	  0x0a, 0x39, 0xb1, 0x1f, 0x59, 0x80, 0x66, 0xf1 },
+	{ 0x6b, 0xb7, 0x13, 0x00, 0x64, 0x4c, 0xd3, 0x99,
+	  0x1b, 0x26, 0xcc, 0xd4, 0xd2, 0x74, 0xac, 0xd1,
+	  0xad, 0xea, 0xb8, 0xb1, 0xd7, 0x91, 0x45, 0x46,
+	  0xc1, 0x19, 0x8b, 0xbe, 0x9f, 0xc9, 0xd8, 0x03 },
+	{ 0x1d, 0x22, 0x0d, 0xbe, 0x2e, 0xe1, 0x34, 0x66,
+	  0x1f, 0xdf, 0x6d, 0x9e, 0x74, 0xb4, 0x17, 0x04,
+	  0x71, 0x05, 0x56, 0xf2, 0xf6, 0xe5, 0xa0, 0x91,
+	  0xb2, 0x27, 0x69, 0x74, 0x45, 0xdb, 0xea, 0x6b },
+	{ 0xf6, 0xc3, 0xfb, 0xad, 0xb4, 0xcc, 0x68, 0x7a,
+	  0x00, 0x64, 0xa5, 0xbe, 0x6e, 0x79, 0x1b, 0xec,
+	  0x63, 0xb8, 0x68, 0xad, 0x62, 0xfb, 0xa6, 0x1b,
+	  0x37, 0x57, 0xef, 0x9c, 0xa5, 0x2e, 0x05, 0xb2 },
+	{ 0x49, 0xc1, 0xf2, 0x11, 0x88, 0xdf, 0xd7, 0x69,
+	  0xae, 0xa0, 0xe9, 0x11, 0xdd, 0x6b, 0x41, 0xf1,
+	  0x4d, 0xab, 0x10, 0x9d, 0x2b, 0x85, 0x97, 0x7a,
+	  0xa3, 0x08, 0x8b, 0x5c, 0x70, 0x7e, 0x85, 0x98 },
+	{ 0xfd, 0xd8, 0x99, 0x3d, 0xcd, 0x43, 0xf6, 0x96,
+	  0xd4, 0x4f, 0x3c, 0xea, 0x0f, 0xf3, 0x53, 0x45,
+	  0x23, 0x4e, 0xc8, 0xee, 0x08, 0x3e, 0xb3, 0xca,
+	  0xda, 0x01, 0x7c, 0x7f, 0x78, 0xc1, 0x71, 0x43 },
+	{ 0xe6, 0xc8, 0x12, 0x56, 0x37, 0x43, 0x8d, 0x09,
+	  0x05, 0xb7, 0x49, 0xf4, 0x65, 0x60, 0xac, 0x89,
+	  0xfd, 0x47, 0x1c, 0xf8, 0x69, 0x2e, 0x28, 0xfa,
+	  0xb9, 0x82, 0xf7, 0x3f, 0x01, 0x9b, 0x83, 0xa9 },
+	{ 0x19, 0xfc, 0x8c, 0xa6, 0x97, 0x9d, 0x60, 0xe6,
+	  0xed, 0xd3, 0xb4, 0x54, 0x1e, 0x2f, 0x96, 0x7c,
+	  0xed, 0x74, 0x0d, 0xf6, 0xec, 0x1e, 0xae, 0xbb,
+	  0xfe, 0x81, 0x38, 0x32, 0xe9, 0x6b, 0x29, 0x74 },
+	{ 0xa6, 0xad, 0x77, 0x7c, 0xe8, 0x81, 0xb5, 0x2b,
+	  0xb5, 0xa4, 0x42, 0x1a, 0xb6, 0xcd, 0xd2, 0xdf,
+	  0xba, 0x13, 0xe9, 0x63, 0x65, 0x2d, 0x4d, 0x6d,
+	  0x12, 0x2a, 0xee, 0x46, 0x54, 0x8c, 0x14, 0xa7 },
+	{ 0xf5, 0xc4, 0xb2, 0xba, 0x1a, 0x00, 0x78, 0x1b,
+	  0x13, 0xab, 0xa0, 0x42, 0x52, 0x42, 0xc6, 0x9c,
+	  0xb1, 0x55, 0x2f, 0x3f, 0x71, 0xa9, 0xa3, 0xbb,
+	  0x22, 0xb4, 0xa6, 0xb4, 0x27, 0x7b, 0x46, 0xdd },
+	{ 0xe3, 0x3c, 0x4c, 0x9b, 0xd0, 0xcc, 0x7e, 0x45,
+	  0xc8, 0x0e, 0x65, 0xc7, 0x7f, 0xa5, 0x99, 0x7f,
+	  0xec, 0x70, 0x02, 0x73, 0x85, 0x41, 0x50, 0x9e,
+	  0x68, 0xa9, 0x42, 0x38, 0x91, 0xe8, 0x22, 0xa3 },
+	{ 0xfb, 0xa1, 0x61, 0x69, 0xb2, 0xc3, 0xee, 0x10,
+	  0x5b, 0xe6, 0xe1, 0xe6, 0x50, 0xe5, 0xcb, 0xf4,
+	  0x07, 0x46, 0xb6, 0x75, 0x3d, 0x03, 0x6a, 0xb5,
+	  0x51, 0x79, 0x01, 0x4a, 0xd7, 0xef, 0x66, 0x51 },
+	{ 0xf5, 0xc4, 0xbe, 0xc6, 0xd6, 0x2f, 0xc6, 0x08,
+	  0xbf, 0x41, 0xcc, 0x11, 0x5f, 0x16, 0xd6, 0x1c,
+	  0x7e, 0xfd, 0x3f, 0xf6, 0xc6, 0x56, 0x92, 0xbb,
+	  0xe0, 0xaf, 0xff, 0xb1, 0xfe, 0xde, 0x74, 0x75 },
+	{ 0xa4, 0x86, 0x2e, 0x76, 0xdb, 0x84, 0x7f, 0x05,
+	  0xba, 0x17, 0xed, 0xe5, 0xda, 0x4e, 0x7f, 0x91,
+	  0xb5, 0x92, 0x5c, 0xf1, 0xad, 0x4b, 0xa1, 0x27,
+	  0x32, 0xc3, 0x99, 0x57, 0x42, 0xa5, 0xcd, 0x6e },
+	{ 0x65, 0xf4, 0xb8, 0x60, 0xcd, 0x15, 0xb3, 0x8e,
+	  0xf8, 0x14, 0xa1, 0xa8, 0x04, 0x31, 0x4a, 0x55,
+	  0xbe, 0x95, 0x3c, 0xaa, 0x65, 0xfd, 0x75, 0x8a,
+	  0xd9, 0x89, 0xff, 0x34, 0xa4, 0x1c, 0x1e, 0xea },
+	{ 0x19, 0xba, 0x23, 0x4f, 0x0a, 0x4f, 0x38, 0x63,
+	  0x7d, 0x18, 0x39, 0xf9, 0xd9, 0xf7, 0x6a, 0xd9,
+	  0x1c, 0x85, 0x22, 0x30, 0x71, 0x43, 0xc9, 0x7d,
+	  0x5f, 0x93, 0xf6, 0x92, 0x74, 0xce, 0xc9, 0xa7 },
+	{ 0x1a, 0x67, 0x18, 0x6c, 0xa4, 0xa5, 0xcb, 0x8e,
+	  0x65, 0xfc, 0xa0, 0xe2, 0xec, 0xbc, 0x5d, 0xdc,
+	  0x14, 0xae, 0x38, 0x1b, 0xb8, 0xbf, 0xfe, 0xb9,
+	  0xe0, 0xa1, 0x03, 0x44, 0x9e, 0x3e, 0xf0, 0x3c },
+	{ 0xaf, 0xbe, 0xa3, 0x17, 0xb5, 0xa2, 0xe8, 0x9c,
+	  0x0b, 0xd9, 0x0c, 0xcf, 0x5d, 0x7f, 0xd0, 0xed,
+	  0x57, 0xfe, 0x58, 0x5e, 0x4b, 0xe3, 0x27, 0x1b,
+	  0x0a, 0x6b, 0xf0, 0xf5, 0x78, 0x6b, 0x0f, 0x26 },
+	{ 0xf1, 0xb0, 0x15, 0x58, 0xce, 0x54, 0x12, 0x62,
+	  0xf5, 0xec, 0x34, 0x29, 0x9d, 0x6f, 0xb4, 0x09,
+	  0x00, 0x09, 0xe3, 0x43, 0x4b, 0xe2, 0xf4, 0x91,
+	  0x05, 0xcf, 0x46, 0xaf, 0x4d, 0x2d, 0x41, 0x24 },
+	{ 0x13, 0xa0, 0xa0, 0xc8, 0x63, 0x35, 0x63, 0x5e,
+	  0xaa, 0x74, 0xca, 0x2d, 0x5d, 0x48, 0x8c, 0x79,
+	  0x7b, 0xbb, 0x4f, 0x47, 0xdc, 0x07, 0x10, 0x50,
+	  0x15, 0xed, 0x6a, 0x1f, 0x33, 0x09, 0xef, 0xce },
+	{ 0x15, 0x80, 0xaf, 0xee, 0xbe, 0xbb, 0x34, 0x6f,
+	  0x94, 0xd5, 0x9f, 0xe6, 0x2d, 0xa0, 0xb7, 0x92,
+	  0x37, 0xea, 0xd7, 0xb1, 0x49, 0x1f, 0x56, 0x67,
+	  0xa9, 0x0e, 0x45, 0xed, 0xf6, 0xca, 0x8b, 0x03 },
+	{ 0x20, 0xbe, 0x1a, 0x87, 0x5b, 0x38, 0xc5, 0x73,
+	  0xdd, 0x7f, 0xaa, 0xa0, 0xde, 0x48, 0x9d, 0x65,
+	  0x5c, 0x11, 0xef, 0xb6, 0xa5, 0x52, 0x69, 0x8e,
+	  0x07, 0xa2, 0xd3, 0x31, 0xb5, 0xf6, 0x55, 0xc3 },
+	{ 0xbe, 0x1f, 0xe3, 0xc4, 0xc0, 0x40, 0x18, 0xc5,
+	  0x4c, 0x4a, 0x0f, 0x6b, 0x9a, 0x2e, 0xd3, 0xc5,
+	  0x3a, 0xbe, 0x3a, 0x9f, 0x76, 0xb4, 0xd2, 0x6d,
+	  0xe5, 0x6f, 0xc9, 0xae, 0x95, 0x05, 0x9a, 0x99 },
+	{ 0xe3, 0xe3, 0xac, 0xe5, 0x37, 0xeb, 0x3e, 0xdd,
+	  0x84, 0x63, 0xd9, 0xad, 0x35, 0x82, 0xe1, 0x3c,
+	  0xf8, 0x65, 0x33, 0xff, 0xde, 0x43, 0xd6, 0x68,
+	  0xdd, 0x2e, 0x93, 0xbb, 0xdb, 0xd7, 0x19, 0x5a },
+	{ 0x11, 0x0c, 0x50, 0xc0, 0xbf, 0x2c, 0x6e, 0x7a,
+	  0xeb, 0x7e, 0x43, 0x5d, 0x92, 0xd1, 0x32, 0xab,
+	  0x66, 0x55, 0x16, 0x8e, 0x78, 0xa2, 0xde, 0xcd,
+	  0xec, 0x33, 0x30, 0x77, 0x76, 0x84, 0xd9, 0xc1 },
+	{ 0xe9, 0xba, 0x8f, 0x50, 0x5c, 0x9c, 0x80, 0xc0,
+	  0x86, 0x66, 0xa7, 0x01, 0xf3, 0x36, 0x7e, 0x6c,
+	  0xc6, 0x65, 0xf3, 0x4b, 0x22, 0xe7, 0x3c, 0x3c,
+	  0x04, 0x17, 0xeb, 0x1c, 0x22, 0x06, 0x08, 0x2f },
+	{ 0x26, 0xcd, 0x66, 0xfc, 0xa0, 0x23, 0x79, 0xc7,
+	  0x6d, 0xf1, 0x23, 0x17, 0x05, 0x2b, 0xca, 0xfd,
+	  0x6c, 0xd8, 0xc3, 0xa7, 0xb8, 0x90, 0xd8, 0x05,
+	  0xf3, 0x6c, 0x49, 0x98, 0x97, 0x82, 0x43, 0x3a },
+	{ 0x21, 0x3f, 0x35, 0x96, 0xd6, 0xe3, 0xa5, 0xd0,
+	  0xe9, 0x93, 0x2c, 0xd2, 0x15, 0x91, 0x46, 0x01,
+	  0x5e, 0x2a, 0xbc, 0x94, 0x9f, 0x47, 0x29, 0xee,
+	  0x26, 0x32, 0xfe, 0x1e, 0xdb, 0x78, 0xd3, 0x37 },
+	{ 0x10, 0x15, 0xd7, 0x01, 0x08, 0xe0, 0x3b, 0xe1,
+	  0xc7, 0x02, 0xfe, 0x97, 0x25, 0x36, 0x07, 0xd1,
+	  0x4a, 0xee, 0x59, 0x1f, 0x24, 0x13, 0xea, 0x67,
+	  0x87, 0x42, 0x7b, 0x64, 0x59, 0xff, 0x21, 0x9a },
+	{ 0x3c, 0xa9, 0x89, 0xde, 0x10, 0xcf, 0xe6, 0x09,
+	  0x90, 0x94, 0x72, 0xc8, 0xd3, 0x56, 0x10, 0x80,
+	  0x5b, 0x2f, 0x97, 0x77, 0x34, 0xcf, 0x65, 0x2c,
+	  0xc6, 0x4b, 0x3b, 0xfc, 0x88, 0x2d, 0x5d, 0x89 },
+	{ 0xb6, 0x15, 0x6f, 0x72, 0xd3, 0x80, 0xee, 0x9e,
+	  0xa6, 0xac, 0xd1, 0x90, 0x46, 0x4f, 0x23, 0x07,
+	  0xa5, 0xc1, 0x79, 0xef, 0x01, 0xfd, 0x71, 0xf9,
+	  0x9f, 0x2d, 0x0f, 0x7a, 0x57, 0x36, 0x0a, 0xea },
+	{ 0xc0, 0x3b, 0xc6, 0x42, 0xb2, 0x09, 0x59, 0xcb,
+	  0xe1, 0x33, 0xa0, 0x30, 0x3e, 0x0c, 0x1a, 0xbf,
+	  0xf3, 0xe3, 0x1e, 0xc8, 0xe1, 0xa3, 0x28, 0xec,
+	  0x85, 0x65, 0xc3, 0x6d, 0xec, 0xff, 0x52, 0x65 },
+	{ 0x2c, 0x3e, 0x08, 0x17, 0x6f, 0x76, 0x0c, 0x62,
+	  0x64, 0xc3, 0xa2, 0xcd, 0x66, 0xfe, 0xc6, 0xc3,
+	  0xd7, 0x8d, 0xe4, 0x3f, 0xc1, 0x92, 0x45, 0x7b,
+	  0x2a, 0x4a, 0x66, 0x0a, 0x1e, 0x0e, 0xb2, 0x2b },
+	{ 0xf7, 0x38, 0xc0, 0x2f, 0x3c, 0x1b, 0x19, 0x0c,
+	  0x51, 0x2b, 0x1a, 0x32, 0xde, 0xab, 0xf3, 0x53,
+	  0x72, 0x8e, 0x0e, 0x9a, 0xb0, 0x34, 0x49, 0x0e,
+	  0x3c, 0x34, 0x09, 0x94, 0x6a, 0x97, 0xae, 0xec },
+	{ 0x8b, 0x18, 0x80, 0xdf, 0x30, 0x1c, 0xc9, 0x63,
+	  0x41, 0x88, 0x11, 0x08, 0x89, 0x64, 0x83, 0x92,
+	  0x87, 0xff, 0x7f, 0xe3, 0x1c, 0x49, 0xea, 0x6e,
+	  0xbd, 0x9e, 0x48, 0xbd, 0xee, 0xe4, 0x97, 0xc5 },
+	{ 0x1e, 0x75, 0xcb, 0x21, 0xc6, 0x09, 0x89, 0x02,
+	  0x03, 0x75, 0xf1, 0xa7, 0xa2, 0x42, 0x83, 0x9f,
+	  0x0b, 0x0b, 0x68, 0x97, 0x3a, 0x4c, 0x2a, 0x05,
+	  0xcf, 0x75, 0x55, 0xed, 0x5a, 0xae, 0xc4, 0xc1 },
+	{ 0x62, 0xbf, 0x8a, 0x9c, 0x32, 0xa5, 0xbc, 0xcf,
+	  0x29, 0x0b, 0x6c, 0x47, 0x4d, 0x75, 0xb2, 0xa2,
+	  0xa4, 0x09, 0x3f, 0x1a, 0x9e, 0x27, 0x13, 0x94,
+	  0x33, 0xa8, 0xf2, 0xb3, 0xbc, 0xe7, 0xb8, 0xd7 },
+	{ 0x16, 0x6c, 0x83, 0x50, 0xd3, 0x17, 0x3b, 0x5e,
+	  0x70, 0x2b, 0x78, 0x3d, 0xfd, 0x33, 0xc6, 0x6e,
+	  0xe0, 0x43, 0x27, 0x42, 0xe9, 0xb9, 0x2b, 0x99,
+	  0x7f, 0xd2, 0x3c, 0x60, 0xdc, 0x67, 0x56, 0xca },
+	{ 0x04, 0x4a, 0x14, 0xd8, 0x22, 0xa9, 0x0c, 0xac,
+	  0xf2, 0xf5, 0xa1, 0x01, 0x42, 0x8a, 0xdc, 0x8f,
+	  0x41, 0x09, 0x38, 0x6c, 0xcb, 0x15, 0x8b, 0xf9,
+	  0x05, 0xc8, 0x61, 0x8b, 0x8e, 0xe2, 0x4e, 0xc3 },
+	{ 0x38, 0x7d, 0x39, 0x7e, 0xa4, 0x3a, 0x99, 0x4b,
+	  0xe8, 0x4d, 0x2d, 0x54, 0x4a, 0xfb, 0xe4, 0x81,
+	  0xa2, 0x00, 0x0f, 0x55, 0x25, 0x26, 0x96, 0xbb,
+	  0xa2, 0xc5, 0x0c, 0x8e, 0xbd, 0x10, 0x13, 0x47 },
+	{ 0x56, 0xf8, 0xcc, 0xf1, 0xf8, 0x64, 0x09, 0xb4,
+	  0x6c, 0xe3, 0x61, 0x66, 0xae, 0x91, 0x65, 0x13,
+	  0x84, 0x41, 0x57, 0x75, 0x89, 0xdb, 0x08, 0xcb,
+	  0xc5, 0xf6, 0x6c, 0xa2, 0x97, 0x43, 0xb9, 0xfd },
+	{ 0x97, 0x06, 0xc0, 0x92, 0xb0, 0x4d, 0x91, 0xf5,
+	  0x3d, 0xff, 0x91, 0xfa, 0x37, 0xb7, 0x49, 0x3d,
+	  0x28, 0xb5, 0x76, 0xb5, 0xd7, 0x10, 0x46, 0x9d,
+	  0xf7, 0x94, 0x01, 0x66, 0x22, 0x36, 0xfc, 0x03 },
+	{ 0x87, 0x79, 0x68, 0x68, 0x6c, 0x06, 0x8c, 0xe2,
+	  0xf7, 0xe2, 0xad, 0xcf, 0xf6, 0x8b, 0xf8, 0x74,
+	  0x8e, 0xdf, 0x3c, 0xf8, 0x62, 0xcf, 0xb4, 0xd3,
+	  0x94, 0x7a, 0x31, 0x06, 0x95, 0x80, 0x54, 0xe3 },
+	{ 0x88, 0x17, 0xe5, 0x71, 0x98, 0x79, 0xac, 0xf7,
+	  0x02, 0x47, 0x87, 0xec, 0xcd, 0xb2, 0x71, 0x03,
+	  0x55, 0x66, 0xcf, 0xa3, 0x33, 0xe0, 0x49, 0x40,
+	  0x7c, 0x01, 0x78, 0xcc, 0xc5, 0x7a, 0x5b, 0x9f },
+	{ 0x89, 0x38, 0x24, 0x9e, 0x4b, 0x50, 0xca, 0xda,
+	  0xcc, 0xdf, 0x5b, 0x18, 0x62, 0x13, 0x26, 0xcb,
+	  0xb1, 0x52, 0x53, 0xe3, 0x3a, 0x20, 0xf5, 0x63,
+	  0x6e, 0x99, 0x5d, 0x72, 0x47, 0x8d, 0xe4, 0x72 },
+	{ 0xf1, 0x64, 0xab, 0xba, 0x49, 0x63, 0xa4, 0x4d,
+	  0x10, 0x72, 0x57, 0xe3, 0x23, 0x2d, 0x90, 0xac,
+	  0xa5, 0xe6, 0x6a, 0x14, 0x08, 0x24, 0x8c, 0x51,
+	  0x74, 0x1e, 0x99, 0x1d, 0xb5, 0x22, 0x77, 0x56 },
+	{ 0xd0, 0x55, 0x63, 0xe2, 0xb1, 0xcb, 0xa0, 0xc4,
+	  0xa2, 0xa1, 0xe8, 0xbd, 0xe3, 0xa1, 0xa0, 0xd9,
+	  0xf5, 0xb4, 0x0c, 0x85, 0xa0, 0x70, 0xd6, 0xf5,
+	  0xfb, 0x21, 0x06, 0x6e, 0xad, 0x5d, 0x06, 0x01 },
+	{ 0x03, 0xfb, 0xb1, 0x63, 0x84, 0xf0, 0xa3, 0x86,
+	  0x6f, 0x4c, 0x31, 0x17, 0x87, 0x76, 0x66, 0xef,
+	  0xbf, 0x12, 0x45, 0x97, 0x56, 0x4b, 0x29, 0x3d,
+	  0x4a, 0xab, 0x0d, 0x26, 0x9f, 0xab, 0xdd, 0xfa },
+	{ 0x5f, 0xa8, 0x48, 0x6a, 0xc0, 0xe5, 0x29, 0x64,
+	  0xd1, 0x88, 0x1b, 0xbe, 0x33, 0x8e, 0xb5, 0x4b,
+	  0xe2, 0xf7, 0x19, 0x54, 0x92, 0x24, 0x89, 0x20,
+	  0x57, 0xb4, 0xda, 0x04, 0xba, 0x8b, 0x34, 0x75 },
+	{ 0xcd, 0xfa, 0xbc, 0xee, 0x46, 0x91, 0x11, 0x11,
+	  0x23, 0x6a, 0x31, 0x70, 0x8b, 0x25, 0x39, 0xd7,
+	  0x1f, 0xc2, 0x11, 0xd9, 0xb0, 0x9c, 0x0d, 0x85,
+	  0x30, 0xa1, 0x1e, 0x1d, 0xbf, 0x6e, 0xed, 0x01 },
+	{ 0x4f, 0x82, 0xde, 0x03, 0xb9, 0x50, 0x47, 0x93,
+	  0xb8, 0x2a, 0x07, 0xa0, 0xbd, 0xcd, 0xff, 0x31,
+	  0x4d, 0x75, 0x9e, 0x7b, 0x62, 0xd2, 0x6b, 0x78,
+	  0x49, 0x46, 0xb0, 0xd3, 0x6f, 0x91, 0x6f, 0x52 },
+	{ 0x25, 0x9e, 0xc7, 0xf1, 0x73, 0xbc, 0xc7, 0x6a,
+	  0x09, 0x94, 0xc9, 0x67, 0xb4, 0xf5, 0xf0, 0x24,
+	  0xc5, 0x60, 0x57, 0xfb, 0x79, 0xc9, 0x65, 0xc4,
+	  0xfa, 0xe4, 0x18, 0x75, 0xf0, 0x6a, 0x0e, 0x4c },
+	{ 0x19, 0x3c, 0xc8, 0xe7, 0xc3, 0xe0, 0x8b, 0xb3,
+	  0x0f, 0x54, 0x37, 0xaa, 0x27, 0xad, 0xe1, 0xf1,
+	  0x42, 0x36, 0x9b, 0x24, 0x6a, 0x67, 0x5b, 0x23,
+	  0x83, 0xe6, 0xda, 0x9b, 0x49, 0xa9, 0x80, 0x9e },
+	{ 0x5c, 0x10, 0x89, 0x6f, 0x0e, 0x28, 0x56, 0xb2,
+	  0xa2, 0xee, 0xe0, 0xfe, 0x4a, 0x2c, 0x16, 0x33,
+	  0x56, 0x5d, 0x18, 0xf0, 0xe9, 0x3e, 0x1f, 0xab,
+	  0x26, 0xc3, 0x73, 0xe8, 0xf8, 0x29, 0x65, 0x4d },
+	{ 0xf1, 0x60, 0x12, 0xd9, 0x3f, 0x28, 0x85, 0x1a,
+	  0x1e, 0xb9, 0x89, 0xf5, 0xd0, 0xb4, 0x3f, 0x3f,
+	  0x39, 0xca, 0x73, 0xc9, 0xa6, 0x2d, 0x51, 0x81,
+	  0xbf, 0xf2, 0x37, 0x53, 0x6b, 0xd3, 0x48, 0xc3 },
+	{ 0x29, 0x66, 0xb3, 0xcf, 0xae, 0x1e, 0x44, 0xea,
+	  0x99, 0x6d, 0xc5, 0xd6, 0x86, 0xcf, 0x25, 0xfa,
+	  0x05, 0x3f, 0xb6, 0xf6, 0x72, 0x01, 0xb9, 0xe4,
+	  0x6e, 0xad, 0xe8, 0x5d, 0x0a, 0xd6, 0xb8, 0x06 },
+	{ 0xdd, 0xb8, 0x78, 0x24, 0x85, 0xe9, 0x00, 0xbc,
+	  0x60, 0xbc, 0xf4, 0xc3, 0x3a, 0x6f, 0xd5, 0x85,
+	  0x68, 0x0c, 0xc6, 0x83, 0xd5, 0x16, 0xef, 0xa0,
+	  0x3e, 0xb9, 0x98, 0x5f, 0xad, 0x87, 0x15, 0xfb },
+	{ 0x4c, 0x4d, 0x6e, 0x71, 0xae, 0xa0, 0x57, 0x86,
+	  0x41, 0x31, 0x48, 0xfc, 0x7a, 0x78, 0x6b, 0x0e,
+	  0xca, 0xf5, 0x82, 0xcf, 0xf1, 0x20, 0x9f, 0x5a,
+	  0x80, 0x9f, 0xba, 0x85, 0x04, 0xce, 0x66, 0x2c },
+	{ 0xfb, 0x4c, 0x5e, 0x86, 0xd7, 0xb2, 0x22, 0x9b,
+	  0x99, 0xb8, 0xba, 0x6d, 0x94, 0xc2, 0x47, 0xef,
+	  0x96, 0x4a, 0xa3, 0xa2, 0xba, 0xe8, 0xed, 0xc7,
+	  0x75, 0x69, 0xf2, 0x8d, 0xbb, 0xff, 0x2d, 0x4e },
+	{ 0xe9, 0x4f, 0x52, 0x6d, 0xe9, 0x01, 0x96, 0x33,
+	  0xec, 0xd5, 0x4a, 0xc6, 0x12, 0x0f, 0x23, 0x95,
+	  0x8d, 0x77, 0x18, 0xf1, 0xe7, 0x71, 0x7b, 0xf3,
+	  0x29, 0x21, 0x1a, 0x4f, 0xae, 0xed, 0x4e, 0x6d },
+	{ 0xcb, 0xd6, 0x66, 0x0a, 0x10, 0xdb, 0x3f, 0x23,
+	  0xf7, 0xa0, 0x3d, 0x4b, 0x9d, 0x40, 0x44, 0xc7,
+	  0x93, 0x2b, 0x28, 0x01, 0xac, 0x89, 0xd6, 0x0b,
+	  0xc9, 0xeb, 0x92, 0xd6, 0x5a, 0x46, 0xc2, 0xa0 },
+	{ 0x88, 0x18, 0xbb, 0xd3, 0xdb, 0x4d, 0xc1, 0x23,
+	  0xb2, 0x5c, 0xbb, 0xa5, 0xf5, 0x4c, 0x2b, 0xc4,
+	  0xb3, 0xfc, 0xf9, 0xbf, 0x7d, 0x7a, 0x77, 0x09,
+	  0xf4, 0xae, 0x58, 0x8b, 0x26, 0x7c, 0x4e, 0xce },
+	{ 0xc6, 0x53, 0x82, 0x51, 0x3f, 0x07, 0x46, 0x0d,
+	  0xa3, 0x98, 0x33, 0xcb, 0x66, 0x6c, 0x5e, 0xd8,
+	  0x2e, 0x61, 0xb9, 0xe9, 0x98, 0xf4, 0xb0, 0xc4,
+	  0x28, 0x7c, 0xee, 0x56, 0xc3, 0xcc, 0x9b, 0xcd },
+	{ 0x89, 0x75, 0xb0, 0x57, 0x7f, 0xd3, 0x55, 0x66,
+	  0xd7, 0x50, 0xb3, 0x62, 0xb0, 0x89, 0x7a, 0x26,
+	  0xc3, 0x99, 0x13, 0x6d, 0xf0, 0x7b, 0xab, 0xab,
+	  0xbd, 0xe6, 0x20, 0x3f, 0xf2, 0x95, 0x4e, 0xd4 },
+	{ 0x21, 0xfe, 0x0c, 0xeb, 0x00, 0x52, 0xbe, 0x7f,
+	  0xb0, 0xf0, 0x04, 0x18, 0x7c, 0xac, 0xd7, 0xde,
+	  0x67, 0xfa, 0x6e, 0xb0, 0x93, 0x8d, 0x92, 0x76,
+	  0x77, 0xf2, 0x39, 0x8c, 0x13, 0x23, 0x17, 0xa8 },
+	{ 0x2e, 0xf7, 0x3f, 0x3c, 0x26, 0xf1, 0x2d, 0x93,
+	  0x88, 0x9f, 0x3c, 0x78, 0xb6, 0xa6, 0x6c, 0x1d,
+	  0x52, 0xb6, 0x49, 0xdc, 0x9e, 0x85, 0x6e, 0x2c,
+	  0x17, 0x2e, 0xa7, 0xc5, 0x8a, 0xc2, 0xb5, 0xe3 },
+	{ 0x38, 0x8a, 0x3c, 0xd5, 0x6d, 0x73, 0x86, 0x7a,
+	  0xbb, 0x5f, 0x84, 0x01, 0x49, 0x2b, 0x6e, 0x26,
+	  0x81, 0xeb, 0x69, 0x85, 0x1e, 0x76, 0x7f, 0xd8,
+	  0x42, 0x10, 0xa5, 0x60, 0x76, 0xfb, 0x3d, 0xd3 },
+	{ 0xaf, 0x53, 0x3e, 0x02, 0x2f, 0xc9, 0x43, 0x9e,
+	  0x4e, 0x3c, 0xb8, 0x38, 0xec, 0xd1, 0x86, 0x92,
+	  0x23, 0x2a, 0xdf, 0x6f, 0xe9, 0x83, 0x95, 0x26,
+	  0xd3, 0xc3, 0xdd, 0x1b, 0x71, 0x91, 0x0b, 0x1a },
+	{ 0x75, 0x1c, 0x09, 0xd4, 0x1a, 0x93, 0x43, 0x88,
+	  0x2a, 0x81, 0xcd, 0x13, 0xee, 0x40, 0x81, 0x8d,
+	  0x12, 0xeb, 0x44, 0xc6, 0xc7, 0xf4, 0x0d, 0xf1,
+	  0x6e, 0x4a, 0xea, 0x8f, 0xab, 0x91, 0x97, 0x2a },
+	{ 0x5b, 0x73, 0xdd, 0xb6, 0x8d, 0x9d, 0x2b, 0x0a,
+	  0xa2, 0x65, 0xa0, 0x79, 0x88, 0xd6, 0xb8, 0x8a,
+	  0xe9, 0xaa, 0xc5, 0x82, 0xaf, 0x83, 0x03, 0x2f,
+	  0x8a, 0x9b, 0x21, 0xa2, 0xe1, 0xb7, 0xbf, 0x18 },
+	{ 0x3d, 0xa2, 0x91, 0x26, 0xc7, 0xc5, 0xd7, 0xf4,
+	  0x3e, 0x64, 0x24, 0x2a, 0x79, 0xfe, 0xaa, 0x4e,
+	  0xf3, 0x45, 0x9c, 0xde, 0xcc, 0xc8, 0x98, 0xed,
+	  0x59, 0xa9, 0x7f, 0x6e, 0xc9, 0x3b, 0x9d, 0xab },
+	{ 0x56, 0x6d, 0xc9, 0x20, 0x29, 0x3d, 0xa5, 0xcb,
+	  0x4f, 0xe0, 0xaa, 0x8a, 0xbd, 0xa8, 0xbb, 0xf5,
+	  0x6f, 0x55, 0x23, 0x13, 0xbf, 0xf1, 0x90, 0x46,
+	  0x64, 0x1e, 0x36, 0x15, 0xc1, 0xe3, 0xed, 0x3f },
+	{ 0x41, 0x15, 0xbe, 0xa0, 0x2f, 0x73, 0xf9, 0x7f,
+	  0x62, 0x9e, 0x5c, 0x55, 0x90, 0x72, 0x0c, 0x01,
+	  0xe7, 0xe4, 0x49, 0xae, 0x2a, 0x66, 0x97, 0xd4,
+	  0xd2, 0x78, 0x33, 0x21, 0x30, 0x36, 0x92, 0xf9 },
+	{ 0x4c, 0xe0, 0x8f, 0x47, 0x62, 0x46, 0x8a, 0x76,
+	  0x70, 0x01, 0x21, 0x64, 0x87, 0x8d, 0x68, 0x34,
+	  0x0c, 0x52, 0xa3, 0x5e, 0x66, 0xc1, 0x88, 0x4d,
+	  0x5c, 0x86, 0x48, 0x89, 0xab, 0xc9, 0x66, 0x77 },
+	{ 0x81, 0xea, 0x0b, 0x78, 0x04, 0x12, 0x4e, 0x0c,
+	  0x22, 0xea, 0x5f, 0xc7, 0x11, 0x04, 0xa2, 0xaf,
+	  0xcb, 0x52, 0xa1, 0xfa, 0x81, 0x6f, 0x3e, 0xcb,
+	  0x7d, 0xcb, 0x5d, 0x9d, 0xea, 0x17, 0x86, 0xd0 },
+	{ 0xfe, 0x36, 0x27, 0x33, 0xb0, 0x5f, 0x6b, 0xed,
+	  0xaf, 0x93, 0x79, 0xd7, 0xf7, 0x93, 0x6e, 0xde,
+	  0x20, 0x9b, 0x1f, 0x83, 0x23, 0xc3, 0x92, 0x25,
+	  0x49, 0xd9, 0xe7, 0x36, 0x81, 0xb5, 0xdb, 0x7b },
+	{ 0xef, 0xf3, 0x7d, 0x30, 0xdf, 0xd2, 0x03, 0x59,
+	  0xbe, 0x4e, 0x73, 0xfd, 0xf4, 0x0d, 0x27, 0x73,
+	  0x4b, 0x3d, 0xf9, 0x0a, 0x97, 0xa5, 0x5e, 0xd7,
+	  0x45, 0x29, 0x72, 0x94, 0xca, 0x85, 0xd0, 0x9f },
+	{ 0x17, 0x2f, 0xfc, 0x67, 0x15, 0x3d, 0x12, 0xe0,
+	  0xca, 0x76, 0xa8, 0xb6, 0xcd, 0x5d, 0x47, 0x31,
+	  0x88, 0x5b, 0x39, 0xce, 0x0c, 0xac, 0x93, 0xa8,
+	  0x97, 0x2a, 0x18, 0x00, 0x6c, 0x8b, 0x8b, 0xaf },
+	{ 0xc4, 0x79, 0x57, 0xf1, 0xcc, 0x88, 0xe8, 0x3e,
+	  0xf9, 0x44, 0x58, 0x39, 0x70, 0x9a, 0x48, 0x0a,
+	  0x03, 0x6b, 0xed, 0x5f, 0x88, 0xac, 0x0f, 0xcc,
+	  0x8e, 0x1e, 0x70, 0x3f, 0xfa, 0xac, 0x13, 0x2c },
+	{ 0x30, 0xf3, 0x54, 0x83, 0x70, 0xcf, 0xdc, 0xed,
+	  0xa5, 0xc3, 0x7b, 0x56, 0x9b, 0x61, 0x75, 0xe7,
+	  0x99, 0xee, 0xf1, 0xa6, 0x2a, 0xaa, 0x94, 0x32,
+	  0x45, 0xae, 0x76, 0x69, 0xc2, 0x27, 0xa7, 0xb5 },
+	{ 0xc9, 0x5d, 0xcb, 0x3c, 0xf1, 0xf2, 0x7d, 0x0e,
+	  0xef, 0x2f, 0x25, 0xd2, 0x41, 0x38, 0x70, 0x90,
+	  0x4a, 0x87, 0x7c, 0x4a, 0x56, 0xc2, 0xde, 0x1e,
+	  0x83, 0xe2, 0xbc, 0x2a, 0xe2, 0xe4, 0x68, 0x21 },
+	{ 0xd5, 0xd0, 0xb5, 0xd7, 0x05, 0x43, 0x4c, 0xd4,
+	  0x6b, 0x18, 0x57, 0x49, 0xf6, 0x6b, 0xfb, 0x58,
+	  0x36, 0xdc, 0xdf, 0x6e, 0xe5, 0x49, 0xa2, 0xb7,
+	  0xa4, 0xae, 0xe7, 0xf5, 0x80, 0x07, 0xca, 0xaf },
+	{ 0xbb, 0xc1, 0x24, 0xa7, 0x12, 0xf1, 0x5d, 0x07,
+	  0xc3, 0x00, 0xe0, 0x5b, 0x66, 0x83, 0x89, 0xa4,
+	  0x39, 0xc9, 0x17, 0x77, 0xf7, 0x21, 0xf8, 0x32,
+	  0x0c, 0x1c, 0x90, 0x78, 0x06, 0x6d, 0x2c, 0x7e },
+	{ 0xa4, 0x51, 0xb4, 0x8c, 0x35, 0xa6, 0xc7, 0x85,
+	  0x4c, 0xfa, 0xae, 0x60, 0x26, 0x2e, 0x76, 0x99,
+	  0x08, 0x16, 0x38, 0x2a, 0xc0, 0x66, 0x7e, 0x5a,
+	  0x5c, 0x9e, 0x1b, 0x46, 0xc4, 0x34, 0x2d, 0xdf },
+	{ 0xb0, 0xd1, 0x50, 0xfb, 0x55, 0xe7, 0x78, 0xd0,
+	  0x11, 0x47, 0xf0, 0xb5, 0xd8, 0x9d, 0x99, 0xec,
+	  0xb2, 0x0f, 0xf0, 0x7e, 0x5e, 0x67, 0x60, 0xd6,
+	  0xb6, 0x45, 0xeb, 0x5b, 0x65, 0x4c, 0x62, 0x2b },
+	{ 0x34, 0xf7, 0x37, 0xc0, 0xab, 0x21, 0x99, 0x51,
+	  0xee, 0xe8, 0x9a, 0x9f, 0x8d, 0xac, 0x29, 0x9c,
+	  0x9d, 0x4c, 0x38, 0xf3, 0x3f, 0xa4, 0x94, 0xc5,
+	  0xc6, 0xee, 0xfc, 0x92, 0xb6, 0xdb, 0x08, 0xbc },
+	{ 0x1a, 0x62, 0xcc, 0x3a, 0x00, 0x80, 0x0d, 0xcb,
+	  0xd9, 0x98, 0x91, 0x08, 0x0c, 0x1e, 0x09, 0x84,
+	  0x58, 0x19, 0x3a, 0x8c, 0xc9, 0xf9, 0x70, 0xea,
+	  0x99, 0xfb, 0xef, 0xf0, 0x03, 0x18, 0xc2, 0x89 },
+	{ 0xcf, 0xce, 0x55, 0xeb, 0xaf, 0xc8, 0x40, 0xd7,
+	  0xae, 0x48, 0x28, 0x1c, 0x7f, 0xd5, 0x7e, 0xc8,
+	  0xb4, 0x82, 0xd4, 0xb7, 0x04, 0x43, 0x74, 0x95,
+	  0x49, 0x5a, 0xc4, 0x14, 0xcf, 0x4a, 0x37, 0x4b },
+	{ 0x67, 0x46, 0xfa, 0xcf, 0x71, 0x14, 0x6d, 0x99,
+	  0x9d, 0xab, 0xd0, 0x5d, 0x09, 0x3a, 0xe5, 0x86,
+	  0x64, 0x8d, 0x1e, 0xe2, 0x8e, 0x72, 0x61, 0x7b,
+	  0x99, 0xd0, 0xf0, 0x08, 0x6e, 0x1e, 0x45, 0xbf },
+	{ 0x57, 0x1c, 0xed, 0x28, 0x3b, 0x3f, 0x23, 0xb4,
+	  0xe7, 0x50, 0xbf, 0x12, 0xa2, 0xca, 0xf1, 0x78,
+	  0x18, 0x47, 0xbd, 0x89, 0x0e, 0x43, 0x60, 0x3c,
+	  0xdc, 0x59, 0x76, 0x10, 0x2b, 0x7b, 0xb1, 0x1b },
+	{ 0xcf, 0xcb, 0x76, 0x5b, 0x04, 0x8e, 0x35, 0x02,
+	  0x2c, 0x5d, 0x08, 0x9d, 0x26, 0xe8, 0x5a, 0x36,
+	  0xb0, 0x05, 0xa2, 0xb8, 0x04, 0x93, 0xd0, 0x3a,
+	  0x14, 0x4e, 0x09, 0xf4, 0x09, 0xb6, 0xaf, 0xd1 },
+	{ 0x40, 0x50, 0xc7, 0xa2, 0x77, 0x05, 0xbb, 0x27,
+	  0xf4, 0x20, 0x89, 0xb2, 0x99, 0xf3, 0xcb, 0xe5,
+	  0x05, 0x4e, 0xad, 0x68, 0x72, 0x7e, 0x8e, 0xf9,
+	  0x31, 0x8c, 0xe6, 0xf2, 0x5c, 0xd6, 0xf3, 0x1d },
+	{ 0x18, 0x40, 0x70, 0xbd, 0x5d, 0x26, 0x5f, 0xbd,
+	  0xc1, 0x42, 0xcd, 0x1c, 0x5c, 0xd0, 0xd7, 0xe4,
+	  0x14, 0xe7, 0x03, 0x69, 0xa2, 0x66, 0xd6, 0x27,
+	  0xc8, 0xfb, 0xa8, 0x4f, 0xa5, 0xe8, 0x4c, 0x34 },
+	{ 0x9e, 0xdd, 0xa9, 0xa4, 0x44, 0x39, 0x02, 0xa9,
+	  0x58, 0x8c, 0x0d, 0x0c, 0xcc, 0x62, 0xb9, 0x30,
+	  0x21, 0x84, 0x79, 0xa6, 0x84, 0x1e, 0x6f, 0xe7,
+	  0xd4, 0x30, 0x03, 0xf0, 0x4b, 0x1f, 0xd6, 0x43 },
+	{ 0xe4, 0x12, 0xfe, 0xef, 0x79, 0x08, 0x32, 0x4a,
+	  0x6d, 0xa1, 0x84, 0x16, 0x29, 0xf3, 0x5d, 0x3d,
+	  0x35, 0x86, 0x42, 0x01, 0x93, 0x10, 0xec, 0x57,
+	  0xc6, 0x14, 0x83, 0x6b, 0x63, 0xd3, 0x07, 0x63 },
+	{ 0x1a, 0x2b, 0x8e, 0xdf, 0xf3, 0xf9, 0xac, 0xc1,
+	  0x55, 0x4f, 0xcb, 0xae, 0x3c, 0xf1, 0xd6, 0x29,
+	  0x8c, 0x64, 0x62, 0xe2, 0x2e, 0x5e, 0xb0, 0x25,
+	  0x96, 0x84, 0xf8, 0x35, 0x01, 0x2b, 0xd1, 0x3f },
+	{ 0x28, 0x8c, 0x4a, 0xd9, 0xb9, 0x40, 0x97, 0x62,
+	  0xea, 0x07, 0xc2, 0x4a, 0x41, 0xf0, 0x4f, 0x69,
+	  0xa7, 0xd7, 0x4b, 0xee, 0x2d, 0x95, 0x43, 0x53,
+	  0x74, 0xbd, 0xe9, 0x46, 0xd7, 0x24, 0x1c, 0x7b },
+	{ 0x80, 0x56, 0x91, 0xbb, 0x28, 0x67, 0x48, 0xcf,
+	  0xb5, 0x91, 0xd3, 0xae, 0xbe, 0x7e, 0x6f, 0x4e,
+	  0x4d, 0xc6, 0xe2, 0x80, 0x8c, 0x65, 0x14, 0x3c,
+	  0xc0, 0x04, 0xe4, 0xeb, 0x6f, 0xd0, 0x9d, 0x43 },
+	{ 0xd4, 0xac, 0x8d, 0x3a, 0x0a, 0xfc, 0x6c, 0xfa,
+	  0x7b, 0x46, 0x0a, 0xe3, 0x00, 0x1b, 0xae, 0xb3,
+	  0x6d, 0xad, 0xb3, 0x7d, 0xa0, 0x7d, 0x2e, 0x8a,
+	  0xc9, 0x18, 0x22, 0xdf, 0x34, 0x8a, 0xed, 0x3d },
+	{ 0xc3, 0x76, 0x61, 0x70, 0x14, 0xd2, 0x01, 0x58,
+	  0xbc, 0xed, 0x3d, 0x3b, 0xa5, 0x52, 0xb6, 0xec,
+	  0xcf, 0x84, 0xe6, 0x2a, 0xa3, 0xeb, 0x65, 0x0e,
+	  0x90, 0x02, 0x9c, 0x84, 0xd1, 0x3e, 0xea, 0x69 },
+	{ 0xc4, 0x1f, 0x09, 0xf4, 0x3c, 0xec, 0xae, 0x72,
+	  0x93, 0xd6, 0x00, 0x7c, 0xa0, 0xa3, 0x57, 0x08,
+	  0x7d, 0x5a, 0xe5, 0x9b, 0xe5, 0x00, 0xc1, 0xcd,
+	  0x5b, 0x28, 0x9e, 0xe8, 0x10, 0xc7, 0xb0, 0x82 },
+	{ 0x03, 0xd1, 0xce, 0xd1, 0xfb, 0xa5, 0xc3, 0x91,
+	  0x55, 0xc4, 0x4b, 0x77, 0x65, 0xcb, 0x76, 0x0c,
+	  0x78, 0x70, 0x8d, 0xcf, 0xc8, 0x0b, 0x0b, 0xd8,
+	  0xad, 0xe3, 0xa5, 0x6d, 0xa8, 0x83, 0x0b, 0x29 },
+	{ 0x09, 0xbd, 0xe6, 0xf1, 0x52, 0x21, 0x8d, 0xc9,
+	  0x2c, 0x41, 0xd7, 0xf4, 0x53, 0x87, 0xe6, 0x3e,
+	  0x58, 0x69, 0xd8, 0x07, 0xec, 0x70, 0xb8, 0x21,
+	  0x40, 0x5d, 0xbd, 0x88, 0x4b, 0x7f, 0xcf, 0x4b },
+	{ 0x71, 0xc9, 0x03, 0x6e, 0x18, 0x17, 0x9b, 0x90,
+	  0xb3, 0x7d, 0x39, 0xe9, 0xf0, 0x5e, 0xb8, 0x9c,
+	  0xc5, 0xfc, 0x34, 0x1f, 0xd7, 0xc4, 0x77, 0xd0,
+	  0xd7, 0x49, 0x32, 0x85, 0xfa, 0xca, 0x08, 0xa4 },
+	{ 0x59, 0x16, 0x83, 0x3e, 0xbb, 0x05, 0xcd, 0x91,
+	  0x9c, 0xa7, 0xfe, 0x83, 0xb6, 0x92, 0xd3, 0x20,
+	  0x5b, 0xef, 0x72, 0x39, 0x2b, 0x2c, 0xf6, 0xbb,
+	  0x0a, 0x6d, 0x43, 0xf9, 0x94, 0xf9, 0x5f, 0x11 },
+	{ 0xf6, 0x3a, 0xab, 0x3e, 0xc6, 0x41, 0xb3, 0xb0,
+	  0x24, 0x96, 0x4c, 0x2b, 0x43, 0x7c, 0x04, 0xf6,
+	  0x04, 0x3c, 0x4c, 0x7e, 0x02, 0x79, 0x23, 0x99,
+	  0x95, 0x40, 0x19, 0x58, 0xf8, 0x6b, 0xbe, 0x54 },
+	{ 0xf1, 0x72, 0xb1, 0x80, 0xbf, 0xb0, 0x97, 0x40,
+	  0x49, 0x31, 0x20, 0xb6, 0x32, 0x6c, 0xbd, 0xc5,
+	  0x61, 0xe4, 0x77, 0xde, 0xf9, 0xbb, 0xcf, 0xd2,
+	  0x8c, 0xc8, 0xc1, 0xc5, 0xe3, 0x37, 0x9a, 0x31 },
+	{ 0xcb, 0x9b, 0x89, 0xcc, 0x18, 0x38, 0x1d, 0xd9,
+	  0x14, 0x1a, 0xde, 0x58, 0x86, 0x54, 0xd4, 0xe6,
+	  0xa2, 0x31, 0xd5, 0xbf, 0x49, 0xd4, 0xd5, 0x9a,
+	  0xc2, 0x7d, 0x86, 0x9c, 0xbe, 0x10, 0x0c, 0xf3 },
+	{ 0x7b, 0xd8, 0x81, 0x50, 0x46, 0xfd, 0xd8, 0x10,
+	  0xa9, 0x23, 0xe1, 0x98, 0x4a, 0xae, 0xbd, 0xcd,
+	  0xf8, 0x4d, 0x87, 0xc8, 0x99, 0x2d, 0x68, 0xb5,
+	  0xee, 0xb4, 0x60, 0xf9, 0x3e, 0xb3, 0xc8, 0xd7 },
+	{ 0x60, 0x7b, 0xe6, 0x68, 0x62, 0xfd, 0x08, 0xee,
+	  0x5b, 0x19, 0xfa, 0xca, 0xc0, 0x9d, 0xfd, 0xbc,
+	  0xd4, 0x0c, 0x31, 0x21, 0x01, 0xd6, 0x6e, 0x6e,
+	  0xbd, 0x2b, 0x84, 0x1f, 0x1b, 0x9a, 0x93, 0x25 },
+	{ 0x9f, 0xe0, 0x3b, 0xbe, 0x69, 0xab, 0x18, 0x34,
+	  0xf5, 0x21, 0x9b, 0x0d, 0xa8, 0x8a, 0x08, 0xb3,
+	  0x0a, 0x66, 0xc5, 0x91, 0x3f, 0x01, 0x51, 0x96,
+	  0x3c, 0x36, 0x05, 0x60, 0xdb, 0x03, 0x87, 0xb3 },
+	{ 0x90, 0xa8, 0x35, 0x85, 0x71, 0x7b, 0x75, 0xf0,
+	  0xe9, 0xb7, 0x25, 0xe0, 0x55, 0xee, 0xee, 0xb9,
+	  0xe7, 0xa0, 0x28, 0xea, 0x7e, 0x6c, 0xbc, 0x07,
+	  0xb2, 0x09, 0x17, 0xec, 0x03, 0x63, 0xe3, 0x8c },
+	{ 0x33, 0x6e, 0xa0, 0x53, 0x0f, 0x4a, 0x74, 0x69,
+	  0x12, 0x6e, 0x02, 0x18, 0x58, 0x7e, 0xbb, 0xde,
+	  0x33, 0x58, 0xa0, 0xb3, 0x1c, 0x29, 0xd2, 0x00,
+	  0xf7, 0xdc, 0x7e, 0xb1, 0x5c, 0x6a, 0xad, 0xd8 },
+	{ 0xa7, 0x9e, 0x76, 0xdc, 0x0a, 0xbc, 0xa4, 0x39,
+	  0x6f, 0x07, 0x47, 0xcd, 0x7b, 0x74, 0x8d, 0xf9,
+	  0x13, 0x00, 0x76, 0x26, 0xb1, 0xd6, 0x59, 0xda,
+	  0x0c, 0x1f, 0x78, 0xb9, 0x30, 0x3d, 0x01, 0xa3 },
+	{ 0x44, 0xe7, 0x8a, 0x77, 0x37, 0x56, 0xe0, 0x95,
+	  0x15, 0x19, 0x50, 0x4d, 0x70, 0x38, 0xd2, 0x8d,
+	  0x02, 0x13, 0xa3, 0x7e, 0x0c, 0xe3, 0x75, 0x37,
+	  0x17, 0x57, 0xbc, 0x99, 0x63, 0x11, 0xe3, 0xb8 },
+	{ 0x77, 0xac, 0x01, 0x2a, 0x3f, 0x75, 0x4d, 0xcf,
+	  0xea, 0xb5, 0xeb, 0x99, 0x6b, 0xe9, 0xcd, 0x2d,
+	  0x1f, 0x96, 0x11, 0x1b, 0x6e, 0x49, 0xf3, 0x99,
+	  0x4d, 0xf1, 0x81, 0xf2, 0x85, 0x69, 0xd8, 0x25 },
+	{ 0xce, 0x5a, 0x10, 0xdb, 0x6f, 0xcc, 0xda, 0xf1,
+	  0x40, 0xaa, 0xa4, 0xde, 0xd6, 0x25, 0x0a, 0x9c,
+	  0x06, 0xe9, 0x22, 0x2b, 0xc9, 0xf9, 0xf3, 0x65,
+	  0x8a, 0x4a, 0xff, 0x93, 0x5f, 0x2b, 0x9f, 0x3a },
+	{ 0xec, 0xc2, 0x03, 0xa7, 0xfe, 0x2b, 0xe4, 0xab,
+	  0xd5, 0x5b, 0xb5, 0x3e, 0x6e, 0x67, 0x35, 0x72,
+	  0xe0, 0x07, 0x8d, 0xa8, 0xcd, 0x37, 0x5e, 0xf4,
+	  0x30, 0xcc, 0x97, 0xf9, 0xf8, 0x00, 0x83, 0xaf },
+	{ 0x14, 0xa5, 0x18, 0x6d, 0xe9, 0xd7, 0xa1, 0x8b,
+	  0x04, 0x12, 0xb8, 0x56, 0x3e, 0x51, 0xcc, 0x54,
+	  0x33, 0x84, 0x0b, 0x4a, 0x12, 0x9a, 0x8f, 0xf9,
+	  0x63, 0xb3, 0x3a, 0x3c, 0x4a, 0xfe, 0x8e, 0xbb },
+	{ 0x13, 0xf8, 0xef, 0x95, 0xcb, 0x86, 0xe6, 0xa6,
+	  0x38, 0x93, 0x1c, 0x8e, 0x10, 0x76, 0x73, 0xeb,
+	  0x76, 0xba, 0x10, 0xd7, 0xc2, 0xcd, 0x70, 0xb9,
+	  0xd9, 0x92, 0x0b, 0xbe, 0xed, 0x92, 0x94, 0x09 },
+	{ 0x0b, 0x33, 0x8f, 0x4e, 0xe1, 0x2f, 0x2d, 0xfc,
+	  0xb7, 0x87, 0x13, 0x37, 0x79, 0x41, 0xe0, 0xb0,
+	  0x63, 0x21, 0x52, 0x58, 0x1d, 0x13, 0x32, 0x51,
+	  0x6e, 0x4a, 0x2c, 0xab, 0x19, 0x42, 0xcc, 0xa4 },
+	{ 0xea, 0xab, 0x0e, 0xc3, 0x7b, 0x3b, 0x8a, 0xb7,
+	  0x96, 0xe9, 0xf5, 0x72, 0x38, 0xde, 0x14, 0xa2,
+	  0x64, 0xa0, 0x76, 0xf3, 0x88, 0x7d, 0x86, 0xe2,
+	  0x9b, 0xb5, 0x90, 0x6d, 0xb5, 0xa0, 0x0e, 0x02 },
+	{ 0x23, 0xcb, 0x68, 0xb8, 0xc0, 0xe6, 0xdc, 0x26,
+	  0xdc, 0x27, 0x76, 0x6d, 0xdc, 0x0a, 0x13, 0xa9,
+	  0x94, 0x38, 0xfd, 0x55, 0x61, 0x7a, 0xa4, 0x09,
+	  0x5d, 0x8f, 0x96, 0x97, 0x20, 0xc8, 0x72, 0xdf },
+	{ 0x09, 0x1d, 0x8e, 0xe3, 0x0d, 0x6f, 0x29, 0x68,
+	  0xd4, 0x6b, 0x68, 0x7d, 0xd6, 0x52, 0x92, 0x66,
+	  0x57, 0x42, 0xde, 0x0b, 0xb8, 0x3d, 0xcc, 0x00,
+	  0x04, 0xc7, 0x2c, 0xe1, 0x00, 0x07, 0xa5, 0x49 },
+	{ 0x7f, 0x50, 0x7a, 0xbc, 0x6d, 0x19, 0xba, 0x00,
+	  0xc0, 0x65, 0xa8, 0x76, 0xec, 0x56, 0x57, 0x86,
+	  0x88, 0x82, 0xd1, 0x8a, 0x22, 0x1b, 0xc4, 0x6c,
+	  0x7a, 0x69, 0x12, 0x54, 0x1f, 0x5b, 0xc7, 0xba },
+	{ 0xa0, 0x60, 0x7c, 0x24, 0xe1, 0x4e, 0x8c, 0x22,
+	  0x3d, 0xb0, 0xd7, 0x0b, 0x4d, 0x30, 0xee, 0x88,
+	  0x01, 0x4d, 0x60, 0x3f, 0x43, 0x7e, 0x9e, 0x02,
+	  0xaa, 0x7d, 0xaf, 0xa3, 0xcd, 0xfb, 0xad, 0x94 },
+	{ 0xdd, 0xbf, 0xea, 0x75, 0xcc, 0x46, 0x78, 0x82,
+	  0xeb, 0x34, 0x83, 0xce, 0x5e, 0x2e, 0x75, 0x6a,
+	  0x4f, 0x47, 0x01, 0xb7, 0x6b, 0x44, 0x55, 0x19,
+	  0xe8, 0x9f, 0x22, 0xd6, 0x0f, 0xa8, 0x6e, 0x06 },
+	{ 0x0c, 0x31, 0x1f, 0x38, 0xc3, 0x5a, 0x4f, 0xb9,
+	  0x0d, 0x65, 0x1c, 0x28, 0x9d, 0x48, 0x68, 0x56,
+	  0xcd, 0x14, 0x13, 0xdf, 0x9b, 0x06, 0x77, 0xf5,
+	  0x3e, 0xce, 0x2c, 0xd9, 0xe4, 0x77, 0xc6, 0x0a },
+	{ 0x46, 0xa7, 0x3a, 0x8d, 0xd3, 0xe7, 0x0f, 0x59,
+	  0xd3, 0x94, 0x2c, 0x01, 0xdf, 0x59, 0x9d, 0xef,
+	  0x78, 0x3c, 0x9d, 0xa8, 0x2f, 0xd8, 0x32, 0x22,
+	  0xcd, 0x66, 0x2b, 0x53, 0xdc, 0xe7, 0xdb, 0xdf },
+	{ 0xad, 0x03, 0x8f, 0xf9, 0xb1, 0x4d, 0xe8, 0x4a,
+	  0x80, 0x1e, 0x4e, 0x62, 0x1c, 0xe5, 0xdf, 0x02,
+	  0x9d, 0xd9, 0x35, 0x20, 0xd0, 0xc2, 0xfa, 0x38,
+	  0xbf, 0xf1, 0x76, 0xa8, 0xb1, 0xd1, 0x69, 0x8c },
+	{ 0xab, 0x70, 0xc5, 0xdf, 0xbd, 0x1e, 0xa8, 0x17,
+	  0xfe, 0xd0, 0xcd, 0x06, 0x72, 0x93, 0xab, 0xf3,
+	  0x19, 0xe5, 0xd7, 0x90, 0x1c, 0x21, 0x41, 0xd5,
+	  0xd9, 0x9b, 0x23, 0xf0, 0x3a, 0x38, 0xe7, 0x48 },
+	{ 0x1f, 0xff, 0xda, 0x67, 0x93, 0x2b, 0x73, 0xc8,
+	  0xec, 0xaf, 0x00, 0x9a, 0x34, 0x91, 0xa0, 0x26,
+	  0x95, 0x3b, 0xab, 0xfe, 0x1f, 0x66, 0x3b, 0x06,
+	  0x97, 0xc3, 0xc4, 0xae, 0x8b, 0x2e, 0x7d, 0xcb },
+	{ 0xb0, 0xd2, 0xcc, 0x19, 0x47, 0x2d, 0xd5, 0x7f,
+	  0x2b, 0x17, 0xef, 0xc0, 0x3c, 0x8d, 0x58, 0xc2,
+	  0x28, 0x3d, 0xbb, 0x19, 0xda, 0x57, 0x2f, 0x77,
+	  0x55, 0x85, 0x5a, 0xa9, 0x79, 0x43, 0x17, 0xa0 },
+	{ 0xa0, 0xd1, 0x9a, 0x6e, 0xe3, 0x39, 0x79, 0xc3,
+	  0x25, 0x51, 0x0e, 0x27, 0x66, 0x22, 0xdf, 0x41,
+	  0xf7, 0x15, 0x83, 0xd0, 0x75, 0x01, 0xb8, 0x70,
+	  0x71, 0x12, 0x9a, 0x0a, 0xd9, 0x47, 0x32, 0xa5 },
+	{ 0x72, 0x46, 0x42, 0xa7, 0x03, 0x2d, 0x10, 0x62,
+	  0xb8, 0x9e, 0x52, 0xbe, 0xa3, 0x4b, 0x75, 0xdf,
+	  0x7d, 0x8f, 0xe7, 0x72, 0xd9, 0xfe, 0x3c, 0x93,
+	  0xdd, 0xf3, 0xc4, 0x54, 0x5a, 0xb5, 0xa9, 0x9b },
+	{ 0xad, 0xe5, 0xea, 0xa7, 0xe6, 0x1f, 0x67, 0x2d,
+	  0x58, 0x7e, 0xa0, 0x3d, 0xae, 0x7d, 0x7b, 0x55,
+	  0x22, 0x9c, 0x01, 0xd0, 0x6b, 0xc0, 0xa5, 0x70,
+	  0x14, 0x36, 0xcb, 0xd1, 0x83, 0x66, 0xa6, 0x26 },
+	{ 0x01, 0x3b, 0x31, 0xeb, 0xd2, 0x28, 0xfc, 0xdd,
+	  0xa5, 0x1f, 0xab, 0xb0, 0x3b, 0xb0, 0x2d, 0x60,
+	  0xac, 0x20, 0xca, 0x21, 0x5a, 0xaf, 0xa8, 0x3b,
+	  0xdd, 0x85, 0x5e, 0x37, 0x55, 0xa3, 0x5f, 0x0b },
+	{ 0x33, 0x2e, 0xd4, 0x0b, 0xb1, 0x0d, 0xde, 0x3c,
+	  0x95, 0x4a, 0x75, 0xd7, 0xb8, 0x99, 0x9d, 0x4b,
+	  0x26, 0xa1, 0xc0, 0x63, 0xc1, 0xdc, 0x6e, 0x32,
+	  0xc1, 0xd9, 0x1b, 0xab, 0x7b, 0xbb, 0x7d, 0x16 },
+	{ 0xc7, 0xa1, 0x97, 0xb3, 0xa0, 0x5b, 0x56, 0x6b,
+	  0xcc, 0x9f, 0xac, 0xd2, 0x0e, 0x44, 0x1d, 0x6f,
+	  0x6c, 0x28, 0x60, 0xac, 0x96, 0x51, 0xcd, 0x51,
+	  0xd6, 0xb9, 0xd2, 0xcd, 0xee, 0xea, 0x03, 0x90 },
+	{ 0xbd, 0x9c, 0xf6, 0x4e, 0xa8, 0x95, 0x3c, 0x03,
+	  0x71, 0x08, 0xe6, 0xf6, 0x54, 0x91, 0x4f, 0x39,
+	  0x58, 0xb6, 0x8e, 0x29, 0xc1, 0x67, 0x00, 0xdc,
+	  0x18, 0x4d, 0x94, 0xa2, 0x17, 0x08, 0xff, 0x60 },
+	{ 0x88, 0x35, 0xb0, 0xac, 0x02, 0x11, 0x51, 0xdf,
+	  0x71, 0x64, 0x74, 0xce, 0x27, 0xce, 0x4d, 0x3c,
+	  0x15, 0xf0, 0xb2, 0xda, 0xb4, 0x80, 0x03, 0xcf,
+	  0x3f, 0x3e, 0xfd, 0x09, 0x45, 0x10, 0x6b, 0x9a },
+	{ 0x3b, 0xfe, 0xfa, 0x33, 0x01, 0xaa, 0x55, 0xc0,
+	  0x80, 0x19, 0x0c, 0xff, 0xda, 0x8e, 0xae, 0x51,
+	  0xd9, 0xaf, 0x48, 0x8b, 0x4c, 0x1f, 0x24, 0xc3,
+	  0xd9, 0xa7, 0x52, 0x42, 0xfd, 0x8e, 0xa0, 0x1d },
+	{ 0x08, 0x28, 0x4d, 0x14, 0x99, 0x3c, 0xd4, 0x7d,
+	  0x53, 0xeb, 0xae, 0xcf, 0x0d, 0xf0, 0x47, 0x8c,
+	  0xc1, 0x82, 0xc8, 0x9c, 0x00, 0xe1, 0x85, 0x9c,
+	  0x84, 0x85, 0x16, 0x86, 0xdd, 0xf2, 0xc1, 0xb7 },
+	{ 0x1e, 0xd7, 0xef, 0x9f, 0x04, 0xc2, 0xac, 0x8d,
+	  0xb6, 0xa8, 0x64, 0xdb, 0x13, 0x10, 0x87, 0xf2,
+	  0x70, 0x65, 0x09, 0x8e, 0x69, 0xc3, 0xfe, 0x78,
+	  0x71, 0x8d, 0x9b, 0x94, 0x7f, 0x4a, 0x39, 0xd0 },
+	{ 0xc1, 0x61, 0xf2, 0xdc, 0xd5, 0x7e, 0x9c, 0x14,
+	  0x39, 0xb3, 0x1a, 0x9d, 0xd4, 0x3d, 0x8f, 0x3d,
+	  0x7d, 0xd8, 0xf0, 0xeb, 0x7c, 0xfa, 0xc6, 0xfb,
+	  0x25, 0xa0, 0xf2, 0x8e, 0x30, 0x6f, 0x06, 0x61 },
+	{ 0xc0, 0x19, 0x69, 0xad, 0x34, 0xc5, 0x2c, 0xaf,
+	  0x3d, 0xc4, 0xd8, 0x0d, 0x19, 0x73, 0x5c, 0x29,
+	  0x73, 0x1a, 0xc6, 0xe7, 0xa9, 0x20, 0x85, 0xab,
+	  0x92, 0x50, 0xc4, 0x8d, 0xea, 0x48, 0xa3, 0xfc },
+	{ 0x17, 0x20, 0xb3, 0x65, 0x56, 0x19, 0xd2, 0xa5,
+	  0x2b, 0x35, 0x21, 0xae, 0x0e, 0x49, 0xe3, 0x45,
+	  0xcb, 0x33, 0x89, 0xeb, 0xd6, 0x20, 0x8a, 0xca,
+	  0xf9, 0xf1, 0x3f, 0xda, 0xcc, 0xa8, 0xbe, 0x49 },
+	{ 0x75, 0x62, 0x88, 0x36, 0x1c, 0x83, 0xe2, 0x4c,
+	  0x61, 0x7c, 0xf9, 0x5c, 0x90, 0x5b, 0x22, 0xd0,
+	  0x17, 0xcd, 0xc8, 0x6f, 0x0b, 0xf1, 0xd6, 0x58,
+	  0xf4, 0x75, 0x6c, 0x73, 0x79, 0x87, 0x3b, 0x7f },
+	{ 0xe7, 0xd0, 0xed, 0xa3, 0x45, 0x26, 0x93, 0xb7,
+	  0x52, 0xab, 0xcd, 0xa1, 0xb5, 0x5e, 0x27, 0x6f,
+	  0x82, 0x69, 0x8f, 0x5f, 0x16, 0x05, 0x40, 0x3e,
+	  0xff, 0x83, 0x0b, 0xea, 0x00, 0x71, 0xa3, 0x94 },
+	{ 0x2c, 0x82, 0xec, 0xaa, 0x6b, 0x84, 0x80, 0x3e,
+	  0x04, 0x4a, 0xf6, 0x31, 0x18, 0xaf, 0xe5, 0x44,
+	  0x68, 0x7c, 0xb6, 0xe6, 0xc7, 0xdf, 0x49, 0xed,
+	  0x76, 0x2d, 0xfd, 0x7c, 0x86, 0x93, 0xa1, 0xbc },
+	{ 0x61, 0x36, 0xcb, 0xf4, 0xb4, 0x41, 0x05, 0x6f,
+	  0xa1, 0xe2, 0x72, 0x24, 0x98, 0x12, 0x5d, 0x6d,
+	  0xed, 0x45, 0xe1, 0x7b, 0x52, 0x14, 0x39, 0x59,
+	  0xc7, 0xf4, 0xd4, 0xe3, 0x95, 0x21, 0x8a, 0xc2 },
+	{ 0x72, 0x1d, 0x32, 0x45, 0xaa, 0xfe, 0xf2, 0x7f,
+	  0x6a, 0x62, 0x4f, 0x47, 0x95, 0x4b, 0x6c, 0x25,
+	  0x50, 0x79, 0x52, 0x6f, 0xfa, 0x25, 0xe9, 0xff,
+	  0x77, 0xe5, 0xdc, 0xff, 0x47, 0x3b, 0x15, 0x97 },
+	{ 0x9d, 0xd2, 0xfb, 0xd8, 0xce, 0xf1, 0x6c, 0x35,
+	  0x3c, 0x0a, 0xc2, 0x11, 0x91, 0xd5, 0x09, 0xeb,
+	  0x28, 0xdd, 0x9e, 0x3e, 0x0d, 0x8c, 0xea, 0x5d,
+	  0x26, 0xca, 0x83, 0x93, 0x93, 0x85, 0x1c, 0x3a },
+	{ 0xb2, 0x39, 0x4c, 0xea, 0xcd, 0xeb, 0xf2, 0x1b,
+	  0xf9, 0xdf, 0x2c, 0xed, 0x98, 0xe5, 0x8f, 0x1c,
+	  0x3a, 0x4b, 0xbb, 0xff, 0x66, 0x0d, 0xd9, 0x00,
+	  0xf6, 0x22, 0x02, 0xd6, 0x78, 0x5c, 0xc4, 0x6e },
+	{ 0x57, 0x08, 0x9f, 0x22, 0x27, 0x49, 0xad, 0x78,
+	  0x71, 0x76, 0x5f, 0x06, 0x2b, 0x11, 0x4f, 0x43,
+	  0xba, 0x20, 0xec, 0x56, 0x42, 0x2a, 0x8b, 0x1e,
+	  0x3f, 0x87, 0x19, 0x2c, 0x0e, 0xa7, 0x18, 0xc6 },
+	{ 0xe4, 0x9a, 0x94, 0x59, 0x96, 0x1c, 0xd3, 0x3c,
+	  0xdf, 0x4a, 0xae, 0x1b, 0x10, 0x78, 0xa5, 0xde,
+	  0xa7, 0xc0, 0x40, 0xe0, 0xfe, 0xa3, 0x40, 0xc9,
+	  0x3a, 0x72, 0x48, 0x72, 0xfc, 0x4a, 0xf8, 0x06 },
+	{ 0xed, 0xe6, 0x7f, 0x72, 0x0e, 0xff, 0xd2, 0xca,
+	  0x9c, 0x88, 0x99, 0x41, 0x52, 0xd0, 0x20, 0x1d,
+	  0xee, 0x6b, 0x0a, 0x2d, 0x2c, 0x07, 0x7a, 0xca,
+	  0x6d, 0xae, 0x29, 0xf7, 0x3f, 0x8b, 0x63, 0x09 },
+	{ 0xe0, 0xf4, 0x34, 0xbf, 0x22, 0xe3, 0x08, 0x80,
+	  0x39, 0xc2, 0x1f, 0x71, 0x9f, 0xfc, 0x67, 0xf0,
+	  0xf2, 0xcb, 0x5e, 0x98, 0xa7, 0xa0, 0x19, 0x4c,
+	  0x76, 0xe9, 0x6b, 0xf4, 0xe8, 0xe1, 0x7e, 0x61 },
+	{ 0x27, 0x7c, 0x04, 0xe2, 0x85, 0x34, 0x84, 0xa4,
+	  0xeb, 0xa9, 0x10, 0xad, 0x33, 0x6d, 0x01, 0xb4,
+	  0x77, 0xb6, 0x7c, 0xc2, 0x00, 0xc5, 0x9f, 0x3c,
+	  0x8d, 0x77, 0xee, 0xf8, 0x49, 0x4f, 0x29, 0xcd },
+	{ 0x15, 0x6d, 0x57, 0x47, 0xd0, 0xc9, 0x9c, 0x7f,
+	  0x27, 0x09, 0x7d, 0x7b, 0x7e, 0x00, 0x2b, 0x2e,
+	  0x18, 0x5c, 0xb7, 0x2d, 0x8d, 0xd7, 0xeb, 0x42,
+	  0x4a, 0x03, 0x21, 0x52, 0x81, 0x61, 0x21, 0x9f },
+	{ 0x20, 0xdd, 0xd1, 0xed, 0x9b, 0x1c, 0xa8, 0x03,
+	  0x94, 0x6d, 0x64, 0xa8, 0x3a, 0xe4, 0x65, 0x9d,
+	  0xa6, 0x7f, 0xba, 0x7a, 0x1a, 0x3e, 0xdd, 0xb1,
+	  0xe1, 0x03, 0xc0, 0xf5, 0xe0, 0x3e, 0x3a, 0x2c },
+	{ 0xf0, 0xaf, 0x60, 0x4d, 0x3d, 0xab, 0xbf, 0x9a,
+	  0x0f, 0x2a, 0x7d, 0x3d, 0xda, 0x6b, 0xd3, 0x8b,
+	  0xba, 0x72, 0xc6, 0xd0, 0x9b, 0xe4, 0x94, 0xfc,
+	  0xef, 0x71, 0x3f, 0xf1, 0x01, 0x89, 0xb6, 0xe6 },
+	{ 0x98, 0x02, 0xbb, 0x87, 0xde, 0xf4, 0xcc, 0x10,
+	  0xc4, 0xa5, 0xfd, 0x49, 0xaa, 0x58, 0xdf, 0xe2,
+	  0xf3, 0xfd, 0xdb, 0x46, 0xb4, 0x70, 0x88, 0x14,
+	  0xea, 0xd8, 0x1d, 0x23, 0xba, 0x95, 0x13, 0x9b },
+	{ 0x4f, 0x8c, 0xe1, 0xe5, 0x1d, 0x2f, 0xe7, 0xf2,
+	  0x40, 0x43, 0xa9, 0x04, 0xd8, 0x98, 0xeb, 0xfc,
+	  0x91, 0x97, 0x54, 0x18, 0x75, 0x34, 0x13, 0xaa,
+	  0x09, 0x9b, 0x79, 0x5e, 0xcb, 0x35, 0xce, 0xdb },
+	{ 0xbd, 0xdc, 0x65, 0x14, 0xd7, 0xee, 0x6a, 0xce,
+	  0x0a, 0x4a, 0xc1, 0xd0, 0xe0, 0x68, 0x11, 0x22,
+	  0x88, 0xcb, 0xcf, 0x56, 0x04, 0x54, 0x64, 0x27,
+	  0x05, 0x63, 0x01, 0x77, 0xcb, 0xa6, 0x08, 0xbd },
+	{ 0xd6, 0x35, 0x99, 0x4f, 0x62, 0x91, 0x51, 0x7b,
+	  0x02, 0x81, 0xff, 0xdd, 0x49, 0x6a, 0xfa, 0x86,
+	  0x27, 0x12, 0xe5, 0xb3, 0xc4, 0xe5, 0x2e, 0x4c,
+	  0xd5, 0xfd, 0xae, 0x8c, 0x0e, 0x72, 0xfb, 0x08 },
+	{ 0x87, 0x8d, 0x9c, 0xa6, 0x00, 0xcf, 0x87, 0xe7,
+	  0x69, 0xcc, 0x30, 0x5c, 0x1b, 0x35, 0x25, 0x51,
+	  0x86, 0x61, 0x5a, 0x73, 0xa0, 0xda, 0x61, 0x3b,
+	  0x5f, 0x1c, 0x98, 0xdb, 0xf8, 0x12, 0x83, 0xea },
+	{ 0xa6, 0x4e, 0xbe, 0x5d, 0xc1, 0x85, 0xde, 0x9f,
+	  0xdd, 0xe7, 0x60, 0x7b, 0x69, 0x98, 0x70, 0x2e,
+	  0xb2, 0x34, 0x56, 0x18, 0x49, 0x57, 0x30, 0x7d,
+	  0x2f, 0xa7, 0x2e, 0x87, 0xa4, 0x77, 0x02, 0xd6 },
+	{ 0xce, 0x50, 0xea, 0xb7, 0xb5, 0xeb, 0x52, 0xbd,
+	  0xc9, 0xad, 0x8e, 0x5a, 0x48, 0x0a, 0xb7, 0x80,
+	  0xca, 0x93, 0x20, 0xe4, 0x43, 0x60, 0xb1, 0xfe,
+	  0x37, 0xe0, 0x3f, 0x2f, 0x7a, 0xd7, 0xde, 0x01 },
+	{ 0xee, 0xdd, 0xb7, 0xc0, 0xdb, 0x6e, 0x30, 0xab,
+	  0xe6, 0x6d, 0x79, 0xe3, 0x27, 0x51, 0x1e, 0x61,
+	  0xfc, 0xeb, 0xbc, 0x29, 0xf1, 0x59, 0xb4, 0x0a,
+	  0x86, 0xb0, 0x46, 0xec, 0xf0, 0x51, 0x38, 0x23 },
+	{ 0x78, 0x7f, 0xc9, 0x34, 0x40, 0xc1, 0xec, 0x96,
+	  0xb5, 0xad, 0x01, 0xc1, 0x6c, 0xf7, 0x79, 0x16,
+	  0xa1, 0x40, 0x5f, 0x94, 0x26, 0x35, 0x6e, 0xc9,
+	  0x21, 0xd8, 0xdf, 0xf3, 0xea, 0x63, 0xb7, 0xe0 },
+	{ 0x7f, 0x0d, 0x5e, 0xab, 0x47, 0xee, 0xfd, 0xa6,
+	  0x96, 0xc0, 0xbf, 0x0f, 0xbf, 0x86, 0xab, 0x21,
+	  0x6f, 0xce, 0x46, 0x1e, 0x93, 0x03, 0xab, 0xa6,
+	  0xac, 0x37, 0x41, 0x20, 0xe8, 0x90, 0xe8, 0xdf },
+	{ 0xb6, 0x80, 0x04, 0xb4, 0x2f, 0x14, 0xad, 0x02,
+	  0x9f, 0x4c, 0x2e, 0x03, 0xb1, 0xd5, 0xeb, 0x76,
+	  0xd5, 0x71, 0x60, 0xe2, 0x64, 0x76, 0xd2, 0x11,
+	  0x31, 0xbe, 0xf2, 0x0a, 0xda, 0x7d, 0x27, 0xf4 },
+	{ 0xb0, 0xc4, 0xeb, 0x18, 0xae, 0x25, 0x0b, 0x51,
+	  0xa4, 0x13, 0x82, 0xea, 0xd9, 0x2d, 0x0d, 0xc7,
+	  0x45, 0x5f, 0x93, 0x79, 0xfc, 0x98, 0x84, 0x42,
+	  0x8e, 0x47, 0x70, 0x60, 0x8d, 0xb0, 0xfa, 0xec },
+	{ 0xf9, 0x2b, 0x7a, 0x87, 0x0c, 0x05, 0x9f, 0x4d,
+	  0x46, 0x46, 0x4c, 0x82, 0x4e, 0xc9, 0x63, 0x55,
+	  0x14, 0x0b, 0xdc, 0xe6, 0x81, 0x32, 0x2c, 0xc3,
+	  0xa9, 0x92, 0xff, 0x10, 0x3e, 0x3f, 0xea, 0x52 },
+	{ 0x53, 0x64, 0x31, 0x26, 0x14, 0x81, 0x33, 0x98,
+	  0xcc, 0x52, 0x5d, 0x4c, 0x4e, 0x14, 0x6e, 0xde,
+	  0xb3, 0x71, 0x26, 0x5f, 0xba, 0x19, 0x13, 0x3a,
+	  0x2c, 0x3d, 0x21, 0x59, 0x29, 0x8a, 0x17, 0x42 },
+	{ 0xf6, 0x62, 0x0e, 0x68, 0xd3, 0x7f, 0xb2, 0xaf,
+	  0x50, 0x00, 0xfc, 0x28, 0xe2, 0x3b, 0x83, 0x22,
+	  0x97, 0xec, 0xd8, 0xbc, 0xe9, 0x9e, 0x8b, 0xe4,
+	  0xd0, 0x4e, 0x85, 0x30, 0x9e, 0x3d, 0x33, 0x74 },
+	{ 0x53, 0x16, 0xa2, 0x79, 0x69, 0xd7, 0xfe, 0x04,
+	  0xff, 0x27, 0xb2, 0x83, 0x96, 0x1b, 0xff, 0xc3,
+	  0xbf, 0x5d, 0xfb, 0x32, 0xfb, 0x6a, 0x89, 0xd1,
+	  0x01, 0xc6, 0xc3, 0xb1, 0x93, 0x7c, 0x28, 0x71 },
+	{ 0x81, 0xd1, 0x66, 0x4f, 0xdf, 0x3c, 0xb3, 0x3c,
+	  0x24, 0xee, 0xba, 0xc0, 0xbd, 0x64, 0x24, 0x4b,
+	  0x77, 0xc4, 0xab, 0xea, 0x90, 0xbb, 0xe8, 0xb5,
+	  0xee, 0x0b, 0x2a, 0xaf, 0xcf, 0x2d, 0x6a, 0x53 },
+	{ 0x34, 0x57, 0x82, 0xf2, 0x95, 0xb0, 0x88, 0x03,
+	  0x52, 0xe9, 0x24, 0xa0, 0x46, 0x7b, 0x5f, 0xbc,
+	  0x3e, 0x8f, 0x3b, 0xfb, 0xc3, 0xc7, 0xe4, 0x8b,
+	  0x67, 0x09, 0x1f, 0xb5, 0xe8, 0x0a, 0x94, 0x42 },
+	{ 0x79, 0x41, 0x11, 0xea, 0x6c, 0xd6, 0x5e, 0x31,
+	  0x1f, 0x74, 0xee, 0x41, 0xd4, 0x76, 0xcb, 0x63,
+	  0x2c, 0xe1, 0xe4, 0xb0, 0x51, 0xdc, 0x1d, 0x9e,
+	  0x9d, 0x06, 0x1a, 0x19, 0xe1, 0xd0, 0xbb, 0x49 },
+	{ 0x2a, 0x85, 0xda, 0xf6, 0x13, 0x88, 0x16, 0xb9,
+	  0x9b, 0xf8, 0xd0, 0x8b, 0xa2, 0x11, 0x4b, 0x7a,
+	  0xb0, 0x79, 0x75, 0xa7, 0x84, 0x20, 0xc1, 0xa3,
+	  0xb0, 0x6a, 0x77, 0x7c, 0x22, 0xdd, 0x8b, 0xcb },
+	{ 0x89, 0xb0, 0xd5, 0xf2, 0x89, 0xec, 0x16, 0x40,
+	  0x1a, 0x06, 0x9a, 0x96, 0x0d, 0x0b, 0x09, 0x3e,
+	  0x62, 0x5d, 0xa3, 0xcf, 0x41, 0xee, 0x29, 0xb5,
+	  0x9b, 0x93, 0x0c, 0x58, 0x20, 0x14, 0x54, 0x55 },
+	{ 0xd0, 0xfd, 0xcb, 0x54, 0x39, 0x43, 0xfc, 0x27,
+	  0xd2, 0x08, 0x64, 0xf5, 0x21, 0x81, 0x47, 0x1b,
+	  0x94, 0x2c, 0xc7, 0x7c, 0xa6, 0x75, 0xbc, 0xb3,
+	  0x0d, 0xf3, 0x1d, 0x35, 0x8e, 0xf7, 0xb1, 0xeb },
+	{ 0xb1, 0x7e, 0xa8, 0xd7, 0x70, 0x63, 0xc7, 0x09,
+	  0xd4, 0xdc, 0x6b, 0x87, 0x94, 0x13, 0xc3, 0x43,
+	  0xe3, 0x79, 0x0e, 0x9e, 0x62, 0xca, 0x85, 0xb7,
+	  0x90, 0x0b, 0x08, 0x6f, 0x6b, 0x75, 0xc6, 0x72 },
+	{ 0xe7, 0x1a, 0x3e, 0x2c, 0x27, 0x4d, 0xb8, 0x42,
+	  0xd9, 0x21, 0x14, 0xf2, 0x17, 0xe2, 0xc0, 0xea,
+	  0xc8, 0xb4, 0x50, 0x93, 0xfd, 0xfd, 0x9d, 0xf4,
+	  0xca, 0x71, 0x62, 0x39, 0x48, 0x62, 0xd5, 0x01 },
+	{ 0xc0, 0x47, 0x67, 0x59, 0xab, 0x7a, 0xa3, 0x33,
+	  0x23, 0x4f, 0x6b, 0x44, 0xf5, 0xfd, 0x85, 0x83,
+	  0x90, 0xec, 0x23, 0x69, 0x4c, 0x62, 0x2c, 0xb9,
+	  0x86, 0xe7, 0x69, 0xc7, 0x8e, 0xdd, 0x73, 0x3e },
+	{ 0x9a, 0xb8, 0xea, 0xbb, 0x14, 0x16, 0x43, 0x4d,
+	  0x85, 0x39, 0x13, 0x41, 0xd5, 0x69, 0x93, 0xc5,
+	  0x54, 0x58, 0x16, 0x7d, 0x44, 0x18, 0xb1, 0x9a,
+	  0x0f, 0x2a, 0xd8, 0xb7, 0x9a, 0x83, 0xa7, 0x5b },
+	{ 0x79, 0x92, 0xd0, 0xbb, 0xb1, 0x5e, 0x23, 0x82,
+	  0x6f, 0x44, 0x3e, 0x00, 0x50, 0x5d, 0x68, 0xd3,
+	  0xed, 0x73, 0x72, 0x99, 0x5a, 0x5c, 0x3e, 0x49,
+	  0x86, 0x54, 0x10, 0x2f, 0xbc, 0xd0, 0x96, 0x4e },
+	{ 0xc0, 0x21, 0xb3, 0x00, 0x85, 0x15, 0x14, 0x35,
+	  0xdf, 0x33, 0xb0, 0x07, 0xcc, 0xec, 0xc6, 0x9d,
+	  0xf1, 0x26, 0x9f, 0x39, 0xba, 0x25, 0x09, 0x2b,
+	  0xed, 0x59, 0xd9, 0x32, 0xac, 0x0f, 0xdc, 0x28 },
+	{ 0x91, 0xa2, 0x5e, 0xc0, 0xec, 0x0d, 0x9a, 0x56,
+	  0x7f, 0x89, 0xc4, 0xbf, 0xe1, 0xa6, 0x5a, 0x0e,
+	  0x43, 0x2d, 0x07, 0x06, 0x4b, 0x41, 0x90, 0xe2,
+	  0x7d, 0xfb, 0x81, 0x90, 0x1f, 0xd3, 0x13, 0x9b },
+	{ 0x59, 0x50, 0xd3, 0x9a, 0x23, 0xe1, 0x54, 0x5f,
+	  0x30, 0x12, 0x70, 0xaa, 0x1a, 0x12, 0xf2, 0xe6,
+	  0xc4, 0x53, 0x77, 0x6e, 0x4d, 0x63, 0x55, 0xde,
+	  0x42, 0x5c, 0xc1, 0x53, 0xf9, 0x81, 0x88, 0x67 },
+	{ 0xd7, 0x9f, 0x14, 0x72, 0x0c, 0x61, 0x0a, 0xf1,
+	  0x79, 0xa3, 0x76, 0x5d, 0x4b, 0x7c, 0x09, 0x68,
+	  0xf9, 0x77, 0x96, 0x2d, 0xbf, 0x65, 0x5b, 0x52,
+	  0x12, 0x72, 0xb6, 0xf1, 0xe1, 0x94, 0x48, 0x8e },
+	{ 0xe9, 0x53, 0x1b, 0xfc, 0x8b, 0x02, 0x99, 0x5a,
+	  0xea, 0xa7, 0x5b, 0xa2, 0x70, 0x31, 0xfa, 0xdb,
+	  0xcb, 0xf4, 0xa0, 0xda, 0xb8, 0x96, 0x1d, 0x92,
+	  0x96, 0xcd, 0x7e, 0x84, 0xd2, 0x5d, 0x60, 0x06 },
+	{ 0x34, 0xe9, 0xc2, 0x6a, 0x01, 0xd7, 0xf1, 0x61,
+	  0x81, 0xb4, 0x54, 0xa9, 0xd1, 0x62, 0x3c, 0x23,
+	  0x3c, 0xb9, 0x9d, 0x31, 0xc6, 0x94, 0x65, 0x6e,
+	  0x94, 0x13, 0xac, 0xa3, 0xe9, 0x18, 0x69, 0x2f },
+	{ 0xd9, 0xd7, 0x42, 0x2f, 0x43, 0x7b, 0xd4, 0x39,
+	  0xdd, 0xd4, 0xd8, 0x83, 0xda, 0xe2, 0xa0, 0x83,
+	  0x50, 0x17, 0x34, 0x14, 0xbe, 0x78, 0x15, 0x51,
+	  0x33, 0xff, 0xf1, 0x96, 0x4c, 0x3d, 0x79, 0x72 },
+	{ 0x4a, 0xee, 0x0c, 0x7a, 0xaf, 0x07, 0x54, 0x14,
+	  0xff, 0x17, 0x93, 0xea, 0xd7, 0xea, 0xca, 0x60,
+	  0x17, 0x75, 0xc6, 0x15, 0xdb, 0xd6, 0x0b, 0x64,
+	  0x0b, 0x0a, 0x9f, 0x0c, 0xe5, 0x05, 0xd4, 0x35 },
+	{ 0x6b, 0xfd, 0xd1, 0x54, 0x59, 0xc8, 0x3b, 0x99,
+	  0xf0, 0x96, 0xbf, 0xb4, 0x9e, 0xe8, 0x7b, 0x06,
+	  0x3d, 0x69, 0xc1, 0x97, 0x4c, 0x69, 0x28, 0xac,
+	  0xfc, 0xfb, 0x40, 0x99, 0xf8, 0xc4, 0xef, 0x67 },
+	{ 0x9f, 0xd1, 0xc4, 0x08, 0xfd, 0x75, 0xc3, 0x36,
+	  0x19, 0x3a, 0x2a, 0x14, 0xd9, 0x4f, 0x6a, 0xf5,
+	  0xad, 0xf0, 0x50, 0xb8, 0x03, 0x87, 0xb4, 0xb0,
+	  0x10, 0xfb, 0x29, 0xf4, 0xcc, 0x72, 0x70, 0x7c },
+	{ 0x13, 0xc8, 0x84, 0x80, 0xa5, 0xd0, 0x0d, 0x6c,
+	  0x8c, 0x7a, 0xd2, 0x11, 0x0d, 0x76, 0xa8, 0x2d,
+	  0x9b, 0x70, 0xf4, 0xfa, 0x66, 0x96, 0xd4, 0xe5,
+	  0xdd, 0x42, 0xa0, 0x66, 0xdc, 0xaf, 0x99, 0x20 },
+	{ 0x82, 0x0e, 0x72, 0x5e, 0xe2, 0x5f, 0xe8, 0xfd,
+	  0x3a, 0x8d, 0x5a, 0xbe, 0x4c, 0x46, 0xc3, 0xba,
+	  0x88, 0x9d, 0xe6, 0xfa, 0x91, 0x91, 0xaa, 0x22,
+	  0xba, 0x67, 0xd5, 0x70, 0x54, 0x21, 0x54, 0x2b },
+	{ 0x32, 0xd9, 0x3a, 0x0e, 0xb0, 0x2f, 0x42, 0xfb,
+	  0xbc, 0xaf, 0x2b, 0xad, 0x00, 0x85, 0xb2, 0x82,
+	  0xe4, 0x60, 0x46, 0xa4, 0xdf, 0x7a, 0xd1, 0x06,
+	  0x57, 0xc9, 0xd6, 0x47, 0x63, 0x75, 0xb9, 0x3e },
+	{ 0xad, 0xc5, 0x18, 0x79, 0x05, 0xb1, 0x66, 0x9c,
+	  0xd8, 0xec, 0x9c, 0x72, 0x1e, 0x19, 0x53, 0x78,
+	  0x6b, 0x9d, 0x89, 0xa9, 0xba, 0xe3, 0x07, 0x80,
+	  0xf1, 0xe1, 0xea, 0xb2, 0x4a, 0x00, 0x52, 0x3c },
+	{ 0xe9, 0x07, 0x56, 0xff, 0x7f, 0x9a, 0xd8, 0x10,
+	  0xb2, 0x39, 0xa1, 0x0c, 0xed, 0x2c, 0xf9, 0xb2,
+	  0x28, 0x43, 0x54, 0xc1, 0xf8, 0xc7, 0xe0, 0xac,
+	  0xcc, 0x24, 0x61, 0xdc, 0x79, 0x6d, 0x6e, 0x89 },
+	{ 0x12, 0x51, 0xf7, 0x6e, 0x56, 0x97, 0x84, 0x81,
+	  0x87, 0x53, 0x59, 0x80, 0x1d, 0xb5, 0x89, 0xa0,
+	  0xb2, 0x2f, 0x86, 0xd8, 0xd6, 0x34, 0xdc, 0x04,
+	  0x50, 0x6f, 0x32, 0x2e, 0xd7, 0x8f, 0x17, 0xe8 },
+	{ 0x3a, 0xfa, 0x89, 0x9f, 0xd9, 0x80, 0xe7, 0x3e,
+	  0xcb, 0x7f, 0x4d, 0x8b, 0x8f, 0x29, 0x1d, 0xc9,
+	  0xaf, 0x79, 0x6b, 0xc6, 0x5d, 0x27, 0xf9, 0x74,
+	  0xc6, 0xf1, 0x93, 0xc9, 0x19, 0x1a, 0x09, 0xfd },
+	{ 0xaa, 0x30, 0x5b, 0xe2, 0x6e, 0x5d, 0xed, 0xdc,
+	  0x3c, 0x10, 0x10, 0xcb, 0xc2, 0x13, 0xf9, 0x5f,
+	  0x05, 0x1c, 0x78, 0x5c, 0x5b, 0x43, 0x1e, 0x6a,
+	  0x7c, 0xd0, 0x48, 0xf1, 0x61, 0x78, 0x75, 0x28 },
+	{ 0x8e, 0xa1, 0x88, 0x4f, 0xf3, 0x2e, 0x9d, 0x10,
+	  0xf0, 0x39, 0xb4, 0x07, 0xd0, 0xd4, 0x4e, 0x7e,
+	  0x67, 0x0a, 0xbd, 0x88, 0x4a, 0xee, 0xe0, 0xfb,
+	  0x75, 0x7a, 0xe9, 0x4e, 0xaa, 0x97, 0x37, 0x3d },
+	{ 0xd4, 0x82, 0xb2, 0x15, 0x5d, 0x4d, 0xec, 0x6b,
+	  0x47, 0x36, 0xa1, 0xf1, 0x61, 0x7b, 0x53, 0xaa,
+	  0xa3, 0x73, 0x10, 0x27, 0x7d, 0x3f, 0xef, 0x0c,
+	  0x37, 0xad, 0x41, 0x76, 0x8f, 0xc2, 0x35, 0xb4 },
+	{ 0x4d, 0x41, 0x39, 0x71, 0x38, 0x7e, 0x7a, 0x88,
+	  0x98, 0xa8, 0xdc, 0x2a, 0x27, 0x50, 0x07, 0x78,
+	  0x53, 0x9e, 0xa2, 0x14, 0xa2, 0xdf, 0xe9, 0xb3,
+	  0xd7, 0xe8, 0xeb, 0xdc, 0xe5, 0xcf, 0x3d, 0xb3 },
+	{ 0x69, 0x6e, 0x5d, 0x46, 0xe6, 0xc5, 0x7e, 0x87,
+	  0x96, 0xe4, 0x73, 0x5d, 0x08, 0x91, 0x6e, 0x0b,
+	  0x79, 0x29, 0xb3, 0xcf, 0x29, 0x8c, 0x29, 0x6d,
+	  0x22, 0xe9, 0xd3, 0x01, 0x96, 0x53, 0x37, 0x1c },
+	{ 0x1f, 0x56, 0x47, 0xc1, 0xd3, 0xb0, 0x88, 0x22,
+	  0x88, 0x85, 0x86, 0x5c, 0x89, 0x40, 0x90, 0x8b,
+	  0xf4, 0x0d, 0x1a, 0x82, 0x72, 0x82, 0x19, 0x73,
+	  0xb1, 0x60, 0x00, 0x8e, 0x7a, 0x3c, 0xe2, 0xeb },
+	{ 0xb6, 0xe7, 0x6c, 0x33, 0x0f, 0x02, 0x1a, 0x5b,
+	  0xda, 0x65, 0x87, 0x50, 0x10, 0xb0, 0xed, 0xf0,
+	  0x91, 0x26, 0xc0, 0xf5, 0x10, 0xea, 0x84, 0x90,
+	  0x48, 0x19, 0x20, 0x03, 0xae, 0xf4, 0xc6, 0x1c },
+	{ 0x3c, 0xd9, 0x52, 0xa0, 0xbe, 0xad, 0xa4, 0x1a,
+	  0xbb, 0x42, 0x4c, 0xe4, 0x7f, 0x94, 0xb4, 0x2b,
+	  0xe6, 0x4e, 0x1f, 0xfb, 0x0f, 0xd0, 0x78, 0x22,
+	  0x76, 0x80, 0x79, 0x46, 0xd0, 0xd0, 0xbc, 0x55 },
+	{ 0x98, 0xd9, 0x26, 0x77, 0x43, 0x9b, 0x41, 0xb7,
+	  0xbb, 0x51, 0x33, 0x12, 0xaf, 0xb9, 0x2b, 0xcc,
+	  0x8e, 0xe9, 0x68, 0xb2, 0xe3, 0xb2, 0x38, 0xce,
+	  0xcb, 0x9b, 0x0f, 0x34, 0xc9, 0xbb, 0x63, 0xd0 },
+	{ 0xec, 0xbc, 0xa2, 0xcf, 0x08, 0xae, 0x57, 0xd5,
+	  0x17, 0xad, 0x16, 0x15, 0x8a, 0x32, 0xbf, 0xa7,
+	  0xdc, 0x03, 0x82, 0xea, 0xed, 0xa1, 0x28, 0xe9,
+	  0x18, 0x86, 0x73, 0x4c, 0x24, 0xa0, 0xb2, 0x9d },
+	{ 0x94, 0x2c, 0xc7, 0xc0, 0xb5, 0x2e, 0x2b, 0x16,
+	  0xa4, 0xb8, 0x9f, 0xa4, 0xfc, 0x7e, 0x0b, 0xf6,
+	  0x09, 0xe2, 0x9a, 0x08, 0xc1, 0xa8, 0x54, 0x34,
+	  0x52, 0xb7, 0x7c, 0x7b, 0xfd, 0x11, 0xbb, 0x28 },
+	{ 0x8a, 0x06, 0x5d, 0x8b, 0x61, 0xa0, 0xdf, 0xfb,
+	  0x17, 0x0d, 0x56, 0x27, 0x73, 0x5a, 0x76, 0xb0,
+	  0xe9, 0x50, 0x60, 0x37, 0x80, 0x8c, 0xba, 0x16,
+	  0xc3, 0x45, 0x00, 0x7c, 0x9f, 0x79, 0xcf, 0x8f },
+	{ 0x1b, 0x9f, 0xa1, 0x97, 0x14, 0x65, 0x9c, 0x78,
+	  0xff, 0x41, 0x38, 0x71, 0x84, 0x92, 0x15, 0x36,
+	  0x10, 0x29, 0xac, 0x80, 0x2b, 0x1c, 0xbc, 0xd5,
+	  0x4e, 0x40, 0x8b, 0xd8, 0x72, 0x87, 0xf8, 0x1f },
+	{ 0x8d, 0xab, 0x07, 0x1b, 0xcd, 0x6c, 0x72, 0x92,
+	  0xa9, 0xef, 0x72, 0x7b, 0x4a, 0xe0, 0xd8, 0x67,
+	  0x13, 0x30, 0x1d, 0xa8, 0x61, 0x8d, 0x9a, 0x48,
+	  0xad, 0xce, 0x55, 0xf3, 0x03, 0xa8, 0x69, 0xa1 },
+	{ 0x82, 0x53, 0xe3, 0xe7, 0xc7, 0xb6, 0x84, 0xb9,
+	  0xcb, 0x2b, 0xeb, 0x01, 0x4c, 0xe3, 0x30, 0xff,
+	  0x3d, 0x99, 0xd1, 0x7a, 0xbb, 0xdb, 0xab, 0xe4,
+	  0xf4, 0xd6, 0x74, 0xde, 0xd5, 0x3f, 0xfc, 0x6b },
+	{ 0xf1, 0x95, 0xf3, 0x21, 0xe9, 0xe3, 0xd6, 0xbd,
+	  0x7d, 0x07, 0x45, 0x04, 0xdd, 0x2a, 0xb0, 0xe6,
+	  0x24, 0x1f, 0x92, 0xe7, 0x84, 0xb1, 0xaa, 0x27,
+	  0x1f, 0xf6, 0x48, 0xb1, 0xca, 0xb6, 0xd7, 0xf6 },
+	{ 0x27, 0xe4, 0xcc, 0x72, 0x09, 0x0f, 0x24, 0x12,
+	  0x66, 0x47, 0x6a, 0x7c, 0x09, 0x49, 0x5f, 0x2d,
+	  0xb1, 0x53, 0xd5, 0xbc, 0xbd, 0x76, 0x19, 0x03,
+	  0xef, 0x79, 0x27, 0x5e, 0xc5, 0x6b, 0x2e, 0xd8 },
+	{ 0x89, 0x9c, 0x24, 0x05, 0x78, 0x8e, 0x25, 0xb9,
+	  0x9a, 0x18, 0x46, 0x35, 0x5e, 0x64, 0x6d, 0x77,
+	  0xcf, 0x40, 0x00, 0x83, 0x41, 0x5f, 0x7d, 0xc5,
+	  0xaf, 0xe6, 0x9d, 0x6e, 0x17, 0xc0, 0x00, 0x23 },
+	{ 0xa5, 0x9b, 0x78, 0xc4, 0x90, 0x57, 0x44, 0x07,
+	  0x6b, 0xfe, 0xe8, 0x94, 0xde, 0x70, 0x7d, 0x4f,
+	  0x12, 0x0b, 0x5c, 0x68, 0x93, 0xea, 0x04, 0x00,
+	  0x29, 0x7d, 0x0b, 0xb8, 0x34, 0x72, 0x76, 0x32 },
+	{ 0x59, 0xdc, 0x78, 0xb1, 0x05, 0x64, 0x97, 0x07,
+	  0xa2, 0xbb, 0x44, 0x19, 0xc4, 0x8f, 0x00, 0x54,
+	  0x00, 0xd3, 0x97, 0x3d, 0xe3, 0x73, 0x66, 0x10,
+	  0x23, 0x04, 0x35, 0xb1, 0x04, 0x24, 0xb2, 0x4f },
+	{ 0xc0, 0x14, 0x9d, 0x1d, 0x7e, 0x7a, 0x63, 0x53,
+	  0xa6, 0xd9, 0x06, 0xef, 0xe7, 0x28, 0xf2, 0xf3,
+	  0x29, 0xfe, 0x14, 0xa4, 0x14, 0x9a, 0x3e, 0xa7,
+	  0x76, 0x09, 0xbc, 0x42, 0xb9, 0x75, 0xdd, 0xfa },
+	{ 0xa3, 0x2f, 0x24, 0x14, 0x74, 0xa6, 0xc1, 0x69,
+	  0x32, 0xe9, 0x24, 0x3b, 0xe0, 0xcf, 0x09, 0xbc,
+	  0xdc, 0x7e, 0x0c, 0xa0, 0xe7, 0xa6, 0xa1, 0xb9,
+	  0xb1, 0xa0, 0xf0, 0x1e, 0x41, 0x50, 0x23, 0x77 },
+	{ 0xb2, 0x39, 0xb2, 0xe4, 0xf8, 0x18, 0x41, 0x36,
+	  0x1c, 0x13, 0x39, 0xf6, 0x8e, 0x2c, 0x35, 0x9f,
+	  0x92, 0x9a, 0xf9, 0xad, 0x9f, 0x34, 0xe0, 0x1a,
+	  0xab, 0x46, 0x31, 0xad, 0x6d, 0x55, 0x00, 0xb0 },
+	{ 0x85, 0xfb, 0x41, 0x9c, 0x70, 0x02, 0xa3, 0xe0,
+	  0xb4, 0xb6, 0xea, 0x09, 0x3b, 0x4c, 0x1a, 0xc6,
+	  0x93, 0x66, 0x45, 0xb6, 0x5d, 0xac, 0x5a, 0xc1,
+	  0x5a, 0x85, 0x28, 0xb7, 0xb9, 0x4c, 0x17, 0x54 },
+	{ 0x96, 0x19, 0x72, 0x06, 0x25, 0xf1, 0x90, 0xb9,
+	  0x3a, 0x3f, 0xad, 0x18, 0x6a, 0xb3, 0x14, 0x18,
+	  0x96, 0x33, 0xc0, 0xd3, 0xa0, 0x1e, 0x6f, 0x9b,
+	  0xc8, 0xc4, 0xa8, 0xf8, 0x2f, 0x38, 0x3d, 0xbf },
+	{ 0x7d, 0x62, 0x0d, 0x90, 0xfe, 0x69, 0xfa, 0x46,
+	  0x9a, 0x65, 0x38, 0x38, 0x89, 0x70, 0xa1, 0xaa,
+	  0x09, 0xbb, 0x48, 0xa2, 0xd5, 0x9b, 0x34, 0x7b,
+	  0x97, 0xe8, 0xce, 0x71, 0xf4, 0x8c, 0x7f, 0x46 },
+	{ 0x29, 0x43, 0x83, 0x56, 0x85, 0x96, 0xfb, 0x37,
+	  0xc7, 0x5b, 0xba, 0xcd, 0x97, 0x9c, 0x5f, 0xf6,
+	  0xf2, 0x0a, 0x55, 0x6b, 0xf8, 0x87, 0x9c, 0xc7,
+	  0x29, 0x24, 0x85, 0x5d, 0xf9, 0xb8, 0x24, 0x0e },
+	{ 0x16, 0xb1, 0x8a, 0xb3, 0x14, 0x35, 0x9c, 0x2b,
+	  0x83, 0x3c, 0x1c, 0x69, 0x86, 0xd4, 0x8c, 0x55,
+	  0xa9, 0xfc, 0x97, 0xcd, 0xe9, 0xa3, 0xc1, 0xf1,
+	  0x0a, 0x31, 0x77, 0x14, 0x0f, 0x73, 0xf7, 0x38 },
+	{ 0x8c, 0xbb, 0xdd, 0x14, 0xbc, 0x33, 0xf0, 0x4c,
+	  0xf4, 0x58, 0x13, 0xe4, 0xa1, 0x53, 0xa2, 0x73,
+	  0xd3, 0x6a, 0xda, 0xd5, 0xce, 0x71, 0xf4, 0x99,
+	  0xee, 0xb8, 0x7f, 0xb8, 0xac, 0x63, 0xb7, 0x29 },
+	{ 0x69, 0xc9, 0xa4, 0x98, 0xdb, 0x17, 0x4e, 0xca,
+	  0xef, 0xcc, 0x5a, 0x3a, 0xc9, 0xfd, 0xed, 0xf0,
+	  0xf8, 0x13, 0xa5, 0xbe, 0xc7, 0x27, 0xf1, 0xe7,
+	  0x75, 0xba, 0xbd, 0xec, 0x77, 0x18, 0x81, 0x6e },
+	{ 0xb4, 0x62, 0xc3, 0xbe, 0x40, 0x44, 0x8f, 0x1d,
+	  0x4f, 0x80, 0x62, 0x62, 0x54, 0xe5, 0x35, 0xb0,
+	  0x8b, 0xc9, 0xcd, 0xcf, 0xf5, 0x99, 0xa7, 0x68,
+	  0x57, 0x8d, 0x4b, 0x28, 0x81, 0xa8, 0xe3, 0xf0 },
+	{ 0x55, 0x3e, 0x9d, 0x9c, 0x5f, 0x36, 0x0a, 0xc0,
+	  0xb7, 0x4a, 0x7d, 0x44, 0xe5, 0xa3, 0x91, 0xda,
+	  0xd4, 0xce, 0xd0, 0x3e, 0x0c, 0x24, 0x18, 0x3b,
+	  0x7e, 0x8e, 0xca, 0xbd, 0xf1, 0x71, 0x5a, 0x64 },
+	{ 0x7a, 0x7c, 0x55, 0xa5, 0x6f, 0xa9, 0xae, 0x51,
+	  0xe6, 0x55, 0xe0, 0x19, 0x75, 0xd8, 0xa6, 0xff,
+	  0x4a, 0xe9, 0xe4, 0xb4, 0x86, 0xfc, 0xbe, 0x4e,
+	  0xac, 0x04, 0x45, 0x88, 0xf2, 0x45, 0xeb, 0xea },
+	{ 0x2a, 0xfd, 0xf3, 0xc8, 0x2a, 0xbc, 0x48, 0x67,
+	  0xf5, 0xde, 0x11, 0x12, 0x86, 0xc2, 0xb3, 0xbe,
+	  0x7d, 0x6e, 0x48, 0x65, 0x7b, 0xa9, 0x23, 0xcf,
+	  0xbf, 0x10, 0x1a, 0x6d, 0xfc, 0xf9, 0xdb, 0x9a },
+	{ 0x41, 0x03, 0x7d, 0x2e, 0xdc, 0xdc, 0xe0, 0xc4,
+	  0x9b, 0x7f, 0xb4, 0xa6, 0xaa, 0x09, 0x99, 0xca,
+	  0x66, 0x97, 0x6c, 0x74, 0x83, 0xaf, 0xe6, 0x31,
+	  0xd4, 0xed, 0xa2, 0x83, 0x14, 0x4f, 0x6d, 0xfc },
+	{ 0xc4, 0x46, 0x6f, 0x84, 0x97, 0xca, 0x2e, 0xeb,
+	  0x45, 0x83, 0xa0, 0xb0, 0x8e, 0x9d, 0x9a, 0xc7,
+	  0x43, 0x95, 0x70, 0x9f, 0xda, 0x10, 0x9d, 0x24,
+	  0xf2, 0xe4, 0x46, 0x21, 0x96, 0x77, 0x9c, 0x5d },
+	{ 0x75, 0xf6, 0x09, 0x33, 0x8a, 0xa6, 0x7d, 0x96,
+	  0x9a, 0x2a, 0xe2, 0xa2, 0x36, 0x2b, 0x2d, 0xa9,
+	  0xd7, 0x7c, 0x69, 0x5d, 0xfd, 0x1d, 0xf7, 0x22,
+	  0x4a, 0x69, 0x01, 0xdb, 0x93, 0x2c, 0x33, 0x64 },
+	{ 0x68, 0x60, 0x6c, 0xeb, 0x98, 0x9d, 0x54, 0x88,
+	  0xfc, 0x7c, 0xf6, 0x49, 0xf3, 0xd7, 0xc2, 0x72,
+	  0xef, 0x05, 0x5d, 0xa1, 0xa9, 0x3f, 0xae, 0xcd,
+	  0x55, 0xfe, 0x06, 0xf6, 0x96, 0x70, 0x98, 0xca },
+	{ 0x44, 0x34, 0x6b, 0xde, 0xb7, 0xe0, 0x52, 0xf6,
+	  0x25, 0x50, 0x48, 0xf0, 0xd9, 0xb4, 0x2c, 0x42,
+	  0x5b, 0xab, 0x9c, 0x3d, 0xd2, 0x41, 0x68, 0x21,
+	  0x2c, 0x3e, 0xcf, 0x1e, 0xbf, 0x34, 0xe6, 0xae },
+	{ 0x8e, 0x9c, 0xf6, 0xe1, 0xf3, 0x66, 0x47, 0x1f,
+	  0x2a, 0xc7, 0xd2, 0xee, 0x9b, 0x5e, 0x62, 0x66,
+	  0xfd, 0xa7, 0x1f, 0x8f, 0x2e, 0x41, 0x09, 0xf2,
+	  0x23, 0x7e, 0xd5, 0xf8, 0x81, 0x3f, 0xc7, 0x18 },
+	{ 0x84, 0xbb, 0xeb, 0x84, 0x06, 0xd2, 0x50, 0x95,
+	  0x1f, 0x8c, 0x1b, 0x3e, 0x86, 0xa7, 0xc0, 0x10,
+	  0x08, 0x29, 0x21, 0x83, 0x3d, 0xfd, 0x95, 0x55,
+	  0xa2, 0xf9, 0x09, 0xb1, 0x08, 0x6e, 0xb4, 0xb8 },
+	{ 0xee, 0x66, 0x6f, 0x3e, 0xef, 0x0f, 0x7e, 0x2a,
+	  0x9c, 0x22, 0x29, 0x58, 0xc9, 0x7e, 0xaf, 0x35,
+	  0xf5, 0x1c, 0xed, 0x39, 0x3d, 0x71, 0x44, 0x85,
+	  0xab, 0x09, 0xa0, 0x69, 0x34, 0x0f, 0xdf, 0x88 },
+	{ 0xc1, 0x53, 0xd3, 0x4a, 0x65, 0xc4, 0x7b, 0x4a,
+	  0x62, 0xc5, 0xca, 0xcf, 0x24, 0x01, 0x09, 0x75,
+	  0xd0, 0x35, 0x6b, 0x2f, 0x32, 0xc8, 0xf5, 0xda,
+	  0x53, 0x0d, 0x33, 0x88, 0x16, 0xad, 0x5d, 0xe6 },
+	{ 0x9f, 0xc5, 0x45, 0x01, 0x09, 0xe1, 0xb7, 0x79,
+	  0xf6, 0xc7, 0xae, 0x79, 0xd5, 0x6c, 0x27, 0x63,
+	  0x5c, 0x8d, 0xd4, 0x26, 0xc5, 0xa9, 0xd5, 0x4e,
+	  0x25, 0x78, 0xdb, 0x98, 0x9b, 0x8c, 0x3b, 0x4e },
+	{ 0xd1, 0x2b, 0xf3, 0x73, 0x2e, 0xf4, 0xaf, 0x5c,
+	  0x22, 0xfa, 0x90, 0x35, 0x6a, 0xf8, 0xfc, 0x50,
+	  0xfc, 0xb4, 0x0f, 0x8f, 0x2e, 0xa5, 0xc8, 0x59,
+	  0x47, 0x37, 0xa3, 0xb3, 0xd5, 0xab, 0xdb, 0xd7 },
+	{ 0x11, 0x03, 0x0b, 0x92, 0x89, 0xbb, 0xa5, 0xaf,
+	  0x65, 0x26, 0x06, 0x72, 0xab, 0x6f, 0xee, 0x88,
+	  0xb8, 0x74, 0x20, 0xac, 0xef, 0x4a, 0x17, 0x89,
+	  0xa2, 0x07, 0x3b, 0x7e, 0xc2, 0xf2, 0xa0, 0x9e },
+	{ 0x69, 0xcb, 0x19, 0x2b, 0x84, 0x44, 0x00, 0x5c,
+	  0x8c, 0x0c, 0xeb, 0x12, 0xc8, 0x46, 0x86, 0x07,
+	  0x68, 0x18, 0x8c, 0xda, 0x0a, 0xec, 0x27, 0xa9,
+	  0xc8, 0xa5, 0x5c, 0xde, 0xe2, 0x12, 0x36, 0x32 },
+	{ 0xdb, 0x44, 0x4c, 0x15, 0x59, 0x7b, 0x5f, 0x1a,
+	  0x03, 0xd1, 0xf9, 0xed, 0xd1, 0x6e, 0x4a, 0x9f,
+	  0x43, 0xa6, 0x67, 0xcc, 0x27, 0x51, 0x75, 0xdf,
+	  0xa2, 0xb7, 0x04, 0xe3, 0xbb, 0x1a, 0x9b, 0x83 },
+	{ 0x3f, 0xb7, 0x35, 0x06, 0x1a, 0xbc, 0x51, 0x9d,
+	  0xfe, 0x97, 0x9e, 0x54, 0xc1, 0xee, 0x5b, 0xfa,
+	  0xd0, 0xa9, 0xd8, 0x58, 0xb3, 0x31, 0x5b, 0xad,
+	  0x34, 0xbd, 0xe9, 0x99, 0xef, 0xd7, 0x24, 0xdd }
+};
+
+bool __init blake2s_selftest(void)
+{
+	u8 key[BLAKE2S_KEYBYTES];
+	u8 buf[ARRAY_SIZE(blake2s_testvecs)];
+	u8 hash[BLAKE2S_OUTBYTES];
+	size_t i;
+	bool success = true;
+
+	for (i = 0; i < BLAKE2S_KEYBYTES; ++i)
+		key[i] = (u8)i;
+
+	for (i = 0; i < ARRAY_SIZE(blake2s_testvecs); ++i)
+		buf[i] = (u8)i;
+
+	for (i = 0; i < ARRAY_SIZE(blake2s_keyed_testvecs); ++i) {
+		blake2s(hash, buf, key, BLAKE2S_OUTBYTES, i, BLAKE2S_KEYBYTES);
+		if (memcmp(hash, blake2s_keyed_testvecs[i], BLAKE2S_OUTBYTES)) {
+			pr_info("blake2s keyed self-test %zu: FAIL\n", i + 1);
+			success = false;
+		}
+	}
+
+	for (i = 0; i < ARRAY_SIZE(blake2s_testvecs); ++i) {
+		blake2s(hash, buf, NULL, BLAKE2S_OUTBYTES, i, 0);
+		if (memcmp(hash, blake2s_testvecs[i], BLAKE2S_OUTBYTES)) {
+			pr_info("blake2s unkeyed self-test %zu: FAIL\n", i + i);
+			success = false;
+		}
+	}
+
+	if (success)
+		pr_info("blake2s self-tests: pass\n");
+	return success;
+}
+#endif
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 13/20] zinc: BLAKE2s x86_64 implementation
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (11 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 12/20] zinc: BLAKE2s generic C implementation " Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 14/20] zinc: Curve25519 generic C implementations and selftest Jason A. Donenfeld
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Thomas Gleixner, Ingo Molnar, x86

These implementations from Samuel Neves support AVX and AVX-512VL.
Originally this used AVX-512F, but Skylake thermal throttling made
AVX-512VL more attractive and possible to do with negligable difference.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
---
 lib/zinc/Makefile                      |   4 +
 lib/zinc/blake2s/blake2s-x86_64-glue.h |  62 +++
 lib/zinc/blake2s/blake2s-x86_64.S      | 685 +++++++++++++++++++++++++
 3 files changed, 751 insertions(+)
 create mode 100644 lib/zinc/blake2s/blake2s-x86_64-glue.h
 create mode 100644 lib/zinc/blake2s/blake2s-x86_64.S

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 45817ec5539c..49455abfdf5e 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -53,6 +53,10 @@ endif
 
 ifeq ($(CONFIG_ZINC_BLAKE2S),y)
 zinc-y += blake2s/blake2s.o
+ifeq ($(CONFIG_ZINC_ARCH_X86_64),y)
+zinc-y += blake2s/blake2s-x86_64.o
+CFLAGS_blake2s.o += -include $(srctree)/$(src)/blake2s/blake2s-x86_64-glue.h
+endif
 endif
 
 zinc-y += main.o
diff --git a/lib/zinc/blake2s/blake2s-x86_64-glue.h b/lib/zinc/blake2s/blake2s-x86_64-glue.h
new file mode 100644
index 000000000000..2f5b74bd9117
--- /dev/null
+++ b/lib/zinc/blake2s/blake2s-x86_64-glue.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/blake2s.h>
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+#include <asm/fpu/api.h>
+#include <asm/simd.h>
+
+#ifdef CONFIG_AS_AVX
+asmlinkage void blake2s_compress_avx(struct blake2s_state *state,
+				     const u8 *block, const size_t nblocks,
+				     const u32 inc);
+#endif
+#ifdef CONFIG_AS_AVX512
+asmlinkage void blake2s_compress_avx512(struct blake2s_state *state,
+					const u8 *block, const size_t nblocks,
+					const u32 inc);
+#endif
+
+static bool blake2s_use_avx __ro_after_init;
+static bool blake2s_use_avx512 __ro_after_init;
+
+void __init blake2s_fpu_init(void)
+{
+	blake2s_use_avx =
+		boot_cpu_has(X86_FEATURE_AVX) &&
+		cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
+	blake2s_use_avx512 =
+		boot_cpu_has(X86_FEATURE_AVX) &&
+		boot_cpu_has(X86_FEATURE_AVX2) &&
+		boot_cpu_has(X86_FEATURE_AVX512F) &&
+		boot_cpu_has(X86_FEATURE_AVX512VL) &&
+		cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
+				  XFEATURE_MASK_AVX512, NULL);
+}
+
+static inline bool blake2s_arch(struct blake2s_state *state, const u8 *block,
+				size_t nblocks, const u32 inc)
+{
+#ifdef CONFIG_AS_AVX512
+	if (blake2s_use_avx512 && irq_fpu_usable()) {
+		kernel_fpu_begin();
+		blake2s_compress_avx512(state, block, nblocks, inc);
+		kernel_fpu_end();
+		return true;
+	}
+#endif
+#ifdef CONFIG_AS_AVX
+	if (blake2s_use_avx && irq_fpu_usable()) {
+		kernel_fpu_begin();
+		blake2s_compress_avx(state, block, nblocks, inc);
+		kernel_fpu_end();
+		return true;
+	}
+#endif
+	return false;
+}
+
+#define HAVE_BLAKE2S_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/blake2s/blake2s-x86_64.S b/lib/zinc/blake2s/blake2s-x86_64.S
new file mode 100644
index 000000000000..6a84b9f1f2c4
--- /dev/null
+++ b/lib/zinc/blake2s/blake2s-x86_64.S
@@ -0,0 +1,685 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2017 Samuel Neves <sneves@dei.uc.pt>. All Rights Reserved.
+ */
+
+#include <linux/linkage.h>
+
+.section .rodata.cst32.BLAKE2S_IV, "aM", @progbits, 32
+.align 32
+IV:	.octa 0xA54FF53A3C6EF372BB67AE856A09E667
+	.octa 0x5BE0CD191F83D9AB9B05688C510E527F
+.section .rodata.cst16.ROT16, "aM", @progbits, 16
+.align 16
+ROT16:	.octa 0x0D0C0F0E09080B0A0504070601000302
+.section .rodata.cst16.ROR328, "aM", @progbits, 16
+.align 16
+ROR328:	.octa 0x0C0F0E0D080B0A090407060500030201
+#ifdef CONFIG_AS_AVX512
+.section .rodata.cst64.BLAKE2S_SIGMA, "aM", @progbits, 640
+.align 64
+SIGMA:
+.long 0, 2, 4, 6, 1, 3, 5, 7, 8, 10, 12, 14, 9, 11, 13, 15
+.long 11, 2, 12, 14, 9, 8, 15, 3, 4, 0, 13, 6, 10, 1, 7, 5
+.long 10, 12, 11, 6, 5, 9, 13, 3, 4, 15, 14, 2, 0, 7, 8, 1
+.long 10, 9, 7, 0, 11, 14, 1, 12, 6, 2, 15, 3, 13, 8, 5, 4
+.long 4, 9, 8, 13, 14, 0, 10, 11, 7, 3, 12, 1, 5, 6, 15, 2
+.long 2, 10, 4, 14, 13, 3, 9, 11, 6, 5, 7, 12, 15, 1, 8, 0
+.long 4, 11, 14, 8, 13, 10, 12, 5, 2, 1, 15, 3, 9, 7, 0, 6
+.long 6, 12, 0, 13, 15, 2, 1, 10, 4, 5, 11, 14, 8, 3, 9, 7
+.long 14, 5, 4, 12, 9, 7, 3, 10, 2, 0, 6, 15, 11, 1, 13, 8
+.long 11, 7, 13, 10, 12, 14, 0, 15, 4, 5, 6, 9, 2, 1, 8, 3
+#endif /* CONFIG_AS_AVX512 */
+
+.text
+#ifdef CONFIG_AS_AVX
+ENTRY(blake2s_compress_avx)
+	movl		%ecx, %ecx
+	testq		%rdx, %rdx
+	je		.Lendofloop
+	.align 32
+.Lbeginofloop:
+	addq		%rcx, 32(%rdi)
+	vmovdqu		IV+16(%rip), %xmm1
+	vmovdqu		(%rsi), %xmm4
+	vpxor		32(%rdi), %xmm1, %xmm1
+	vmovdqu		16(%rsi), %xmm3
+	vshufps		$136, %xmm3, %xmm4, %xmm6
+	vmovdqa		ROT16(%rip), %xmm7
+	vpaddd		(%rdi), %xmm6, %xmm6
+	vpaddd		16(%rdi), %xmm6, %xmm6
+	vpxor		%xmm6, %xmm1, %xmm1
+	vmovdqu		IV(%rip), %xmm8
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vmovdqu		48(%rsi), %xmm5
+	vpaddd		%xmm1, %xmm8, %xmm8
+	vpxor		16(%rdi), %xmm8, %xmm9
+	vmovdqu		32(%rsi), %xmm2
+	vpblendw	$12, %xmm3, %xmm5, %xmm13
+	vshufps		$221, %xmm5, %xmm2, %xmm12
+	vpunpckhqdq	%xmm2, %xmm4, %xmm14
+	vpslld		$20, %xmm9, %xmm0
+	vpsrld		$12, %xmm9, %xmm9
+	vpxor		%xmm0, %xmm9, %xmm0
+	vshufps		$221, %xmm3, %xmm4, %xmm9
+	vpaddd		%xmm9, %xmm6, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vmovdqa		ROR328(%rip), %xmm6
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm8, %xmm8
+	vpxor		%xmm8, %xmm0, %xmm0
+	vpshufd		$147, %xmm1, %xmm1
+	vpshufd		$78, %xmm8, %xmm8
+	vpslld		$25, %xmm0, %xmm10
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm10, %xmm0, %xmm0
+	vshufps		$136, %xmm5, %xmm2, %xmm10
+	vpshufd		$57, %xmm0, %xmm0
+	vpaddd		%xmm10, %xmm9, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpaddd		%xmm12, %xmm9, %xmm9
+	vpblendw	$12, %xmm2, %xmm3, %xmm12
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm8, %xmm8
+	vpxor		%xmm8, %xmm0, %xmm10
+	vpslld		$20, %xmm10, %xmm0
+	vpsrld		$12, %xmm10, %xmm10
+	vpxor		%xmm0, %xmm10, %xmm0
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm8, %xmm8
+	vpxor		%xmm8, %xmm0, %xmm0
+	vpshufd		$57, %xmm1, %xmm1
+	vpshufd		$78, %xmm8, %xmm8
+	vpslld		$25, %xmm0, %xmm10
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm10, %xmm0, %xmm0
+	vpslldq		$4, %xmm5, %xmm10
+	vpblendw	$240, %xmm10, %xmm12, %xmm12
+	vpshufd		$147, %xmm0, %xmm0
+	vpshufd		$147, %xmm12, %xmm12
+	vpaddd		%xmm9, %xmm12, %xmm12
+	vpaddd		%xmm0, %xmm12, %xmm12
+	vpxor		%xmm12, %xmm1, %xmm1
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm8, %xmm8
+	vpxor		%xmm8, %xmm0, %xmm11
+	vpslld		$20, %xmm11, %xmm9
+	vpsrld		$12, %xmm11, %xmm11
+	vpxor		%xmm9, %xmm11, %xmm0
+	vpshufd		$8, %xmm2, %xmm9
+	vpblendw	$192, %xmm5, %xmm3, %xmm11
+	vpblendw	$240, %xmm11, %xmm9, %xmm9
+	vpshufd		$177, %xmm9, %xmm9
+	vpaddd		%xmm12, %xmm9, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm11
+	vpxor		%xmm11, %xmm1, %xmm1
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm8, %xmm8
+	vpxor		%xmm8, %xmm0, %xmm9
+	vpshufd		$147, %xmm1, %xmm1
+	vpshufd		$78, %xmm8, %xmm8
+	vpslld		$25, %xmm9, %xmm0
+	vpsrld		$7, %xmm9, %xmm9
+	vpxor		%xmm0, %xmm9, %xmm0
+	vpslldq		$4, %xmm3, %xmm9
+	vpblendw	$48, %xmm9, %xmm2, %xmm9
+	vpblendw	$240, %xmm9, %xmm4, %xmm9
+	vpshufd		$57, %xmm0, %xmm0
+	vpshufd		$177, %xmm9, %xmm9
+	vpaddd		%xmm11, %xmm9, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm8, %xmm11
+	vpxor		%xmm11, %xmm0, %xmm0
+	vpslld		$20, %xmm0, %xmm8
+	vpsrld		$12, %xmm0, %xmm0
+	vpxor		%xmm8, %xmm0, %xmm0
+	vpunpckhdq	%xmm3, %xmm4, %xmm8
+	vpblendw	$12, %xmm10, %xmm8, %xmm12
+	vpshufd		$177, %xmm12, %xmm12
+	vpaddd		%xmm9, %xmm12, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm11, %xmm11
+	vpxor		%xmm11, %xmm0, %xmm0
+	vpshufd		$57, %xmm1, %xmm1
+	vpshufd		$78, %xmm11, %xmm11
+	vpslld		$25, %xmm0, %xmm12
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm12, %xmm0, %xmm0
+	vpunpckhdq	%xmm5, %xmm2, %xmm12
+	vpshufd		$147, %xmm0, %xmm0
+	vpblendw	$15, %xmm13, %xmm12, %xmm12
+	vpslldq		$8, %xmm5, %xmm13
+	vpshufd		$210, %xmm12, %xmm12
+	vpaddd		%xmm9, %xmm12, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm11, %xmm11
+	vpxor		%xmm11, %xmm0, %xmm0
+	vpslld		$20, %xmm0, %xmm12
+	vpsrld		$12, %xmm0, %xmm0
+	vpxor		%xmm12, %xmm0, %xmm0
+	vpunpckldq	%xmm4, %xmm2, %xmm12
+	vpblendw	$240, %xmm4, %xmm12, %xmm12
+	vpblendw	$192, %xmm13, %xmm12, %xmm12
+	vpsrldq		$12, %xmm3, %xmm13
+	vpaddd		%xmm12, %xmm9, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm11, %xmm11
+	vpxor		%xmm11, %xmm0, %xmm0
+	vpshufd		$147, %xmm1, %xmm1
+	vpshufd		$78, %xmm11, %xmm11
+	vpslld		$25, %xmm0, %xmm12
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm12, %xmm0, %xmm0
+	vpblendw	$60, %xmm2, %xmm4, %xmm12
+	vpblendw	$3, %xmm13, %xmm12, %xmm12
+	vpshufd		$57, %xmm0, %xmm0
+	vpshufd		$78, %xmm12, %xmm12
+	vpaddd		%xmm9, %xmm12, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm11, %xmm11
+	vpxor		%xmm11, %xmm0, %xmm12
+	vpslld		$20, %xmm12, %xmm13
+	vpsrld		$12, %xmm12, %xmm0
+	vpblendw	$51, %xmm3, %xmm4, %xmm12
+	vpxor		%xmm13, %xmm0, %xmm0
+	vpblendw	$192, %xmm10, %xmm12, %xmm10
+	vpslldq		$8, %xmm2, %xmm12
+	vpshufd		$27, %xmm10, %xmm10
+	vpaddd		%xmm9, %xmm10, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm11, %xmm11
+	vpxor		%xmm11, %xmm0, %xmm0
+	vpshufd		$57, %xmm1, %xmm1
+	vpshufd		$78, %xmm11, %xmm11
+	vpslld		$25, %xmm0, %xmm10
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm10, %xmm0, %xmm0
+	vpunpckhdq	%xmm2, %xmm8, %xmm10
+	vpshufd		$147, %xmm0, %xmm0
+	vpblendw	$12, %xmm5, %xmm10, %xmm10
+	vpshufd		$210, %xmm10, %xmm10
+	vpaddd		%xmm9, %xmm10, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm11, %xmm11
+	vpxor		%xmm11, %xmm0, %xmm10
+	vpslld		$20, %xmm10, %xmm0
+	vpsrld		$12, %xmm10, %xmm10
+	vpxor		%xmm0, %xmm10, %xmm0
+	vpblendw	$12, %xmm4, %xmm5, %xmm10
+	vpblendw	$192, %xmm12, %xmm10, %xmm10
+	vpunpckldq	%xmm2, %xmm4, %xmm12
+	vpshufd		$135, %xmm10, %xmm10
+	vpaddd		%xmm9, %xmm10, %xmm9
+	vpaddd		%xmm0, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm1, %xmm1
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm11, %xmm13
+	vpxor		%xmm13, %xmm0, %xmm0
+	vpshufd		$147, %xmm1, %xmm1
+	vpshufd		$78, %xmm13, %xmm13
+	vpslld		$25, %xmm0, %xmm10
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm10, %xmm0, %xmm0
+	vpblendw	$15, %xmm3, %xmm4, %xmm10
+	vpblendw	$192, %xmm5, %xmm10, %xmm10
+	vpshufd		$57, %xmm0, %xmm0
+	vpshufd		$198, %xmm10, %xmm10
+	vpaddd		%xmm9, %xmm10, %xmm10
+	vpaddd		%xmm0, %xmm10, %xmm10
+	vpxor		%xmm10, %xmm1, %xmm1
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm13, %xmm13
+	vpxor		%xmm13, %xmm0, %xmm9
+	vpslld		$20, %xmm9, %xmm0
+	vpsrld		$12, %xmm9, %xmm9
+	vpxor		%xmm0, %xmm9, %xmm0
+	vpunpckhdq	%xmm2, %xmm3, %xmm9
+	vpunpcklqdq	%xmm12, %xmm9, %xmm15
+	vpunpcklqdq	%xmm12, %xmm8, %xmm12
+	vpblendw	$15, %xmm5, %xmm8, %xmm8
+	vpaddd		%xmm15, %xmm10, %xmm15
+	vpaddd		%xmm0, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm1, %xmm1
+	vpshufd		$141, %xmm8, %xmm8
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm13, %xmm13
+	vpxor		%xmm13, %xmm0, %xmm0
+	vpshufd		$57, %xmm1, %xmm1
+	vpshufd		$78, %xmm13, %xmm13
+	vpslld		$25, %xmm0, %xmm10
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm10, %xmm0, %xmm0
+	vpunpcklqdq	%xmm2, %xmm3, %xmm10
+	vpshufd		$147, %xmm0, %xmm0
+	vpblendw	$51, %xmm14, %xmm10, %xmm14
+	vpshufd		$135, %xmm14, %xmm14
+	vpaddd		%xmm15, %xmm14, %xmm14
+	vpaddd		%xmm0, %xmm14, %xmm14
+	vpxor		%xmm14, %xmm1, %xmm1
+	vpunpcklqdq	%xmm3, %xmm4, %xmm15
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm13, %xmm13
+	vpxor		%xmm13, %xmm0, %xmm0
+	vpslld		$20, %xmm0, %xmm11
+	vpsrld		$12, %xmm0, %xmm0
+	vpxor		%xmm11, %xmm0, %xmm0
+	vpunpckhqdq	%xmm5, %xmm3, %xmm11
+	vpblendw	$51, %xmm15, %xmm11, %xmm11
+	vpunpckhqdq	%xmm3, %xmm5, %xmm15
+	vpaddd		%xmm11, %xmm14, %xmm11
+	vpaddd		%xmm0, %xmm11, %xmm11
+	vpxor		%xmm11, %xmm1, %xmm1
+	vpshufb		%xmm6, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm13, %xmm13
+	vpxor		%xmm13, %xmm0, %xmm0
+	vpshufd		$147, %xmm1, %xmm1
+	vpshufd		$78, %xmm13, %xmm13
+	vpslld		$25, %xmm0, %xmm14
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm14, %xmm0, %xmm14
+	vpunpckhqdq	%xmm4, %xmm2, %xmm0
+	vpshufd		$57, %xmm14, %xmm14
+	vpblendw	$51, %xmm15, %xmm0, %xmm15
+	vpaddd		%xmm15, %xmm11, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm1, %xmm1
+	vpshufb		%xmm7, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm13, %xmm13
+	vpxor		%xmm13, %xmm14, %xmm14
+	vpslld		$20, %xmm14, %xmm11
+	vpsrld		$12, %xmm14, %xmm14
+	vpxor		%xmm11, %xmm14, %xmm14
+	vpblendw	$3, %xmm2, %xmm4, %xmm11
+	vpslldq		$8, %xmm11, %xmm0
+	vpblendw	$15, %xmm5, %xmm0, %xmm0
+	vpshufd		$99, %xmm0, %xmm0
+	vpaddd		%xmm15, %xmm0, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm1, %xmm0
+	vpaddd		%xmm12, %xmm15, %xmm15
+	vpshufb		%xmm6, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm13, %xmm13
+	vpxor		%xmm13, %xmm14, %xmm14
+	vpshufd		$57, %xmm0, %xmm0
+	vpshufd		$78, %xmm13, %xmm13
+	vpslld		$25, %xmm14, %xmm1
+	vpsrld		$7, %xmm14, %xmm14
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpblendw	$3, %xmm5, %xmm4, %xmm1
+	vpshufd		$147, %xmm14, %xmm14
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpshufb		%xmm7, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm13, %xmm13
+	vpxor		%xmm13, %xmm14, %xmm14
+	vpslld		$20, %xmm14, %xmm12
+	vpsrld		$12, %xmm14, %xmm14
+	vpxor		%xmm12, %xmm14, %xmm14
+	vpsrldq		$4, %xmm2, %xmm12
+	vpblendw	$60, %xmm12, %xmm1, %xmm1
+	vpaddd		%xmm1, %xmm15, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpblendw	$12, %xmm4, %xmm3, %xmm1
+	vpshufb		%xmm6, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm13, %xmm13
+	vpxor		%xmm13, %xmm14, %xmm14
+	vpshufd		$147, %xmm0, %xmm0
+	vpshufd		$78, %xmm13, %xmm13
+	vpslld		$25, %xmm14, %xmm12
+	vpsrld		$7, %xmm14, %xmm14
+	vpxor		%xmm12, %xmm14, %xmm14
+	vpsrldq		$4, %xmm5, %xmm12
+	vpblendw	$48, %xmm12, %xmm1, %xmm1
+	vpshufd		$33, %xmm5, %xmm12
+	vpshufd		$57, %xmm14, %xmm14
+	vpshufd		$108, %xmm1, %xmm1
+	vpblendw	$51, %xmm12, %xmm10, %xmm12
+	vpaddd		%xmm15, %xmm1, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpaddd		%xmm12, %xmm15, %xmm15
+	vpshufb		%xmm7, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm13, %xmm1
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpslld		$20, %xmm14, %xmm13
+	vpsrld		$12, %xmm14, %xmm14
+	vpxor		%xmm13, %xmm14, %xmm14
+	vpslldq		$12, %xmm3, %xmm13
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpshufb		%xmm6, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpshufd		$57, %xmm0, %xmm0
+	vpshufd		$78, %xmm1, %xmm1
+	vpslld		$25, %xmm14, %xmm12
+	vpsrld		$7, %xmm14, %xmm14
+	vpxor		%xmm12, %xmm14, %xmm14
+	vpblendw	$51, %xmm5, %xmm4, %xmm12
+	vpshufd		$147, %xmm14, %xmm14
+	vpblendw	$192, %xmm13, %xmm12, %xmm12
+	vpaddd		%xmm12, %xmm15, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpsrldq		$4, %xmm3, %xmm12
+	vpshufb		%xmm7, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpslld		$20, %xmm14, %xmm13
+	vpsrld		$12, %xmm14, %xmm14
+	vpxor		%xmm13, %xmm14, %xmm14
+	vpblendw	$48, %xmm2, %xmm5, %xmm13
+	vpblendw	$3, %xmm12, %xmm13, %xmm13
+	vpshufd		$156, %xmm13, %xmm13
+	vpaddd		%xmm15, %xmm13, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpshufb		%xmm6, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpshufd		$147, %xmm0, %xmm0
+	vpshufd		$78, %xmm1, %xmm1
+	vpslld		$25, %xmm14, %xmm13
+	vpsrld		$7, %xmm14, %xmm14
+	vpxor		%xmm13, %xmm14, %xmm14
+	vpunpcklqdq	%xmm2, %xmm4, %xmm13
+	vpshufd		$57, %xmm14, %xmm14
+	vpblendw	$12, %xmm12, %xmm13, %xmm12
+	vpshufd		$180, %xmm12, %xmm12
+	vpaddd		%xmm15, %xmm12, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpshufb		%xmm7, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpslld		$20, %xmm14, %xmm12
+	vpsrld		$12, %xmm14, %xmm14
+	vpxor		%xmm12, %xmm14, %xmm14
+	vpunpckhqdq	%xmm9, %xmm4, %xmm12
+	vpshufd		$198, %xmm12, %xmm12
+	vpaddd		%xmm15, %xmm12, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpaddd		%xmm15, %xmm8, %xmm15
+	vpshufb		%xmm6, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpshufd		$57, %xmm0, %xmm0
+	vpshufd		$78, %xmm1, %xmm1
+	vpslld		$25, %xmm14, %xmm12
+	vpsrld		$7, %xmm14, %xmm14
+	vpxor		%xmm12, %xmm14, %xmm14
+	vpsrldq		$4, %xmm4, %xmm12
+	vpshufd		$147, %xmm14, %xmm14
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm15, %xmm0, %xmm0
+	vpshufb		%xmm7, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpslld		$20, %xmm14, %xmm8
+	vpsrld		$12, %xmm14, %xmm14
+	vpxor		%xmm14, %xmm8, %xmm14
+	vpblendw	$48, %xmm5, %xmm2, %xmm8
+	vpblendw	$3, %xmm12, %xmm8, %xmm8
+	vpunpckhqdq	%xmm5, %xmm4, %xmm12
+	vpshufd		$75, %xmm8, %xmm8
+	vpblendw	$60, %xmm10, %xmm12, %xmm10
+	vpaddd		%xmm15, %xmm8, %xmm15
+	vpaddd		%xmm14, %xmm15, %xmm15
+	vpxor		%xmm0, %xmm15, %xmm0
+	vpshufd		$45, %xmm10, %xmm10
+	vpshufb		%xmm6, %xmm0, %xmm0
+	vpaddd		%xmm15, %xmm10, %xmm15
+	vpaddd		%xmm0, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm14, %xmm14
+	vpshufd		$147, %xmm0, %xmm0
+	vpshufd		$78, %xmm1, %xmm1
+	vpslld		$25, %xmm14, %xmm8
+	vpsrld		$7, %xmm14, %xmm14
+	vpxor		%xmm14, %xmm8, %xmm8
+	vpshufd		$57, %xmm8, %xmm8
+	vpaddd		%xmm8, %xmm15, %xmm15
+	vpxor		%xmm0, %xmm15, %xmm0
+	vpshufb		%xmm7, %xmm0, %xmm0
+	vpaddd		%xmm0, %xmm1, %xmm1
+	vpxor		%xmm8, %xmm1, %xmm8
+	vpslld		$20, %xmm8, %xmm10
+	vpsrld		$12, %xmm8, %xmm8
+	vpxor		%xmm8, %xmm10, %xmm10
+	vpunpckldq	%xmm3, %xmm4, %xmm8
+	vpunpcklqdq	%xmm9, %xmm8, %xmm9
+	vpaddd		%xmm9, %xmm15, %xmm9
+	vpaddd		%xmm10, %xmm9, %xmm9
+	vpxor		%xmm0, %xmm9, %xmm8
+	vpshufb		%xmm6, %xmm8, %xmm8
+	vpaddd		%xmm8, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm10, %xmm10
+	vpshufd		$57, %xmm8, %xmm8
+	vpshufd		$78, %xmm1, %xmm1
+	vpslld		$25, %xmm10, %xmm12
+	vpsrld		$7, %xmm10, %xmm10
+	vpxor		%xmm10, %xmm12, %xmm10
+	vpblendw	$48, %xmm4, %xmm3, %xmm12
+	vpshufd		$147, %xmm10, %xmm0
+	vpunpckhdq	%xmm5, %xmm3, %xmm10
+	vpshufd		$78, %xmm12, %xmm12
+	vpunpcklqdq	%xmm4, %xmm10, %xmm10
+	vpblendw	$192, %xmm2, %xmm10, %xmm10
+	vpshufhw	$78, %xmm10, %xmm10
+	vpaddd		%xmm10, %xmm9, %xmm10
+	vpaddd		%xmm0, %xmm10, %xmm10
+	vpxor		%xmm8, %xmm10, %xmm8
+	vpshufb		%xmm7, %xmm8, %xmm8
+	vpaddd		%xmm8, %xmm1, %xmm1
+	vpxor		%xmm0, %xmm1, %xmm9
+	vpslld		$20, %xmm9, %xmm0
+	vpsrld		$12, %xmm9, %xmm9
+	vpxor		%xmm9, %xmm0, %xmm0
+	vpunpckhdq	%xmm5, %xmm4, %xmm9
+	vpblendw	$240, %xmm9, %xmm2, %xmm13
+	vpshufd		$39, %xmm13, %xmm13
+	vpaddd		%xmm10, %xmm13, %xmm10
+	vpaddd		%xmm0, %xmm10, %xmm10
+	vpxor		%xmm8, %xmm10, %xmm8
+	vpblendw	$12, %xmm4, %xmm2, %xmm13
+	vpshufb		%xmm6, %xmm8, %xmm8
+	vpslldq		$4, %xmm13, %xmm13
+	vpblendw	$15, %xmm5, %xmm13, %xmm13
+	vpaddd		%xmm8, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm0, %xmm0
+	vpaddd		%xmm13, %xmm10, %xmm13
+	vpshufd		$147, %xmm8, %xmm8
+	vpshufd		$78, %xmm1, %xmm1
+	vpslld		$25, %xmm0, %xmm14
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm0, %xmm14, %xmm14
+	vpshufd		$57, %xmm14, %xmm14
+	vpaddd		%xmm14, %xmm13, %xmm13
+	vpxor		%xmm8, %xmm13, %xmm8
+	vpaddd		%xmm13, %xmm12, %xmm12
+	vpshufb		%xmm7, %xmm8, %xmm8
+	vpaddd		%xmm8, %xmm1, %xmm1
+	vpxor		%xmm14, %xmm1, %xmm14
+	vpslld		$20, %xmm14, %xmm10
+	vpsrld		$12, %xmm14, %xmm14
+	vpxor		%xmm14, %xmm10, %xmm10
+	vpaddd		%xmm10, %xmm12, %xmm12
+	vpxor		%xmm8, %xmm12, %xmm8
+	vpshufb		%xmm6, %xmm8, %xmm8
+	vpaddd		%xmm8, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm10, %xmm0
+	vpshufd		$57, %xmm8, %xmm8
+	vpshufd		$78, %xmm1, %xmm1
+	vpslld		$25, %xmm0, %xmm10
+	vpsrld		$7, %xmm0, %xmm0
+	vpxor		%xmm0, %xmm10, %xmm10
+	vpblendw	$48, %xmm2, %xmm3, %xmm0
+	vpblendw	$15, %xmm11, %xmm0, %xmm0
+	vpshufd		$147, %xmm10, %xmm10
+	vpshufd		$114, %xmm0, %xmm0
+	vpaddd		%xmm12, %xmm0, %xmm0
+	vpaddd		%xmm10, %xmm0, %xmm0
+	vpxor		%xmm8, %xmm0, %xmm8
+	vpshufb		%xmm7, %xmm8, %xmm8
+	vpaddd		%xmm8, %xmm1, %xmm1
+	vpxor		%xmm10, %xmm1, %xmm10
+	vpslld		$20, %xmm10, %xmm11
+	vpsrld		$12, %xmm10, %xmm10
+	vpxor		%xmm10, %xmm11, %xmm10
+	vpslldq		$4, %xmm4, %xmm11
+	vpblendw	$192, %xmm11, %xmm3, %xmm3
+	vpunpckldq	%xmm5, %xmm4, %xmm4
+	vpshufd		$99, %xmm3, %xmm3
+	vpaddd		%xmm0, %xmm3, %xmm3
+	vpaddd		%xmm10, %xmm3, %xmm3
+	vpxor		%xmm8, %xmm3, %xmm11
+	vpunpckldq	%xmm5, %xmm2, %xmm0
+	vpblendw	$192, %xmm2, %xmm5, %xmm2
+	vpshufb		%xmm6, %xmm11, %xmm11
+	vpunpckhqdq	%xmm0, %xmm9, %xmm0
+	vpblendw	$15, %xmm4, %xmm2, %xmm4
+	vpaddd		%xmm11, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm10, %xmm10
+	vpshufd		$147, %xmm11, %xmm11
+	vpshufd		$201, %xmm0, %xmm0
+	vpslld		$25, %xmm10, %xmm8
+	vpsrld		$7, %xmm10, %xmm10
+	vpxor		%xmm10, %xmm8, %xmm10
+	vpshufd		$78, %xmm1, %xmm1
+	vpaddd		%xmm3, %xmm0, %xmm0
+	vpshufd		$27, %xmm4, %xmm4
+	vpshufd		$57, %xmm10, %xmm10
+	vpaddd		%xmm10, %xmm0, %xmm0
+	vpxor		%xmm11, %xmm0, %xmm11
+	vpaddd		%xmm0, %xmm4, %xmm0
+	vpshufb		%xmm7, %xmm11, %xmm7
+	vpaddd		%xmm7, %xmm1, %xmm1
+	vpxor		%xmm10, %xmm1, %xmm10
+	vpslld		$20, %xmm10, %xmm8
+	vpsrld		$12, %xmm10, %xmm10
+	vpxor		%xmm10, %xmm8, %xmm8
+	vpaddd		%xmm8, %xmm0, %xmm0
+	vpxor		%xmm7, %xmm0, %xmm7
+	vpshufb		%xmm6, %xmm7, %xmm6
+	vpaddd		%xmm6, %xmm1, %xmm1
+	vpxor		%xmm1, %xmm8, %xmm8
+	vpshufd		$78, %xmm1, %xmm1
+	vpshufd		$57, %xmm6, %xmm6
+	vpslld		$25, %xmm8, %xmm2
+	vpsrld		$7, %xmm8, %xmm8
+	vpxor		%xmm8, %xmm2, %xmm8
+	vpxor		(%rdi), %xmm1, %xmm1
+	vpshufd		$147, %xmm8, %xmm8
+	vpxor		%xmm0, %xmm1, %xmm0
+	vmovups		%xmm0, (%rdi)
+	vpxor		16(%rdi), %xmm8, %xmm0
+	vpxor		%xmm6, %xmm0, %xmm6
+	vmovups		%xmm6, 16(%rdi)
+	addq		$64, %rsi
+	decq		%rdx
+	jnz .Lbeginofloop
+.Lendofloop:
+	ret
+ENDPROC(blake2s_compress_avx)
+#endif /* CONFIG_AS_AVX */
+
+#ifdef CONFIG_AS_AVX512
+ENTRY(blake2s_compress_avx512)
+	vmovdqu		(%rdi),%xmm0
+	vmovdqu		0x10(%rdi),%xmm1
+	vmovdqu		0x20(%rdi),%xmm4
+	vmovq		%rcx,%xmm5
+	vmovdqa		IV(%rip),%xmm14
+	vmovdqa		IV+16(%rip),%xmm15
+	jmp		.Lblake2s_compress_avx512_mainloop
+.align 32
+.Lblake2s_compress_avx512_mainloop:
+	vmovdqa		%xmm0,%xmm10
+	vmovdqa		%xmm1,%xmm11
+	vpaddq		%xmm5,%xmm4,%xmm4
+	vmovdqa		%xmm14,%xmm2
+	vpxor		%xmm15,%xmm4,%xmm3
+	vmovdqu		(%rsi),%ymm6
+	vmovdqu		0x20(%rsi),%ymm7
+	addq		$0x40,%rsi
+	leaq		SIGMA(%rip),%rax
+	movb		$0xa,%cl
+.Lblake2s_compress_avx512_roundloop:
+	addq		$0x40,%rax
+	vmovdqa		-0x40(%rax),%ymm8
+	vmovdqa		-0x20(%rax),%ymm9
+	vpermi2d	%ymm7,%ymm6,%ymm8
+	vpermi2d	%ymm7,%ymm6,%ymm9
+	vmovdqa		%ymm8,%ymm6
+	vmovdqa		%ymm9,%ymm7
+	vpaddd		%xmm8,%xmm0,%xmm0
+	vpaddd		%xmm1,%xmm0,%xmm0
+	vpxor		%xmm0,%xmm3,%xmm3
+	vprord		$0x10,%xmm3,%xmm3
+	vpaddd		%xmm3,%xmm2,%xmm2
+	vpxor		%xmm2,%xmm1,%xmm1
+	vprord		$0xc,%xmm1,%xmm1
+	vextracti128	$0x1,%ymm8,%xmm8
+	vpaddd		%xmm8,%xmm0,%xmm0
+	vpaddd		%xmm1,%xmm0,%xmm0
+	vpxor		%xmm0,%xmm3,%xmm3
+	vprord		$0x8,%xmm3,%xmm3
+	vpaddd		%xmm3,%xmm2,%xmm2
+	vpxor		%xmm2,%xmm1,%xmm1
+	vprord		$0x7,%xmm1,%xmm1
+	vpshufd		$0x39,%xmm1,%xmm1
+	vpshufd		$0x4e,%xmm2,%xmm2
+	vpshufd		$0x93,%xmm3,%xmm3
+	vpaddd		%xmm9,%xmm0,%xmm0
+	vpaddd		%xmm1,%xmm0,%xmm0
+	vpxor		%xmm0,%xmm3,%xmm3
+	vprord		$0x10,%xmm3,%xmm3
+	vpaddd		%xmm3,%xmm2,%xmm2
+	vpxor		%xmm2,%xmm1,%xmm1
+	vprord		$0xc,%xmm1,%xmm1
+	vextracti128	$0x1,%ymm9,%xmm9
+	vpaddd		%xmm9,%xmm0,%xmm0
+	vpaddd		%xmm1,%xmm0,%xmm0
+	vpxor		%xmm0,%xmm3,%xmm3
+	vprord		$0x8,%xmm3,%xmm3
+	vpaddd		%xmm3,%xmm2,%xmm2
+	vpxor		%xmm2,%xmm1,%xmm1
+	vprord		$0x7,%xmm1,%xmm1
+	vpshufd		$0x93,%xmm1,%xmm1
+	vpshufd		$0x4e,%xmm2,%xmm2
+	vpshufd		$0x39,%xmm3,%xmm3
+	decb		%cl
+	jne		.Lblake2s_compress_avx512_roundloop
+	vpxor		%xmm10,%xmm0,%xmm0
+	vpxor		%xmm11,%xmm1,%xmm1
+	vpxor		%xmm2,%xmm0,%xmm0
+	vpxor		%xmm3,%xmm1,%xmm1
+	decq		%rdx
+	jne		.Lblake2s_compress_avx512_mainloop
+	vmovdqu		%xmm0,(%rdi)
+	vmovdqu		%xmm1,0x10(%rdi)
+	vmovdqu		%xmm4,0x20(%rdi)
+	vzeroupper
+	retq
+ENDPROC(blake2s_compress_avx512)
+#endif /* CONFIG_AS_AVX512 */
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 14/20] zinc: Curve25519 generic C implementations and selftest
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (12 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 13/20] zinc: BLAKE2s x86_64 implementation Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 15/20] zinc: Curve25519 ARM implementation Jason A. Donenfeld
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Karthikeyan Bhargavan

This contains two formally verified C implementations of the Curve25519
scalar multiplication function, one for 32-bit systems, and one for
64-bit systems whose compiler supports efficient 128-bit integer types.
Not only are these implementations formally verified, but they are also
the fastest available C implementations. They have been modified to be
friendly to kernel space and to be generally less horrendous looking,
but still an effort has been made to retain their formally verified
characteristic, and so the C might look slightly unidiomatic.

The 64-bit version comes from HACL*: https://github.com/project-everest/hacl-star
The 32-bit version comes from Fiat: https://github.com/mit-plv/fiat-crypto

Information: https://cr.yp.to/ecdh.html

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Karthikeyan Bhargavan <karthik.bhargavan@gmail.com>
---
 include/zinc/curve25519.h               |   28 +
 lib/zinc/Kconfig                        |    5 +
 lib/zinc/Makefile                       |    4 +
 lib/zinc/curve25519/curve25519-fiat32.h |  862 +++++++++++++++
 lib/zinc/curve25519/curve25519-hacl64.h |  785 ++++++++++++++
 lib/zinc/curve25519/curve25519.c        |   83 ++
 lib/zinc/main.c                         |    5 +
 lib/zinc/selftest/curve25519.h          | 1321 +++++++++++++++++++++++
 8 files changed, 3093 insertions(+)
 create mode 100644 include/zinc/curve25519.h
 create mode 100644 lib/zinc/curve25519/curve25519-fiat32.h
 create mode 100644 lib/zinc/curve25519/curve25519-hacl64.h
 create mode 100644 lib/zinc/curve25519/curve25519.c
 create mode 100644 lib/zinc/selftest/curve25519.h

diff --git a/include/zinc/curve25519.h b/include/zinc/curve25519.h
new file mode 100644
index 000000000000..0e1caf02f1d8
--- /dev/null
+++ b/include/zinc/curve25519.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _ZINC_CURVE25519_H
+#define _ZINC_CURVE25519_H
+
+#include <linux/types.h>
+
+enum curve25519_lengths {
+	CURVE25519_POINT_SIZE = 32
+};
+
+bool __must_check curve25519(u8 mypublic[CURVE25519_POINT_SIZE],
+			     const u8 secret[CURVE25519_POINT_SIZE],
+			     const u8 basepoint[CURVE25519_POINT_SIZE]);
+void curve25519_generate_secret(u8 secret[CURVE25519_POINT_SIZE]);
+bool __must_check curve25519_generate_public(
+	u8 pub[CURVE25519_POINT_SIZE], const u8 secret[CURVE25519_POINT_SIZE]);
+
+void curve25519_fpu_init(void);
+
+#ifdef DEBUG
+bool curve25519_selftest(void);
+#endif
+
+#endif /* _ZINC_CURVE25519_H */
diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig
index 4db30012c2d6..e4e10a1f4027 100644
--- a/lib/zinc/Kconfig
+++ b/lib/zinc/Kconfig
@@ -21,6 +21,11 @@ config ZINC_BLAKE2S
 	bool
 	select ZINC
 
+config ZINC_CURVE25519
+	bool
+	select ZINC
+	select CONFIG_CRYPTO
+
 config ZINC_DEBUG
 	bool "Zinc cryptography library debugging and self-tests"
 	depends on ZINC
diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 49455abfdf5e..25826d3eb74a 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -51,6 +51,10 @@ ifeq ($(CONFIG_ZINC_CHACHA20POLY1305),y)
 zinc-y += chacha20poly1305.o
 endif
 
+ifeq ($(CONFIG_ZINC_CURVE25519),y)
+zinc-y += curve25519/curve25519.o
+endif
+
 ifeq ($(CONFIG_ZINC_BLAKE2S),y)
 zinc-y += blake2s/blake2s.o
 ifeq ($(CONFIG_ZINC_ARCH_X86_64),y)
diff --git a/lib/zinc/curve25519/curve25519-fiat32.h b/lib/zinc/curve25519/curve25519-fiat32.h
new file mode 100644
index 000000000000..8fa327cb1edc
--- /dev/null
+++ b/lib/zinc/curve25519/curve25519-fiat32.h
@@ -0,0 +1,862 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2016 The fiat-crypto Authors.
+ * Copyright (C) 2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is a machine-generated formally verified implementation of Curve25519
+ * ECDH from: <https://github.com/mit-plv/fiat-crypto>. Though originally
+ * machine generated, it has been tweaked to be suitable for use in the kernel.
+ * It is optimized for 32-bit machines and machines that cannot work efficiently
+ * with 128-bit integer types.
+ */
+
+/* fe means field element. Here the field is \Z/(2^255-19). An element t,
+ * entries t[0]...t[9], represents the integer t[0]+2^26 t[1]+2^51 t[2]+2^77
+ * t[3]+2^102 t[4]+...+2^230 t[9].
+ * fe limbs are bounded by 1.125*2^26,1.125*2^25,1.125*2^26,1.125*2^25,etc.
+ * Multiplication and carrying produce fe from fe_loose.
+ */
+typedef struct fe { u32 v[10]; } fe;
+
+/* fe_loose limbs are bounded by 3.375*2^26,3.375*2^25,3.375*2^26,3.375*2^25,etc
+ * Addition and subtraction produce fe_loose from (fe, fe).
+ */
+typedef struct fe_loose { u32 v[10]; } fe_loose;
+
+static __always_inline void fe_frombytes_impl(u32 h[10], const u8 *s)
+{
+	/* Ignores top bit of s. */
+	u32 a0 = get_unaligned_le32(s);
+	u32 a1 = get_unaligned_le32(s+4);
+	u32 a2 = get_unaligned_le32(s+8);
+	u32 a3 = get_unaligned_le32(s+12);
+	u32 a4 = get_unaligned_le32(s+16);
+	u32 a5 = get_unaligned_le32(s+20);
+	u32 a6 = get_unaligned_le32(s+24);
+	u32 a7 = get_unaligned_le32(s+28);
+	h[0] = a0&((1<<26)-1);                    /* 26 used, 32-26 left.   26 */
+	h[1] = (a0>>26) | ((a1&((1<<19)-1))<< 6); /* (32-26) + 19 =  6+19 = 25 */
+	h[2] = (a1>>19) | ((a2&((1<<13)-1))<<13); /* (32-19) + 13 = 13+13 = 26 */
+	h[3] = (a2>>13) | ((a3&((1<< 6)-1))<<19); /* (32-13) +  6 = 19+ 6 = 25 */
+	h[4] = (a3>> 6);                          /* (32- 6)              = 26 */
+	h[5] = a4&((1<<25)-1);                    /*                        25 */
+	h[6] = (a4>>25) | ((a5&((1<<19)-1))<< 7); /* (32-25) + 19 =  7+19 = 26 */
+	h[7] = (a5>>19) | ((a6&((1<<12)-1))<<13); /* (32-19) + 12 = 13+12 = 25 */
+	h[8] = (a6>>12) | ((a7&((1<< 6)-1))<<20); /* (32-12) +  6 = 20+ 6 = 26 */
+	h[9] = (a7>> 6)&((1<<25)-1); /*                                     25 */
+}
+
+static __always_inline void fe_frombytes(fe *h, const u8 *s)
+{
+	fe_frombytes_impl(h->v, s);
+}
+
+static __always_inline u8 /*bool*/
+addcarryx_u25(u8 /*bool*/ c, u32 a, u32 b, u32 *low)
+{
+	/* This function extracts 25 bits of result and 1 bit of carry
+	 * (26 total), so a 32-bit intermediate is sufficient.
+	 */
+	u32 x = a + b + c;
+	*low = x & ((1 << 25) - 1);
+	return (x >> 25) & 1;
+}
+
+static __always_inline u8 /*bool*/
+addcarryx_u26(u8 /*bool*/ c, u32 a, u32 b, u32 *low)
+{
+	/* This function extracts 26 bits of result and 1 bit of carry
+	 * (27 total), so a 32-bit intermediate is sufficient.
+	 */
+	u32 x = a + b + c;
+	*low = x & ((1 << 26) - 1);
+	return (x >> 26) & 1;
+}
+
+static __always_inline u8 /*bool*/
+subborrow_u25(u8 /*bool*/ c, u32 a, u32 b, u32 *low)
+{
+	/* This function extracts 25 bits of result and 1 bit of borrow
+	 * (26 total), so a 32-bit intermediate is sufficient.
+	 */
+	u32 x = a - b - c;
+	*low = x & ((1 << 25) - 1);
+	return x >> 31;
+}
+
+static __always_inline u8 /*bool*/
+subborrow_u26(u8 /*bool*/ c, u32 a, u32 b, u32 *low)
+{
+	/* This function extracts 26 bits of result and 1 bit of borrow
+	 *(27 total), so a 32-bit intermediate is sufficient.
+	 */
+	u32 x = a - b - c;
+	*low = x & ((1 << 26) - 1);
+	return x >> 31;
+}
+
+static __always_inline u32 cmovznz32(u32 t, u32 z, u32 nz)
+{
+	t = -!!t; /* all set if nonzero, 0 if 0 */
+	return (t&nz) | ((~t)&z);
+}
+
+static __always_inline void fe_freeze(u32 out[10], const u32 in1[10])
+{
+	{ const u32 x17 = in1[9];
+	{ const u32 x18 = in1[8];
+	{ const u32 x16 = in1[7];
+	{ const u32 x14 = in1[6];
+	{ const u32 x12 = in1[5];
+	{ const u32 x10 = in1[4];
+	{ const u32 x8 = in1[3];
+	{ const u32 x6 = in1[2];
+	{ const u32 x4 = in1[1];
+	{ const u32 x2 = in1[0];
+	{ u32 x20; u8/*bool*/ x21 = subborrow_u26(0x0, x2, 0x3ffffed, &x20);
+	{ u32 x23; u8/*bool*/ x24 = subborrow_u25(x21, x4, 0x1ffffff, &x23);
+	{ u32 x26; u8/*bool*/ x27 = subborrow_u26(x24, x6, 0x3ffffff, &x26);
+	{ u32 x29; u8/*bool*/ x30 = subborrow_u25(x27, x8, 0x1ffffff, &x29);
+	{ u32 x32; u8/*bool*/ x33 = subborrow_u26(x30, x10, 0x3ffffff, &x32);
+	{ u32 x35; u8/*bool*/ x36 = subborrow_u25(x33, x12, 0x1ffffff, &x35);
+	{ u32 x38; u8/*bool*/ x39 = subborrow_u26(x36, x14, 0x3ffffff, &x38);
+	{ u32 x41; u8/*bool*/ x42 = subborrow_u25(x39, x16, 0x1ffffff, &x41);
+	{ u32 x44; u8/*bool*/ x45 = subborrow_u26(x42, x18, 0x3ffffff, &x44);
+	{ u32 x47; u8/*bool*/ x48 = subborrow_u25(x45, x17, 0x1ffffff, &x47);
+	{ u32 x49 = cmovznz32(x48, 0x0, 0xffffffff);
+	{ u32 x50 = (x49 & 0x3ffffed);
+	{ u32 x52; u8/*bool*/ x53 = addcarryx_u26(0x0, x20, x50, &x52);
+	{ u32 x54 = (x49 & 0x1ffffff);
+	{ u32 x56; u8/*bool*/ x57 = addcarryx_u25(x53, x23, x54, &x56);
+	{ u32 x58 = (x49 & 0x3ffffff);
+	{ u32 x60; u8/*bool*/ x61 = addcarryx_u26(x57, x26, x58, &x60);
+	{ u32 x62 = (x49 & 0x1ffffff);
+	{ u32 x64; u8/*bool*/ x65 = addcarryx_u25(x61, x29, x62, &x64);
+	{ u32 x66 = (x49 & 0x3ffffff);
+	{ u32 x68; u8/*bool*/ x69 = addcarryx_u26(x65, x32, x66, &x68);
+	{ u32 x70 = (x49 & 0x1ffffff);
+	{ u32 x72; u8/*bool*/ x73 = addcarryx_u25(x69, x35, x70, &x72);
+	{ u32 x74 = (x49 & 0x3ffffff);
+	{ u32 x76; u8/*bool*/ x77 = addcarryx_u26(x73, x38, x74, &x76);
+	{ u32 x78 = (x49 & 0x1ffffff);
+	{ u32 x80; u8/*bool*/ x81 = addcarryx_u25(x77, x41, x78, &x80);
+	{ u32 x82 = (x49 & 0x3ffffff);
+	{ u32 x84; u8/*bool*/ x85 = addcarryx_u26(x81, x44, x82, &x84);
+	{ u32 x86 = (x49 & 0x1ffffff);
+	{ u32 x88; addcarryx_u25(x85, x47, x86, &x88);
+	out[0] = x52;
+	out[1] = x56;
+	out[2] = x60;
+	out[3] = x64;
+	out[4] = x68;
+	out[5] = x72;
+	out[6] = x76;
+	out[7] = x80;
+	out[8] = x84;
+	out[9] = x88;
+	}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
+}
+
+static __always_inline void fe_tobytes(u8 s[32], const fe *f)
+{
+	u32 h[10];
+	fe_freeze(h, f->v);
+	s[0] = h[0] >> 0;
+	s[1] = h[0] >> 8;
+	s[2] = h[0] >> 16;
+	s[3] = (h[0] >> 24) | (h[1] << 2);
+	s[4] = h[1] >> 6;
+	s[5] = h[1] >> 14;
+	s[6] = (h[1] >> 22) | (h[2] << 3);
+	s[7] = h[2] >> 5;
+	s[8] = h[2] >> 13;
+	s[9] = (h[2] >> 21) | (h[3] << 5);
+	s[10] = h[3] >> 3;
+	s[11] = h[3] >> 11;
+	s[12] = (h[3] >> 19) | (h[4] << 6);
+	s[13] = h[4] >> 2;
+	s[14] = h[4] >> 10;
+	s[15] = h[4] >> 18;
+	s[16] = h[5] >> 0;
+	s[17] = h[5] >> 8;
+	s[18] = h[5] >> 16;
+	s[19] = (h[5] >> 24) | (h[6] << 1);
+	s[20] = h[6] >> 7;
+	s[21] = h[6] >> 15;
+	s[22] = (h[6] >> 23) | (h[7] << 3);
+	s[23] = h[7] >> 5;
+	s[24] = h[7] >> 13;
+	s[25] = (h[7] >> 21) | (h[8] << 4);
+	s[26] = h[8] >> 4;
+	s[27] = h[8] >> 12;
+	s[28] = (h[8] >> 20) | (h[9] << 6);
+	s[29] = h[9] >> 2;
+	s[30] = h[9] >> 10;
+	s[31] = h[9] >> 18;
+}
+
+/* h = f */
+static __always_inline void fe_copy(fe *h, const fe *f)
+{
+	memmove(h, f, sizeof(u32) * 10);
+}
+
+static __always_inline void fe_copy_lt(fe_loose *h, const fe *f)
+{
+	memmove(h, f, sizeof(u32) * 10);
+}
+
+/* h = 0 */
+static __always_inline void fe_0(fe *h)
+{
+	memset(h, 0, sizeof(u32) * 10);
+}
+
+/* h = 1 */
+static __always_inline void fe_1(fe *h)
+{
+	memset(h, 0, sizeof(u32) * 10);
+	h->v[0] = 1;
+}
+
+static void fe_add_impl(u32 out[10], const u32 in1[10], const u32 in2[10])
+{
+	{ const u32 x20 = in1[9];
+	{ const u32 x21 = in1[8];
+	{ const u32 x19 = in1[7];
+	{ const u32 x17 = in1[6];
+	{ const u32 x15 = in1[5];
+	{ const u32 x13 = in1[4];
+	{ const u32 x11 = in1[3];
+	{ const u32 x9 = in1[2];
+	{ const u32 x7 = in1[1];
+	{ const u32 x5 = in1[0];
+	{ const u32 x38 = in2[9];
+	{ const u32 x39 = in2[8];
+	{ const u32 x37 = in2[7];
+	{ const u32 x35 = in2[6];
+	{ const u32 x33 = in2[5];
+	{ const u32 x31 = in2[4];
+	{ const u32 x29 = in2[3];
+	{ const u32 x27 = in2[2];
+	{ const u32 x25 = in2[1];
+	{ const u32 x23 = in2[0];
+	out[0] = (x5 + x23);
+	out[1] = (x7 + x25);
+	out[2] = (x9 + x27);
+	out[3] = (x11 + x29);
+	out[4] = (x13 + x31);
+	out[5] = (x15 + x33);
+	out[6] = (x17 + x35);
+	out[7] = (x19 + x37);
+	out[8] = (x21 + x39);
+	out[9] = (x20 + x38);
+	}}}}}}}}}}}}}}}}}}}}
+}
+
+/* h = f + g
+ * Can overlap h with f or g.
+ */
+static __always_inline void fe_add(fe_loose *h, const fe *f, const fe *g)
+{
+	fe_add_impl(h->v, f->v, g->v);
+}
+
+static void fe_sub_impl(u32 out[10], const u32 in1[10], const u32 in2[10])
+{
+	{ const u32 x20 = in1[9];
+	{ const u32 x21 = in1[8];
+	{ const u32 x19 = in1[7];
+	{ const u32 x17 = in1[6];
+	{ const u32 x15 = in1[5];
+	{ const u32 x13 = in1[4];
+	{ const u32 x11 = in1[3];
+	{ const u32 x9 = in1[2];
+	{ const u32 x7 = in1[1];
+	{ const u32 x5 = in1[0];
+	{ const u32 x38 = in2[9];
+	{ const u32 x39 = in2[8];
+	{ const u32 x37 = in2[7];
+	{ const u32 x35 = in2[6];
+	{ const u32 x33 = in2[5];
+	{ const u32 x31 = in2[4];
+	{ const u32 x29 = in2[3];
+	{ const u32 x27 = in2[2];
+	{ const u32 x25 = in2[1];
+	{ const u32 x23 = in2[0];
+	out[0] = ((0x7ffffda + x5) - x23);
+	out[1] = ((0x3fffffe + x7) - x25);
+	out[2] = ((0x7fffffe + x9) - x27);
+	out[3] = ((0x3fffffe + x11) - x29);
+	out[4] = ((0x7fffffe + x13) - x31);
+	out[5] = ((0x3fffffe + x15) - x33);
+	out[6] = ((0x7fffffe + x17) - x35);
+	out[7] = ((0x3fffffe + x19) - x37);
+	out[8] = ((0x7fffffe + x21) - x39);
+	out[9] = ((0x3fffffe + x20) - x38);
+	}}}}}}}}}}}}}}}}}}}}
+}
+
+/* h = f - g
+ * Can overlap h with f or g.
+ */
+static __always_inline void fe_sub(fe_loose *h, const fe *f, const fe *g)
+{
+	fe_sub_impl(h->v, f->v, g->v);
+}
+
+static void fe_mul_impl(u32 out[10], const u32 in1[10], const u32 in2[10])
+{
+	{ const u32 x20 = in1[9];
+	{ const u32 x21 = in1[8];
+	{ const u32 x19 = in1[7];
+	{ const u32 x17 = in1[6];
+	{ const u32 x15 = in1[5];
+	{ const u32 x13 = in1[4];
+	{ const u32 x11 = in1[3];
+	{ const u32 x9 = in1[2];
+	{ const u32 x7 = in1[1];
+	{ const u32 x5 = in1[0];
+	{ const u32 x38 = in2[9];
+	{ const u32 x39 = in2[8];
+	{ const u32 x37 = in2[7];
+	{ const u32 x35 = in2[6];
+	{ const u32 x33 = in2[5];
+	{ const u32 x31 = in2[4];
+	{ const u32 x29 = in2[3];
+	{ const u32 x27 = in2[2];
+	{ const u32 x25 = in2[1];
+	{ const u32 x23 = in2[0];
+	{ u64 x40 = ((u64)x23 * x5);
+	{ u64 x41 = (((u64)x23 * x7) + ((u64)x25 * x5));
+	{ u64 x42 = ((((u64)(0x2 * x25) * x7) + ((u64)x23 * x9)) + ((u64)x27 * x5));
+	{ u64 x43 = (((((u64)x25 * x9) + ((u64)x27 * x7)) + ((u64)x23 * x11)) + ((u64)x29 * x5));
+	{ u64 x44 = (((((u64)x27 * x9) + (0x2 * (((u64)x25 * x11) + ((u64)x29 * x7)))) + ((u64)x23 * x13)) + ((u64)x31 * x5));
+	{ u64 x45 = (((((((u64)x27 * x11) + ((u64)x29 * x9)) + ((u64)x25 * x13)) + ((u64)x31 * x7)) + ((u64)x23 * x15)) + ((u64)x33 * x5));
+	{ u64 x46 = (((((0x2 * ((((u64)x29 * x11) + ((u64)x25 * x15)) + ((u64)x33 * x7))) + ((u64)x27 * x13)) + ((u64)x31 * x9)) + ((u64)x23 * x17)) + ((u64)x35 * x5));
+	{ u64 x47 = (((((((((u64)x29 * x13) + ((u64)x31 * x11)) + ((u64)x27 * x15)) + ((u64)x33 * x9)) + ((u64)x25 * x17)) + ((u64)x35 * x7)) + ((u64)x23 * x19)) + ((u64)x37 * x5));
+	{ u64 x48 = (((((((u64)x31 * x13) + (0x2 * (((((u64)x29 * x15) + ((u64)x33 * x11)) + ((u64)x25 * x19)) + ((u64)x37 * x7)))) + ((u64)x27 * x17)) + ((u64)x35 * x9)) + ((u64)x23 * x21)) + ((u64)x39 * x5));
+	{ u64 x49 = (((((((((((u64)x31 * x15) + ((u64)x33 * x13)) + ((u64)x29 * x17)) + ((u64)x35 * x11)) + ((u64)x27 * x19)) + ((u64)x37 * x9)) + ((u64)x25 * x21)) + ((u64)x39 * x7)) + ((u64)x23 * x20)) + ((u64)x38 * x5));
+	{ u64 x50 = (((((0x2 * ((((((u64)x33 * x15) + ((u64)x29 * x19)) + ((u64)x37 * x11)) + ((u64)x25 * x20)) + ((u64)x38 * x7))) + ((u64)x31 * x17)) + ((u64)x35 * x13)) + ((u64)x27 * x21)) + ((u64)x39 * x9));
+	{ u64 x51 = (((((((((u64)x33 * x17) + ((u64)x35 * x15)) + ((u64)x31 * x19)) + ((u64)x37 * x13)) + ((u64)x29 * x21)) + ((u64)x39 * x11)) + ((u64)x27 * x20)) + ((u64)x38 * x9));
+	{ u64 x52 = (((((u64)x35 * x17) + (0x2 * (((((u64)x33 * x19) + ((u64)x37 * x15)) + ((u64)x29 * x20)) + ((u64)x38 * x11)))) + ((u64)x31 * x21)) + ((u64)x39 * x13));
+	{ u64 x53 = (((((((u64)x35 * x19) + ((u64)x37 * x17)) + ((u64)x33 * x21)) + ((u64)x39 * x15)) + ((u64)x31 * x20)) + ((u64)x38 * x13));
+	{ u64 x54 = (((0x2 * ((((u64)x37 * x19) + ((u64)x33 * x20)) + ((u64)x38 * x15))) + ((u64)x35 * x21)) + ((u64)x39 * x17));
+	{ u64 x55 = (((((u64)x37 * x21) + ((u64)x39 * x19)) + ((u64)x35 * x20)) + ((u64)x38 * x17));
+	{ u64 x56 = (((u64)x39 * x21) + (0x2 * (((u64)x37 * x20) + ((u64)x38 * x19))));
+	{ u64 x57 = (((u64)x39 * x20) + ((u64)x38 * x21));
+	{ u64 x58 = ((u64)(0x2 * x38) * x20);
+	{ u64 x59 = (x48 + (x58 << 0x4));
+	{ u64 x60 = (x59 + (x58 << 0x1));
+	{ u64 x61 = (x60 + x58);
+	{ u64 x62 = (x47 + (x57 << 0x4));
+	{ u64 x63 = (x62 + (x57 << 0x1));
+	{ u64 x64 = (x63 + x57);
+	{ u64 x65 = (x46 + (x56 << 0x4));
+	{ u64 x66 = (x65 + (x56 << 0x1));
+	{ u64 x67 = (x66 + x56);
+	{ u64 x68 = (x45 + (x55 << 0x4));
+	{ u64 x69 = (x68 + (x55 << 0x1));
+	{ u64 x70 = (x69 + x55);
+	{ u64 x71 = (x44 + (x54 << 0x4));
+	{ u64 x72 = (x71 + (x54 << 0x1));
+	{ u64 x73 = (x72 + x54);
+	{ u64 x74 = (x43 + (x53 << 0x4));
+	{ u64 x75 = (x74 + (x53 << 0x1));
+	{ u64 x76 = (x75 + x53);
+	{ u64 x77 = (x42 + (x52 << 0x4));
+	{ u64 x78 = (x77 + (x52 << 0x1));
+	{ u64 x79 = (x78 + x52);
+	{ u64 x80 = (x41 + (x51 << 0x4));
+	{ u64 x81 = (x80 + (x51 << 0x1));
+	{ u64 x82 = (x81 + x51);
+	{ u64 x83 = (x40 + (x50 << 0x4));
+	{ u64 x84 = (x83 + (x50 << 0x1));
+	{ u64 x85 = (x84 + x50);
+	{ u64 x86 = (x85 >> 0x1a);
+	{ u32 x87 = ((u32)x85 & 0x3ffffff);
+	{ u64 x88 = (x86 + x82);
+	{ u64 x89 = (x88 >> 0x19);
+	{ u32 x90 = ((u32)x88 & 0x1ffffff);
+	{ u64 x91 = (x89 + x79);
+	{ u64 x92 = (x91 >> 0x1a);
+	{ u32 x93 = ((u32)x91 & 0x3ffffff);
+	{ u64 x94 = (x92 + x76);
+	{ u64 x95 = (x94 >> 0x19);
+	{ u32 x96 = ((u32)x94 & 0x1ffffff);
+	{ u64 x97 = (x95 + x73);
+	{ u64 x98 = (x97 >> 0x1a);
+	{ u32 x99 = ((u32)x97 & 0x3ffffff);
+	{ u64 x100 = (x98 + x70);
+	{ u64 x101 = (x100 >> 0x19);
+	{ u32 x102 = ((u32)x100 & 0x1ffffff);
+	{ u64 x103 = (x101 + x67);
+	{ u64 x104 = (x103 >> 0x1a);
+	{ u32 x105 = ((u32)x103 & 0x3ffffff);
+	{ u64 x106 = (x104 + x64);
+	{ u64 x107 = (x106 >> 0x19);
+	{ u32 x108 = ((u32)x106 & 0x1ffffff);
+	{ u64 x109 = (x107 + x61);
+	{ u64 x110 = (x109 >> 0x1a);
+	{ u32 x111 = ((u32)x109 & 0x3ffffff);
+	{ u64 x112 = (x110 + x49);
+	{ u64 x113 = (x112 >> 0x19);
+	{ u32 x114 = ((u32)x112 & 0x1ffffff);
+	{ u64 x115 = (x87 + (0x13 * x113));
+	{ u32 x116 = (u32) (x115 >> 0x1a);
+	{ u32 x117 = ((u32)x115 & 0x3ffffff);
+	{ u32 x118 = (x116 + x90);
+	{ u32 x119 = (x118 >> 0x19);
+	{ u32 x120 = (x118 & 0x1ffffff);
+	out[0] = x117;
+	out[1] = x120;
+	out[2] = (x119 + x93);
+	out[3] = x96;
+	out[4] = x99;
+	out[5] = x102;
+	out[6] = x105;
+	out[7] = x108;
+	out[8] = x111;
+	out[9] = x114;
+	}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
+}
+
+static __always_inline void fe_mul_ttt(fe *h, const fe *f, const fe *g)
+{
+	fe_mul_impl(h->v, f->v, g->v);
+}
+
+static __always_inline void fe_mul_tlt(fe *h, const fe_loose *f, const fe *g)
+{
+	fe_mul_impl(h->v, f->v, g->v);
+}
+
+static __always_inline void
+fe_mul_tll(fe *h, const fe_loose *f, const fe_loose *g)
+{
+	fe_mul_impl(h->v, f->v, g->v);
+}
+
+static void fe_sqr_impl(u32 out[10], const u32 in1[10])
+{
+	{ const u32 x17 = in1[9];
+	{ const u32 x18 = in1[8];
+	{ const u32 x16 = in1[7];
+	{ const u32 x14 = in1[6];
+	{ const u32 x12 = in1[5];
+	{ const u32 x10 = in1[4];
+	{ const u32 x8 = in1[3];
+	{ const u32 x6 = in1[2];
+	{ const u32 x4 = in1[1];
+	{ const u32 x2 = in1[0];
+	{ u64 x19 = ((u64)x2 * x2);
+	{ u64 x20 = ((u64)(0x2 * x2) * x4);
+	{ u64 x21 = (0x2 * (((u64)x4 * x4) + ((u64)x2 * x6)));
+	{ u64 x22 = (0x2 * (((u64)x4 * x6) + ((u64)x2 * x8)));
+	{ u64 x23 = ((((u64)x6 * x6) + ((u64)(0x4 * x4) * x8)) + ((u64)(0x2 * x2) * x10));
+	{ u64 x24 = (0x2 * ((((u64)x6 * x8) + ((u64)x4 * x10)) + ((u64)x2 * x12)));
+	{ u64 x25 = (0x2 * (((((u64)x8 * x8) + ((u64)x6 * x10)) + ((u64)x2 * x14)) + ((u64)(0x2 * x4) * x12)));
+	{ u64 x26 = (0x2 * (((((u64)x8 * x10) + ((u64)x6 * x12)) + ((u64)x4 * x14)) + ((u64)x2 * x16)));
+	{ u64 x27 = (((u64)x10 * x10) + (0x2 * ((((u64)x6 * x14) + ((u64)x2 * x18)) + (0x2 * (((u64)x4 * x16) + ((u64)x8 * x12))))));
+	{ u64 x28 = (0x2 * ((((((u64)x10 * x12) + ((u64)x8 * x14)) + ((u64)x6 * x16)) + ((u64)x4 * x18)) + ((u64)x2 * x17)));
+	{ u64 x29 = (0x2 * (((((u64)x12 * x12) + ((u64)x10 * x14)) + ((u64)x6 * x18)) + (0x2 * (((u64)x8 * x16) + ((u64)x4 * x17)))));
+	{ u64 x30 = (0x2 * (((((u64)x12 * x14) + ((u64)x10 * x16)) + ((u64)x8 * x18)) + ((u64)x6 * x17)));
+	{ u64 x31 = (((u64)x14 * x14) + (0x2 * (((u64)x10 * x18) + (0x2 * (((u64)x12 * x16) + ((u64)x8 * x17))))));
+	{ u64 x32 = (0x2 * ((((u64)x14 * x16) + ((u64)x12 * x18)) + ((u64)x10 * x17)));
+	{ u64 x33 = (0x2 * ((((u64)x16 * x16) + ((u64)x14 * x18)) + ((u64)(0x2 * x12) * x17)));
+	{ u64 x34 = (0x2 * (((u64)x16 * x18) + ((u64)x14 * x17)));
+	{ u64 x35 = (((u64)x18 * x18) + ((u64)(0x4 * x16) * x17));
+	{ u64 x36 = ((u64)(0x2 * x18) * x17);
+	{ u64 x37 = ((u64)(0x2 * x17) * x17);
+	{ u64 x38 = (x27 + (x37 << 0x4));
+	{ u64 x39 = (x38 + (x37 << 0x1));
+	{ u64 x40 = (x39 + x37);
+	{ u64 x41 = (x26 + (x36 << 0x4));
+	{ u64 x42 = (x41 + (x36 << 0x1));
+	{ u64 x43 = (x42 + x36);
+	{ u64 x44 = (x25 + (x35 << 0x4));
+	{ u64 x45 = (x44 + (x35 << 0x1));
+	{ u64 x46 = (x45 + x35);
+	{ u64 x47 = (x24 + (x34 << 0x4));
+	{ u64 x48 = (x47 + (x34 << 0x1));
+	{ u64 x49 = (x48 + x34);
+	{ u64 x50 = (x23 + (x33 << 0x4));
+	{ u64 x51 = (x50 + (x33 << 0x1));
+	{ u64 x52 = (x51 + x33);
+	{ u64 x53 = (x22 + (x32 << 0x4));
+	{ u64 x54 = (x53 + (x32 << 0x1));
+	{ u64 x55 = (x54 + x32);
+	{ u64 x56 = (x21 + (x31 << 0x4));
+	{ u64 x57 = (x56 + (x31 << 0x1));
+	{ u64 x58 = (x57 + x31);
+	{ u64 x59 = (x20 + (x30 << 0x4));
+	{ u64 x60 = (x59 + (x30 << 0x1));
+	{ u64 x61 = (x60 + x30);
+	{ u64 x62 = (x19 + (x29 << 0x4));
+	{ u64 x63 = (x62 + (x29 << 0x1));
+	{ u64 x64 = (x63 + x29);
+	{ u64 x65 = (x64 >> 0x1a);
+	{ u32 x66 = ((u32)x64 & 0x3ffffff);
+	{ u64 x67 = (x65 + x61);
+	{ u64 x68 = (x67 >> 0x19);
+	{ u32 x69 = ((u32)x67 & 0x1ffffff);
+	{ u64 x70 = (x68 + x58);
+	{ u64 x71 = (x70 >> 0x1a);
+	{ u32 x72 = ((u32)x70 & 0x3ffffff);
+	{ u64 x73 = (x71 + x55);
+	{ u64 x74 = (x73 >> 0x19);
+	{ u32 x75 = ((u32)x73 & 0x1ffffff);
+	{ u64 x76 = (x74 + x52);
+	{ u64 x77 = (x76 >> 0x1a);
+	{ u32 x78 = ((u32)x76 & 0x3ffffff);
+	{ u64 x79 = (x77 + x49);
+	{ u64 x80 = (x79 >> 0x19);
+	{ u32 x81 = ((u32)x79 & 0x1ffffff);
+	{ u64 x82 = (x80 + x46);
+	{ u64 x83 = (x82 >> 0x1a);
+	{ u32 x84 = ((u32)x82 & 0x3ffffff);
+	{ u64 x85 = (x83 + x43);
+	{ u64 x86 = (x85 >> 0x19);
+	{ u32 x87 = ((u32)x85 & 0x1ffffff);
+	{ u64 x88 = (x86 + x40);
+	{ u64 x89 = (x88 >> 0x1a);
+	{ u32 x90 = ((u32)x88 & 0x3ffffff);
+	{ u64 x91 = (x89 + x28);
+	{ u64 x92 = (x91 >> 0x19);
+	{ u32 x93 = ((u32)x91 & 0x1ffffff);
+	{ u64 x94 = (x66 + (0x13 * x92));
+	{ u32 x95 = (u32) (x94 >> 0x1a);
+	{ u32 x96 = ((u32)x94 & 0x3ffffff);
+	{ u32 x97 = (x95 + x69);
+	{ u32 x98 = (x97 >> 0x19);
+	{ u32 x99 = (x97 & 0x1ffffff);
+	out[0] = x96;
+	out[1] = x99;
+	out[2] = (x98 + x72);
+	out[3] = x75;
+	out[4] = x78;
+	out[5] = x81;
+	out[6] = x84;
+	out[7] = x87;
+	out[8] = x90;
+	out[9] = x93;
+	}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
+}
+
+static __always_inline void fe_sq_tl(fe *h, const fe_loose *f)
+{
+	fe_sqr_impl(h->v, f->v);
+}
+
+static __always_inline void fe_sq_tt(fe *h, const fe *f)
+{
+	fe_sqr_impl(h->v, f->v);
+}
+
+static __always_inline void fe_loose_invert(fe *out, const fe_loose *z)
+{
+	fe t0;
+	fe t1;
+	fe t2;
+	fe t3;
+	int i;
+
+	fe_sq_tl(&t0, z);
+	fe_sq_tt(&t1, &t0);
+	for (i = 1; i < 2; ++i)
+		fe_sq_tt(&t1, &t1);
+	fe_mul_tlt(&t1, z, &t1);
+	fe_mul_ttt(&t0, &t0, &t1);
+	fe_sq_tt(&t2, &t0);
+	fe_mul_ttt(&t1, &t1, &t2);
+	fe_sq_tt(&t2, &t1);
+	for (i = 1; i < 5; ++i)
+		fe_sq_tt(&t2, &t2);
+	fe_mul_ttt(&t1, &t2, &t1);
+	fe_sq_tt(&t2, &t1);
+	for (i = 1; i < 10; ++i)
+		fe_sq_tt(&t2, &t2);
+	fe_mul_ttt(&t2, &t2, &t1);
+	fe_sq_tt(&t3, &t2);
+	for (i = 1; i < 20; ++i)
+		fe_sq_tt(&t3, &t3);
+	fe_mul_ttt(&t2, &t3, &t2);
+	fe_sq_tt(&t2, &t2);
+	for (i = 1; i < 10; ++i)
+		fe_sq_tt(&t2, &t2);
+	fe_mul_ttt(&t1, &t2, &t1);
+	fe_sq_tt(&t2, &t1);
+	for (i = 1; i < 50; ++i)
+		fe_sq_tt(&t2, &t2);
+	fe_mul_ttt(&t2, &t2, &t1);
+	fe_sq_tt(&t3, &t2);
+	for (i = 1; i < 100; ++i)
+		fe_sq_tt(&t3, &t3);
+	fe_mul_ttt(&t2, &t3, &t2);
+	fe_sq_tt(&t2, &t2);
+	for (i = 1; i < 50; ++i)
+		fe_sq_tt(&t2, &t2);
+	fe_mul_ttt(&t1, &t2, &t1);
+	fe_sq_tt(&t1, &t1);
+	for (i = 1; i < 5; ++i)
+		fe_sq_tt(&t1, &t1);
+	fe_mul_ttt(out, &t1, &t0);
+}
+
+static __always_inline void fe_invert(fe *out, const fe *z)
+{
+	fe_loose l;
+	fe_copy_lt(&l, z);
+	fe_loose_invert(out, &l);
+}
+
+/* Replace (f,g) with (g,f) if b == 1;
+ * replace (f,g) with (f,g) if b == 0.
+ *
+ * Preconditions: b in {0,1}
+ */
+static __always_inline void fe_cswap(fe *f, fe *g, unsigned int b)
+{
+	unsigned i;
+	b = 0-b;
+	for (i = 0; i < 10; i++) {
+		u32 x = f->v[i] ^ g->v[i];
+		x &= b;
+		f->v[i] ^= x;
+		g->v[i] ^= x;
+	}
+}
+
+/* NOTE: based on fiat-crypto fe_mul, edited for in2=121666, 0, 0.*/
+static __always_inline void fe_mul_121666_impl(u32 out[10], const u32 in1[10])
+{
+	{ const u32 x20 = in1[9];
+	{ const u32 x21 = in1[8];
+	{ const u32 x19 = in1[7];
+	{ const u32 x17 = in1[6];
+	{ const u32 x15 = in1[5];
+	{ const u32 x13 = in1[4];
+	{ const u32 x11 = in1[3];
+	{ const u32 x9 = in1[2];
+	{ const u32 x7 = in1[1];
+	{ const u32 x5 = in1[0];
+	{ const u32 x38 = 0;
+	{ const u32 x39 = 0;
+	{ const u32 x37 = 0;
+	{ const u32 x35 = 0;
+	{ const u32 x33 = 0;
+	{ const u32 x31 = 0;
+	{ const u32 x29 = 0;
+	{ const u32 x27 = 0;
+	{ const u32 x25 = 0;
+	{ const u32 x23 = 121666;
+	{ u64 x40 = ((u64)x23 * x5);
+	{ u64 x41 = (((u64)x23 * x7) + ((u64)x25 * x5));
+	{ u64 x42 = ((((u64)(0x2 * x25) * x7) + ((u64)x23 * x9)) + ((u64)x27 * x5));
+	{ u64 x43 = (((((u64)x25 * x9) + ((u64)x27 * x7)) + ((u64)x23 * x11)) + ((u64)x29 * x5));
+	{ u64 x44 = (((((u64)x27 * x9) + (0x2 * (((u64)x25 * x11) + ((u64)x29 * x7)))) + ((u64)x23 * x13)) + ((u64)x31 * x5));
+	{ u64 x45 = (((((((u64)x27 * x11) + ((u64)x29 * x9)) + ((u64)x25 * x13)) + ((u64)x31 * x7)) + ((u64)x23 * x15)) + ((u64)x33 * x5));
+	{ u64 x46 = (((((0x2 * ((((u64)x29 * x11) + ((u64)x25 * x15)) + ((u64)x33 * x7))) + ((u64)x27 * x13)) + ((u64)x31 * x9)) + ((u64)x23 * x17)) + ((u64)x35 * x5));
+	{ u64 x47 = (((((((((u64)x29 * x13) + ((u64)x31 * x11)) + ((u64)x27 * x15)) + ((u64)x33 * x9)) + ((u64)x25 * x17)) + ((u64)x35 * x7)) + ((u64)x23 * x19)) + ((u64)x37 * x5));
+	{ u64 x48 = (((((((u64)x31 * x13) + (0x2 * (((((u64)x29 * x15) + ((u64)x33 * x11)) + ((u64)x25 * x19)) + ((u64)x37 * x7)))) + ((u64)x27 * x17)) + ((u64)x35 * x9)) + ((u64)x23 * x21)) + ((u64)x39 * x5));
+	{ u64 x49 = (((((((((((u64)x31 * x15) + ((u64)x33 * x13)) + ((u64)x29 * x17)) + ((u64)x35 * x11)) + ((u64)x27 * x19)) + ((u64)x37 * x9)) + ((u64)x25 * x21)) + ((u64)x39 * x7)) + ((u64)x23 * x20)) + ((u64)x38 * x5));
+	{ u64 x50 = (((((0x2 * ((((((u64)x33 * x15) + ((u64)x29 * x19)) + ((u64)x37 * x11)) + ((u64)x25 * x20)) + ((u64)x38 * x7))) + ((u64)x31 * x17)) + ((u64)x35 * x13)) + ((u64)x27 * x21)) + ((u64)x39 * x9));
+	{ u64 x51 = (((((((((u64)x33 * x17) + ((u64)x35 * x15)) + ((u64)x31 * x19)) + ((u64)x37 * x13)) + ((u64)x29 * x21)) + ((u64)x39 * x11)) + ((u64)x27 * x20)) + ((u64)x38 * x9));
+	{ u64 x52 = (((((u64)x35 * x17) + (0x2 * (((((u64)x33 * x19) + ((u64)x37 * x15)) + ((u64)x29 * x20)) + ((u64)x38 * x11)))) + ((u64)x31 * x21)) + ((u64)x39 * x13));
+	{ u64 x53 = (((((((u64)x35 * x19) + ((u64)x37 * x17)) + ((u64)x33 * x21)) + ((u64)x39 * x15)) + ((u64)x31 * x20)) + ((u64)x38 * x13));
+	{ u64 x54 = (((0x2 * ((((u64)x37 * x19) + ((u64)x33 * x20)) + ((u64)x38 * x15))) + ((u64)x35 * x21)) + ((u64)x39 * x17));
+	{ u64 x55 = (((((u64)x37 * x21) + ((u64)x39 * x19)) + ((u64)x35 * x20)) + ((u64)x38 * x17));
+	{ u64 x56 = (((u64)x39 * x21) + (0x2 * (((u64)x37 * x20) + ((u64)x38 * x19))));
+	{ u64 x57 = (((u64)x39 * x20) + ((u64)x38 * x21));
+	{ u64 x58 = ((u64)(0x2 * x38) * x20);
+	{ u64 x59 = (x48 + (x58 << 0x4));
+	{ u64 x60 = (x59 + (x58 << 0x1));
+	{ u64 x61 = (x60 + x58);
+	{ u64 x62 = (x47 + (x57 << 0x4));
+	{ u64 x63 = (x62 + (x57 << 0x1));
+	{ u64 x64 = (x63 + x57);
+	{ u64 x65 = (x46 + (x56 << 0x4));
+	{ u64 x66 = (x65 + (x56 << 0x1));
+	{ u64 x67 = (x66 + x56);
+	{ u64 x68 = (x45 + (x55 << 0x4));
+	{ u64 x69 = (x68 + (x55 << 0x1));
+	{ u64 x70 = (x69 + x55);
+	{ u64 x71 = (x44 + (x54 << 0x4));
+	{ u64 x72 = (x71 + (x54 << 0x1));
+	{ u64 x73 = (x72 + x54);
+	{ u64 x74 = (x43 + (x53 << 0x4));
+	{ u64 x75 = (x74 + (x53 << 0x1));
+	{ u64 x76 = (x75 + x53);
+	{ u64 x77 = (x42 + (x52 << 0x4));
+	{ u64 x78 = (x77 + (x52 << 0x1));
+	{ u64 x79 = (x78 + x52);
+	{ u64 x80 = (x41 + (x51 << 0x4));
+	{ u64 x81 = (x80 + (x51 << 0x1));
+	{ u64 x82 = (x81 + x51);
+	{ u64 x83 = (x40 + (x50 << 0x4));
+	{ u64 x84 = (x83 + (x50 << 0x1));
+	{ u64 x85 = (x84 + x50);
+	{ u64 x86 = (x85 >> 0x1a);
+	{ u32 x87 = ((u32)x85 & 0x3ffffff);
+	{ u64 x88 = (x86 + x82);
+	{ u64 x89 = (x88 >> 0x19);
+	{ u32 x90 = ((u32)x88 & 0x1ffffff);
+	{ u64 x91 = (x89 + x79);
+	{ u64 x92 = (x91 >> 0x1a);
+	{ u32 x93 = ((u32)x91 & 0x3ffffff);
+	{ u64 x94 = (x92 + x76);
+	{ u64 x95 = (x94 >> 0x19);
+	{ u32 x96 = ((u32)x94 & 0x1ffffff);
+	{ u64 x97 = (x95 + x73);
+	{ u64 x98 = (x97 >> 0x1a);
+	{ u32 x99 = ((u32)x97 & 0x3ffffff);
+	{ u64 x100 = (x98 + x70);
+	{ u64 x101 = (x100 >> 0x19);
+	{ u32 x102 = ((u32)x100 & 0x1ffffff);
+	{ u64 x103 = (x101 + x67);
+	{ u64 x104 = (x103 >> 0x1a);
+	{ u32 x105 = ((u32)x103 & 0x3ffffff);
+	{ u64 x106 = (x104 + x64);
+	{ u64 x107 = (x106 >> 0x19);
+	{ u32 x108 = ((u32)x106 & 0x1ffffff);
+	{ u64 x109 = (x107 + x61);
+	{ u64 x110 = (x109 >> 0x1a);
+	{ u32 x111 = ((u32)x109 & 0x3ffffff);
+	{ u64 x112 = (x110 + x49);
+	{ u64 x113 = (x112 >> 0x19);
+	{ u32 x114 = ((u32)x112 & 0x1ffffff);
+	{ u64 x115 = (x87 + (0x13 * x113));
+	{ u32 x116 = (u32) (x115 >> 0x1a);
+	{ u32 x117 = ((u32)x115 & 0x3ffffff);
+	{ u32 x118 = (x116 + x90);
+	{ u32 x119 = (x118 >> 0x19);
+	{ u32 x120 = (x118 & 0x1ffffff);
+	out[0] = x117;
+	out[1] = x120;
+	out[2] = (x119 + x93);
+	out[3] = x96;
+	out[4] = x99;
+	out[5] = x102;
+	out[6] = x105;
+	out[7] = x108;
+	out[8] = x111;
+	out[9] = x114;
+	}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
+}
+
+static __always_inline void fe_mul121666(fe *h, const fe_loose *f)
+{
+	fe_mul_121666_impl(h->v, f->v);
+}
+
+static void curve25519_generic(u8 out[CURVE25519_POINT_SIZE],
+			       const u8 scalar[CURVE25519_POINT_SIZE],
+			       const u8 point[CURVE25519_POINT_SIZE])
+{
+	fe x1, x2, z2, x3, z3, tmp0, tmp1;
+	fe_loose x2l, z2l, x3l, tmp0l, tmp1l;
+	unsigned swap = 0;
+	int pos;
+	u8 e[32];
+
+	memcpy(e, scalar, 32);
+	normalize_secret(e);
+
+	/* The following implementation was transcribed to Coq and proven to
+	 * correspond to unary scalar multiplication in affine coordinates given
+	 * that x1 != 0 is the x coordinate of some point on the curve. It was
+	 * also checked in Coq that doing a ladderstep with x1 = x3 = 0 gives
+	 * z2' = z3' = 0, and z2 = z3 = 0 gives z2' = z3' = 0. The statement was
+	 * quantified over the underlying field, so it applies to Curve25519
+	 * itself and the quadratic twist of Curve25519. It was not proven in
+	 * Coq that prime-field arithmetic correctly simulates extension-field
+	 * arithmetic on prime-field values. The decoding of the byte array
+	 * representation of e was not considered.
+	 *
+	 * Specification of Montgomery curves in affine coordinates:
+	 * <https://github.com/mit-plv/fiat-crypto/blob/2456d821825521f7e03e65882cc3521795b0320f/src/Spec/MontgomeryCurve.v#L27>
+	 *
+	 * Proof that these form a group that is isomorphic to a Weierstrass
+	 * curve:
+	 * <https://github.com/mit-plv/fiat-crypto/blob/2456d821825521f7e03e65882cc3521795b0320f/src/Curves/Montgomery/AffineProofs.v#L35>
+	 *
+	 * Coq transcription and correctness proof of the loop
+	 * (where scalarbits=255):
+	 * <https://github.com/mit-plv/fiat-crypto/blob/2456d821825521f7e03e65882cc3521795b0320f/src/Curves/Montgomery/XZ.v#L118>
+	 * <https://github.com/mit-plv/fiat-crypto/blob/2456d821825521f7e03e65882cc3521795b0320f/src/Curves/Montgomery/XZProofs.v#L278>
+	 * preconditions: 0 <= e < 2^255 (not necessarily e < order),
+	 * fe_invert(0) = 0
+	 */
+	fe_frombytes(&x1, point);
+	fe_1(&x2);
+	fe_0(&z2);
+	fe_copy(&x3, &x1);
+	fe_1(&z3);
+
+	for (pos = 254; pos >= 0; --pos) {
+		/* loop invariant as of right before the test, for the case
+		 * where x1 != 0:
+		 *   pos >= -1; if z2 = 0 then x2 is nonzero; if z3 = 0 then x3
+		 *   is nonzero
+		 *   let r := e >> (pos+1) in the following equalities of
+		 *   projective points:
+		 *   to_xz (r*P)     === if swap then (x3, z3) else (x2, z2)
+		 *   to_xz ((r+1)*P) === if swap then (x2, z2) else (x3, z3)
+		 *   x1 is the nonzero x coordinate of the nonzero
+		 *   point (r*P-(r+1)*P)
+		 */
+		unsigned b = 1 & (e[pos / 8] >> (pos & 7));
+		swap ^= b;
+		fe_cswap(&x2, &x3, swap);
+		fe_cswap(&z2, &z3, swap);
+		swap = b;
+		/* Coq transcription of ladderstep formula (called from
+		 * transcribed loop):
+		 * <https://github.com/mit-plv/fiat-crypto/blob/2456d821825521f7e03e65882cc3521795b0320f/src/Curves/Montgomery/XZ.v#L89>
+		 * <https://github.com/mit-plv/fiat-crypto/blob/2456d821825521f7e03e65882cc3521795b0320f/src/Curves/Montgomery/XZProofs.v#L131>
+		 * x1 != 0 <https://github.com/mit-plv/fiat-crypto/blob/2456d821825521f7e03e65882cc3521795b0320f/src/Curves/Montgomery/XZProofs.v#L217>
+		 * x1  = 0 <https://github.com/mit-plv/fiat-crypto/blob/2456d821825521f7e03e65882cc3521795b0320f/src/Curves/Montgomery/XZProofs.v#L147>
+		 */
+		fe_sub(&tmp0l, &x3, &z3);
+		fe_sub(&tmp1l, &x2, &z2);
+		fe_add(&x2l, &x2, &z2);
+		fe_add(&z2l, &x3, &z3);
+		fe_mul_tll(&z3, &tmp0l, &x2l);
+		fe_mul_tll(&z2, &z2l, &tmp1l);
+		fe_sq_tl(&tmp0, &tmp1l);
+		fe_sq_tl(&tmp1, &x2l);
+		fe_add(&x3l, &z3, &z2);
+		fe_sub(&z2l, &z3, &z2);
+		fe_mul_ttt(&x2, &tmp1, &tmp0);
+		fe_sub(&tmp1l, &tmp1, &tmp0);
+		fe_sq_tl(&z2, &z2l);
+		fe_mul121666(&z3, &tmp1l);
+		fe_sq_tl(&x3, &x3l);
+		fe_add(&tmp0l, &tmp0, &z3);
+		fe_mul_ttt(&z3, &x1, &z2);
+		fe_mul_tll(&z2, &tmp1l, &tmp0l);
+	}
+	/* here pos=-1, so r=e, so to_xz (e*P) === if swap then (x3, z3)
+	 * else (x2, z2)
+	 */
+	fe_cswap(&x2, &x3, swap);
+	fe_cswap(&z2, &z3, swap);
+
+	fe_invert(&z2, &z2);
+	fe_mul_ttt(&x2, &x2, &z2);
+	fe_tobytes(out, &x2);
+
+	memzero_explicit(&x1, sizeof(x1));
+	memzero_explicit(&x2, sizeof(x2));
+	memzero_explicit(&z2, sizeof(z2));
+	memzero_explicit(&x3, sizeof(x3));
+	memzero_explicit(&z3, sizeof(z3));
+	memzero_explicit(&tmp0, sizeof(tmp0));
+	memzero_explicit(&tmp1, sizeof(tmp1));
+	memzero_explicit(&x2l, sizeof(x2l));
+	memzero_explicit(&z2l, sizeof(z2l));
+	memzero_explicit(&x3l, sizeof(x3l));
+	memzero_explicit(&tmp0l, sizeof(tmp0l));
+	memzero_explicit(&tmp1l, sizeof(tmp1l));
+	memzero_explicit(&e, sizeof(e));
+}
diff --git a/lib/zinc/curve25519/curve25519-hacl64.h b/lib/zinc/curve25519/curve25519-hacl64.h
new file mode 100644
index 000000000000..ebe88fcdc937
--- /dev/null
+++ b/lib/zinc/curve25519/curve25519-hacl64.h
@@ -0,0 +1,785 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2016-2017 INRIA and Microsoft Corporation.
+ * Copyright (C) 2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is a machine-generated formally verified implementation of Curve25519
+ * ECDH from: <https://github.com/mitls/hacl-star>. Though originally machine
+ * generated, it has been tweaked to be suitable for use in the kernel. It is
+ * optimized for 64-bit machines that can efficiently work with 128-bit
+ * integer types.
+ */
+
+typedef __uint128_t u128;
+
+static __always_inline u64 u64_eq_mask(u64 a, u64 b)
+{
+	u64 x = a ^ b;
+	u64 minus_x = ~x + (u64)1U;
+	u64 x_or_minus_x = x | minus_x;
+	u64 xnx = x_or_minus_x >> (u32)63U;
+	u64 c = xnx - (u64)1U;
+	return c;
+}
+
+static __always_inline u64 u64_gte_mask(u64 a, u64 b)
+{
+	u64 x = a;
+	u64 y = b;
+	u64 x_xor_y = x ^ y;
+	u64 x_sub_y = x - y;
+	u64 x_sub_y_xor_y = x_sub_y ^ y;
+	u64 q = x_xor_y | x_sub_y_xor_y;
+	u64 x_xor_q = x ^ q;
+	u64 x_xor_q_ = x_xor_q >> (u32)63U;
+	u64 c = x_xor_q_ - (u64)1U;
+	return c;
+}
+
+static __always_inline void modulo_carry_top(u64 *b)
+{
+	u64 b4 = b[4];
+	u64 b0 = b[0];
+	u64 b4_ = b4 & 0x7ffffffffffffLLU;
+	u64 b0_ = b0 + 19 * (b4 >> 51);
+	b[4] = b4_;
+	b[0] = b0_;
+}
+
+static __always_inline void fproduct_copy_from_wide_(u64 *output, u128 *input)
+{
+	{
+		u128 xi = input[0];
+		output[0] = ((u64)(xi));
+	}
+	{
+		u128 xi = input[1];
+		output[1] = ((u64)(xi));
+	}
+	{
+		u128 xi = input[2];
+		output[2] = ((u64)(xi));
+	}
+	{
+		u128 xi = input[3];
+		output[3] = ((u64)(xi));
+	}
+	{
+		u128 xi = input[4];
+		output[4] = ((u64)(xi));
+	}
+}
+
+static __always_inline void
+fproduct_sum_scalar_multiplication_(u128 *output, u64 *input, u64 s)
+{
+	output[0] += (u128)input[0] * s;
+	output[1] += (u128)input[1] * s;
+	output[2] += (u128)input[2] * s;
+	output[3] += (u128)input[3] * s;
+	output[4] += (u128)input[4] * s;
+}
+
+static __always_inline void fproduct_carry_wide_(u128 *tmp)
+{
+	{
+		u32 ctr = 0;
+		u128 tctr = tmp[ctr];
+		u128 tctrp1 = tmp[ctr + 1];
+		u64 r0 = ((u64)(tctr)) & 0x7ffffffffffffLLU;
+		u128 c = ((tctr) >> (51));
+		tmp[ctr] = ((u128)(r0));
+		tmp[ctr + 1] = ((tctrp1) + (c));
+	}
+	{
+		u32 ctr = 1;
+		u128 tctr = tmp[ctr];
+		u128 tctrp1 = tmp[ctr + 1];
+		u64 r0 = ((u64)(tctr)) & 0x7ffffffffffffLLU;
+		u128 c = ((tctr) >> (51));
+		tmp[ctr] = ((u128)(r0));
+		tmp[ctr + 1] = ((tctrp1) + (c));
+	}
+
+	{
+		u32 ctr = 2;
+		u128 tctr = tmp[ctr];
+		u128 tctrp1 = tmp[ctr + 1];
+		u64 r0 = ((u64)(tctr)) & 0x7ffffffffffffLLU;
+		u128 c = ((tctr) >> (51));
+		tmp[ctr] = ((u128)(r0));
+		tmp[ctr + 1] = ((tctrp1) + (c));
+	}
+	{
+		u32 ctr = 3;
+		u128 tctr = tmp[ctr];
+		u128 tctrp1 = tmp[ctr + 1];
+		u64 r0 = ((u64)(tctr)) & 0x7ffffffffffffLLU;
+		u128 c = ((tctr) >> (51));
+		tmp[ctr] = ((u128)(r0));
+		tmp[ctr + 1] = ((tctrp1) + (c));
+	}
+}
+
+static __always_inline void fmul_shift_reduce(u64 *output)
+{
+	u64 tmp = output[4];
+	u64 b0;
+	{
+		u32 ctr = 5 - 0 - 1;
+		u64 z = output[ctr - 1];
+		output[ctr] = z;
+	}
+	{
+		u32 ctr = 5 - 1 - 1;
+		u64 z = output[ctr - 1];
+		output[ctr] = z;
+	}
+	{
+		u32 ctr = 5 - 2 - 1;
+		u64 z = output[ctr - 1];
+		output[ctr] = z;
+	}
+	{
+		u32 ctr = 5 - 3 - 1;
+		u64 z = output[ctr - 1];
+		output[ctr] = z;
+	}
+	output[0] = tmp;
+	b0 = output[0];
+	output[0] = 19 * b0;
+}
+
+static __always_inline void fmul_mul_shift_reduce_(u128 *output, u64 *input,
+						   u64 *input21)
+{
+	u32 i;
+	u64 input2i;
+	{
+		u64 input2i = input21[0];
+		fproduct_sum_scalar_multiplication_(output, input, input2i);
+		fmul_shift_reduce(input);
+	}
+	{
+		u64 input2i = input21[1];
+		fproduct_sum_scalar_multiplication_(output, input, input2i);
+		fmul_shift_reduce(input);
+	}
+	{
+		u64 input2i = input21[2];
+		fproduct_sum_scalar_multiplication_(output, input, input2i);
+		fmul_shift_reduce(input);
+	}
+	{
+		u64 input2i = input21[3];
+		fproduct_sum_scalar_multiplication_(output, input, input2i);
+		fmul_shift_reduce(input);
+	}
+	i = 4;
+	input2i = input21[i];
+	fproduct_sum_scalar_multiplication_(output, input, input2i);
+}
+
+static __always_inline void fmul_fmul(u64 *output, u64 *input, u64 *input21)
+{
+	u64 tmp[5];
+	memcpy(tmp, input, 5 * sizeof(*input));
+	{
+		u128 b4;
+		u128 b0;
+		u128 b4_;
+		u128 b0_;
+		u64 i0;
+		u64 i1;
+		u64 i0_;
+		u64 i1_;
+		u128 t[5] = { 0 };
+		fmul_mul_shift_reduce_(t, tmp, input21);
+		fproduct_carry_wide_(t);
+		b4 = t[4];
+		b0 = t[0];
+		b4_ = ((b4) & (((u128)(0x7ffffffffffffLLU))));
+		b0_ = ((b0) + (((u128)(19) * (((u64)(((b4) >> (51))))))));
+		t[4] = b4_;
+		t[0] = b0_;
+		fproduct_copy_from_wide_(output, t);
+		i0 = output[0];
+		i1 = output[1];
+		i0_ = i0 & 0x7ffffffffffffLLU;
+		i1_ = i1 + (i0 >> 51);
+		output[0] = i0_;
+		output[1] = i1_;
+	}
+}
+
+static __always_inline void fsquare_fsquare__(u128 *tmp, u64 *output)
+{
+	u64 r0 = output[0];
+	u64 r1 = output[1];
+	u64 r2 = output[2];
+	u64 r3 = output[3];
+	u64 r4 = output[4];
+	u64 d0 = r0 * 2;
+	u64 d1 = r1 * 2;
+	u64 d2 = r2 * 2 * 19;
+	u64 d419 = r4 * 19;
+	u64 d4 = d419 * 2;
+	u128 s0 = ((((((u128)(r0) * (r0))) + (((u128)(d4) * (r1))))) +
+		   (((u128)(d2) * (r3))));
+	u128 s1 = ((((((u128)(d0) * (r1))) + (((u128)(d4) * (r2))))) +
+		   (((u128)(r3 * 19) * (r3))));
+	u128 s2 = ((((((u128)(d0) * (r2))) + (((u128)(r1) * (r1))))) +
+		   (((u128)(d4) * (r3))));
+	u128 s3 = ((((((u128)(d0) * (r3))) + (((u128)(d1) * (r2))))) +
+		   (((u128)(r4) * (d419))));
+	u128 s4 = ((((((u128)(d0) * (r4))) + (((u128)(d1) * (r3))))) +
+		   (((u128)(r2) * (r2))));
+	tmp[0] = s0;
+	tmp[1] = s1;
+	tmp[2] = s2;
+	tmp[3] = s3;
+	tmp[4] = s4;
+}
+
+static __always_inline void fsquare_fsquare_(u128 *tmp, u64 *output)
+{
+	u128 b4;
+	u128 b0;
+	u128 b4_;
+	u128 b0_;
+	u64 i0;
+	u64 i1;
+	u64 i0_;
+	u64 i1_;
+	fsquare_fsquare__(tmp, output);
+	fproduct_carry_wide_(tmp);
+	b4 = tmp[4];
+	b0 = tmp[0];
+	b4_ = ((b4) & (((u128)(0x7ffffffffffffLLU))));
+	b0_ = ((b0) + (((u128)(19) * (((u64)(((b4) >> (51))))))));
+	tmp[4] = b4_;
+	tmp[0] = b0_;
+	fproduct_copy_from_wide_(output, tmp);
+	i0 = output[0];
+	i1 = output[1];
+	i0_ = i0 & 0x7ffffffffffffLLU;
+	i1_ = i1 + (i0 >> 51);
+	output[0] = i0_;
+	output[1] = i1_;
+}
+
+static __always_inline void fsquare_fsquare_times_(u64 *output, u128 *tmp,
+						   u32 count1)
+{
+	u32 i;
+	fsquare_fsquare_(tmp, output);
+	for (i = 1; i < count1; ++i)
+		fsquare_fsquare_(tmp, output);
+}
+
+static __always_inline void fsquare_fsquare_times(u64 *output, u64 *input,
+						  u32 count1)
+{
+	u128 t[5];
+	memcpy(output, input, 5 * sizeof(*input));
+	fsquare_fsquare_times_(output, t, count1);
+}
+
+static __always_inline void fsquare_fsquare_times_inplace(u64 *output,
+							  u32 count1)
+{
+	u128 t[5];
+	fsquare_fsquare_times_(output, t, count1);
+}
+
+static __always_inline void crecip_crecip(u64 *out, u64 *z)
+{
+	u64 buf[20] = { 0 };
+	u64 *a0 = buf;
+	u64 *t00 = buf + 5;
+	u64 *b0 = buf + 10;
+	u64 *t01;
+	u64 *b1;
+	u64 *c0;
+	u64 *a;
+	u64 *t0;
+	u64 *b;
+	u64 *c;
+	fsquare_fsquare_times(a0, z, 1);
+	fsquare_fsquare_times(t00, a0, 2);
+	fmul_fmul(b0, t00, z);
+	fmul_fmul(a0, b0, a0);
+	fsquare_fsquare_times(t00, a0, 1);
+	fmul_fmul(b0, t00, b0);
+	fsquare_fsquare_times(t00, b0, 5);
+	t01 = buf + 5;
+	b1 = buf + 10;
+	c0 = buf + 15;
+	fmul_fmul(b1, t01, b1);
+	fsquare_fsquare_times(t01, b1, 10);
+	fmul_fmul(c0, t01, b1);
+	fsquare_fsquare_times(t01, c0, 20);
+	fmul_fmul(t01, t01, c0);
+	fsquare_fsquare_times_inplace(t01, 10);
+	fmul_fmul(b1, t01, b1);
+	fsquare_fsquare_times(t01, b1, 50);
+	a = buf;
+	t0 = buf + 5;
+	b = buf + 10;
+	c = buf + 15;
+	fmul_fmul(c, t0, b);
+	fsquare_fsquare_times(t0, c, 100);
+	fmul_fmul(t0, t0, c);
+	fsquare_fsquare_times_inplace(t0, 50);
+	fmul_fmul(t0, t0, b);
+	fsquare_fsquare_times_inplace(t0, 5);
+	fmul_fmul(out, t0, a);
+}
+
+static __always_inline void fsum(u64 *a, u64 *b)
+{
+	a[0] += b[0];
+	a[1] += b[1];
+	a[2] += b[2];
+	a[3] += b[3];
+	a[4] += b[4];
+}
+
+static __always_inline void fdifference(u64 *a, u64 *b)
+{
+	u64 tmp[5] = { 0 };
+	u64 b0;
+	u64 b1;
+	u64 b2;
+	u64 b3;
+	u64 b4;
+	memcpy(tmp, b, 5 * sizeof(*b));
+	b0 = tmp[0];
+	b1 = tmp[1];
+	b2 = tmp[2];
+	b3 = tmp[3];
+	b4 = tmp[4];
+	tmp[0] = b0 + 0x3fffffffffff68LLU;
+	tmp[1] = b1 + 0x3ffffffffffff8LLU;
+	tmp[2] = b2 + 0x3ffffffffffff8LLU;
+	tmp[3] = b3 + 0x3ffffffffffff8LLU;
+	tmp[4] = b4 + 0x3ffffffffffff8LLU;
+	{
+		u64 xi = a[0];
+		u64 yi = tmp[0];
+		a[0] = yi - xi;
+	}
+	{
+		u64 xi = a[1];
+		u64 yi = tmp[1];
+		a[1] = yi - xi;
+	}
+	{
+		u64 xi = a[2];
+		u64 yi = tmp[2];
+		a[2] = yi - xi;
+	}
+	{
+		u64 xi = a[3];
+		u64 yi = tmp[3];
+		a[3] = yi - xi;
+	}
+	{
+		u64 xi = a[4];
+		u64 yi = tmp[4];
+		a[4] = yi - xi;
+	}
+}
+
+static __always_inline void fscalar(u64 *output, u64 *b, u64 s)
+{
+	u128 tmp[5];
+	u128 b4;
+	u128 b0;
+	u128 b4_;
+	u128 b0_;
+	{
+		u64 xi = b[0];
+		tmp[0] = ((u128)(xi) * (s));
+	}
+	{
+		u64 xi = b[1];
+		tmp[1] = ((u128)(xi) * (s));
+	}
+	{
+		u64 xi = b[2];
+		tmp[2] = ((u128)(xi) * (s));
+	}
+	{
+		u64 xi = b[3];
+		tmp[3] = ((u128)(xi) * (s));
+	}
+	{
+		u64 xi = b[4];
+		tmp[4] = ((u128)(xi) * (s));
+	}
+	fproduct_carry_wide_(tmp);
+	b4 = tmp[4];
+	b0 = tmp[0];
+	b4_ = ((b4) & (((u128)(0x7ffffffffffffLLU))));
+	b0_ = ((b0) + (((u128)(19) * (((u64)(((b4) >> (51))))))));
+	tmp[4] = b4_;
+	tmp[0] = b0_;
+	fproduct_copy_from_wide_(output, tmp);
+}
+
+static __always_inline void fmul(u64 *output, u64 *a, u64 *b)
+{
+	fmul_fmul(output, a, b);
+}
+
+static __always_inline void crecip(u64 *output, u64 *input)
+{
+	crecip_crecip(output, input);
+}
+
+static __always_inline void point_swap_conditional_step(u64 *a, u64 *b,
+							u64 swap1, u32 ctr)
+{
+	u32 i = ctr - 1;
+	u64 ai = a[i];
+	u64 bi = b[i];
+	u64 x = swap1 & (ai ^ bi);
+	u64 ai1 = ai ^ x;
+	u64 bi1 = bi ^ x;
+	a[i] = ai1;
+	b[i] = bi1;
+}
+
+static __always_inline void point_swap_conditional5(u64 *a, u64 *b, u64 swap1)
+{
+	point_swap_conditional_step(a, b, swap1, 5);
+	point_swap_conditional_step(a, b, swap1, 4);
+	point_swap_conditional_step(a, b, swap1, 3);
+	point_swap_conditional_step(a, b, swap1, 2);
+	point_swap_conditional_step(a, b, swap1, 1);
+}
+
+static __always_inline void point_swap_conditional(u64 *a, u64 *b, u64 iswap)
+{
+	u64 swap1 = 0 - iswap;
+	point_swap_conditional5(a, b, swap1);
+	point_swap_conditional5(a + 5, b + 5, swap1);
+}
+
+static __always_inline void point_copy(u64 *output, u64 *input)
+{
+	memcpy(output, input, 5 * sizeof(*input));
+	memcpy(output + 5, input + 5, 5 * sizeof(*input));
+}
+
+static __always_inline void addanddouble_fmonty(u64 *pp, u64 *ppq, u64 *p,
+						u64 *pq, u64 *qmqp)
+{
+	u64 *qx = qmqp;
+	u64 *x2 = pp;
+	u64 *z2 = pp + 5;
+	u64 *x3 = ppq;
+	u64 *z3 = ppq + 5;
+	u64 *x = p;
+	u64 *z = p + 5;
+	u64 *xprime = pq;
+	u64 *zprime = pq + 5;
+	u64 buf[40] = { 0 };
+	u64 *origx = buf;
+	u64 *origxprime0 = buf + 5;
+	u64 *xxprime0;
+	u64 *zzprime0;
+	u64 *origxprime;
+	xxprime0 = buf + 25;
+	zzprime0 = buf + 30;
+	memcpy(origx, x, 5 * sizeof(*x));
+	fsum(x, z);
+	fdifference(z, origx);
+	memcpy(origxprime0, xprime, 5 * sizeof(*xprime));
+	fsum(xprime, zprime);
+	fdifference(zprime, origxprime0);
+	fmul(xxprime0, xprime, z);
+	fmul(zzprime0, x, zprime);
+	origxprime = buf + 5;
+	{
+		u64 *xx0;
+		u64 *zz0;
+		u64 *xxprime;
+		u64 *zzprime;
+		u64 *zzzprime;
+		xx0 = buf + 15;
+		zz0 = buf + 20;
+		xxprime = buf + 25;
+		zzprime = buf + 30;
+		zzzprime = buf + 35;
+		memcpy(origxprime, xxprime, 5 * sizeof(*xxprime));
+		fsum(xxprime, zzprime);
+		fdifference(zzprime, origxprime);
+		fsquare_fsquare_times(x3, xxprime, 1);
+		fsquare_fsquare_times(zzzprime, zzprime, 1);
+		fmul(z3, zzzprime, qx);
+		fsquare_fsquare_times(xx0, x, 1);
+		fsquare_fsquare_times(zz0, z, 1);
+		{
+			u64 *zzz;
+			u64 *xx;
+			u64 *zz;
+			u64 scalar;
+			zzz = buf + 10;
+			xx = buf + 15;
+			zz = buf + 20;
+			fmul(x2, xx, zz);
+			fdifference(zz, xx);
+			scalar = 121665;
+			fscalar(zzz, zz, scalar);
+			fsum(zzz, xx);
+			fmul(z2, zzz, zz);
+		}
+	}
+}
+
+static __always_inline void
+ladder_smallloop_cmult_small_loop_step(u64 *nq, u64 *nqpq, u64 *nq2, u64 *nqpq2,
+				       u64 *q, u8 byt)
+{
+	u64 bit0 = (u64)(byt >> 7);
+	u64 bit;
+	point_swap_conditional(nq, nqpq, bit0);
+	addanddouble_fmonty(nq2, nqpq2, nq, nqpq, q);
+	bit = (u64)(byt >> 7);
+	point_swap_conditional(nq2, nqpq2, bit);
+}
+
+static __always_inline void
+ladder_smallloop_cmult_small_loop_double_step(u64 *nq, u64 *nqpq, u64 *nq2,
+					      u64 *nqpq2, u64 *q, u8 byt)
+{
+	u8 byt1;
+	ladder_smallloop_cmult_small_loop_step(nq, nqpq, nq2, nqpq2, q, byt);
+	byt1 = byt << 1;
+	ladder_smallloop_cmult_small_loop_step(nq2, nqpq2, nq, nqpq, q, byt1);
+}
+
+static __always_inline void
+ladder_smallloop_cmult_small_loop(u64 *nq, u64 *nqpq, u64 *nq2, u64 *nqpq2,
+				  u64 *q, u8 byt, u32 i)
+{
+	while (i--) {
+		ladder_smallloop_cmult_small_loop_double_step(nq, nqpq, nq2,
+							      nqpq2, q, byt);
+		byt <<= 2;
+	}
+}
+
+static __always_inline void ladder_bigloop_cmult_big_loop(u8 *n1, u64 *nq,
+							  u64 *nqpq, u64 *nq2,
+							  u64 *nqpq2, u64 *q,
+							  u32 i)
+{
+	while (i--) {
+		u8 byte = n1[i];
+		ladder_smallloop_cmult_small_loop(nq, nqpq, nq2, nqpq2, q,
+						  byte, 4);
+	}
+}
+
+static __always_inline void ladder_cmult(u64 *result, u8 *n1, u64 *q)
+{
+	u64 point_buf[40] = { 0 };
+	u64 *nq = point_buf;
+	u64 *nqpq = point_buf + 10;
+	u64 *nq2 = point_buf + 20;
+	u64 *nqpq2 = point_buf + 30;
+	point_copy(nqpq, q);
+	nq[0] = 1;
+	ladder_bigloop_cmult_big_loop(n1, nq, nqpq, nq2, nqpq2, q, 32);
+	point_copy(result, nq);
+}
+
+static __always_inline void format_fexpand(u64 *output, const u8 *input)
+{
+	const u8 *x00 = input + 6;
+	const u8 *x01 = input + 12;
+	const u8 *x02 = input + 19;
+	const u8 *x0 = input + 24;
+	u64 i0, i1, i2, i3, i4, output0, output1, output2, output3, output4;
+	i0 = get_unaligned_le64(input);
+	i1 = get_unaligned_le64(x00);
+	i2 = get_unaligned_le64(x01);
+	i3 = get_unaligned_le64(x02);
+	i4 = get_unaligned_le64(x0);
+	output0 = i0 & 0x7ffffffffffffLLU;
+	output1 = i1 >> 3 & 0x7ffffffffffffLLU;
+	output2 = i2 >> 6 & 0x7ffffffffffffLLU;
+	output3 = i3 >> 1 & 0x7ffffffffffffLLU;
+	output4 = i4 >> 12 & 0x7ffffffffffffLLU;
+	output[0] = output0;
+	output[1] = output1;
+	output[2] = output2;
+	output[3] = output3;
+	output[4] = output4;
+}
+
+static __always_inline void format_fcontract_first_carry_pass(u64 *input)
+{
+	u64 t0 = input[0];
+	u64 t1 = input[1];
+	u64 t2 = input[2];
+	u64 t3 = input[3];
+	u64 t4 = input[4];
+	u64 t1_ = t1 + (t0 >> 51);
+	u64 t0_ = t0 & 0x7ffffffffffffLLU;
+	u64 t2_ = t2 + (t1_ >> 51);
+	u64 t1__ = t1_ & 0x7ffffffffffffLLU;
+	u64 t3_ = t3 + (t2_ >> 51);
+	u64 t2__ = t2_ & 0x7ffffffffffffLLU;
+	u64 t4_ = t4 + (t3_ >> 51);
+	u64 t3__ = t3_ & 0x7ffffffffffffLLU;
+	input[0] = t0_;
+	input[1] = t1__;
+	input[2] = t2__;
+	input[3] = t3__;
+	input[4] = t4_;
+}
+
+static __always_inline void format_fcontract_first_carry_full(u64 *input)
+{
+	format_fcontract_first_carry_pass(input);
+	modulo_carry_top(input);
+}
+
+static __always_inline void format_fcontract_second_carry_pass(u64 *input)
+{
+	u64 t0 = input[0];
+	u64 t1 = input[1];
+	u64 t2 = input[2];
+	u64 t3 = input[3];
+	u64 t4 = input[4];
+	u64 t1_ = t1 + (t0 >> 51);
+	u64 t0_ = t0 & 0x7ffffffffffffLLU;
+	u64 t2_ = t2 + (t1_ >> 51);
+	u64 t1__ = t1_ & 0x7ffffffffffffLLU;
+	u64 t3_ = t3 + (t2_ >> 51);
+	u64 t2__ = t2_ & 0x7ffffffffffffLLU;
+	u64 t4_ = t4 + (t3_ >> 51);
+	u64 t3__ = t3_ & 0x7ffffffffffffLLU;
+	input[0] = t0_;
+	input[1] = t1__;
+	input[2] = t2__;
+	input[3] = t3__;
+	input[4] = t4_;
+}
+
+static __always_inline void format_fcontract_second_carry_full(u64 *input)
+{
+	u64 i0;
+	u64 i1;
+	u64 i0_;
+	u64 i1_;
+	format_fcontract_second_carry_pass(input);
+	modulo_carry_top(input);
+	i0 = input[0];
+	i1 = input[1];
+	i0_ = i0 & 0x7ffffffffffffLLU;
+	i1_ = i1 + (i0 >> 51);
+	input[0] = i0_;
+	input[1] = i1_;
+}
+
+static __always_inline void format_fcontract_trim(u64 *input)
+{
+	u64 a0 = input[0];
+	u64 a1 = input[1];
+	u64 a2 = input[2];
+	u64 a3 = input[3];
+	u64 a4 = input[4];
+	u64 mask0 = u64_gte_mask(a0, 0x7ffffffffffedLLU);
+	u64 mask1 = u64_eq_mask(a1, 0x7ffffffffffffLLU);
+	u64 mask2 = u64_eq_mask(a2, 0x7ffffffffffffLLU);
+	u64 mask3 = u64_eq_mask(a3, 0x7ffffffffffffLLU);
+	u64 mask4 = u64_eq_mask(a4, 0x7ffffffffffffLLU);
+	u64 mask = (((mask0 & mask1) & mask2) & mask3) & mask4;
+	u64 a0_ = a0 - (0x7ffffffffffedLLU & mask);
+	u64 a1_ = a1 - (0x7ffffffffffffLLU & mask);
+	u64 a2_ = a2 - (0x7ffffffffffffLLU & mask);
+	u64 a3_ = a3 - (0x7ffffffffffffLLU & mask);
+	u64 a4_ = a4 - (0x7ffffffffffffLLU & mask);
+	input[0] = a0_;
+	input[1] = a1_;
+	input[2] = a2_;
+	input[3] = a3_;
+	input[4] = a4_;
+}
+
+static __always_inline void format_fcontract_store(u8 *output, u64 *input)
+{
+	u64 t0 = input[0];
+	u64 t1 = input[1];
+	u64 t2 = input[2];
+	u64 t3 = input[3];
+	u64 t4 = input[4];
+	u64 o0 = t1 << 51 | t0;
+	u64 o1 = t2 << 38 | t1 >> 13;
+	u64 o2 = t3 << 25 | t2 >> 26;
+	u64 o3 = t4 << 12 | t3 >> 39;
+	u8 *b0 = output;
+	u8 *b1 = output + 8;
+	u8 *b2 = output + 16;
+	u8 *b3 = output + 24;
+	put_unaligned_le64(o0, b0);
+	put_unaligned_le64(o1, b1);
+	put_unaligned_le64(o2, b2);
+	put_unaligned_le64(o3, b3);
+}
+
+static __always_inline void format_fcontract(u8 *output, u64 *input)
+{
+	format_fcontract_first_carry_full(input);
+	format_fcontract_second_carry_full(input);
+	format_fcontract_trim(input);
+	format_fcontract_store(output, input);
+}
+
+static __always_inline void format_scalar_of_point(u8 *scalar, u64 *point)
+{
+	u64 *x = point;
+	u64 *z = point + 5;
+	u64 buf[10] __aligned(32) = { 0 };
+	u64 *zmone = buf;
+	u64 *sc = buf + 5;
+	crecip(zmone, z);
+	fmul(sc, x, zmone);
+	format_fcontract(scalar, sc);
+}
+
+static void curve25519_generic(u8 mypublic[CURVE25519_POINT_SIZE],
+			       const u8 secret[CURVE25519_POINT_SIZE],
+			       const u8 basepoint[CURVE25519_POINT_SIZE])
+{
+	u64 buf0[10] __aligned(32) = { 0 };
+	u64 *x0 = buf0;
+	u64 *z = buf0 + 5;
+	u64 *q;
+	format_fexpand(x0, basepoint);
+	z[0] = 1;
+	q = buf0;
+	{
+		u8 e[32] __aligned(32) = { 0 };
+		u8 *scalar;
+		memcpy(e, secret, 32);
+		normalize_secret(e);
+		scalar = e;
+		{
+			u64 buf[15] = { 0 };
+			u64 *nq = buf;
+			u64 *x = nq;
+			x[0] = 1;
+			ladder_cmult(nq, scalar, q);
+			format_scalar_of_point(mypublic, nq);
+			memzero_explicit(buf, sizeof(buf));
+		}
+		memzero_explicit(e, sizeof(e));
+	}
+	memzero_explicit(buf0, sizeof(buf0));
+}
diff --git a/lib/zinc/curve25519/curve25519.c b/lib/zinc/curve25519/curve25519.c
new file mode 100644
index 000000000000..183eeb5bc548
--- /dev/null
+++ b/lib/zinc/curve25519/curve25519.c
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * This is an implementation of the Curve25519 ECDH algorithm, using either
+ * a 32-bit implementation or a 64-bit implementation with 128-bit integers,
+ * depending on what is supported by the target compiler.
+ *
+ * Information: https://cr.yp.to/ecdh.html
+ */
+
+#include <zinc/curve25519.h>
+
+#include <asm/unaligned.h>
+#include <linux/version.h>
+#include <linux/string.h>
+#include <linux/random.h>
+#include <crypto/algapi.h>
+
+#ifndef HAVE_CURVE25519_ARCH_IMPLEMENTATION
+void __init curve25519_fpu_init(void)
+{
+}
+static inline bool curve25519_arch(u8 mypublic[CURVE25519_POINT_SIZE],
+				   const u8 secret[CURVE25519_POINT_SIZE],
+				   const u8 basepoint[CURVE25519_POINT_SIZE])
+{
+	return false;
+}
+static inline bool curve25519_base_arch(u8 pub[CURVE25519_POINT_SIZE],
+					const u8 secret[CURVE25519_POINT_SIZE])
+{
+	return false;
+}
+#endif
+
+static __always_inline void normalize_secret(u8 secret[CURVE25519_POINT_SIZE])
+{
+	secret[0] &= 248;
+	secret[31] &= 127;
+	secret[31] |= 64;
+}
+
+#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__)
+#include "curve25519-hacl64.h"
+#else
+#include "curve25519-fiat32.h"
+#endif
+
+static const u8 null_point[CURVE25519_POINT_SIZE] = { 0 };
+
+bool curve25519(u8 mypublic[CURVE25519_POINT_SIZE],
+		const u8 secret[CURVE25519_POINT_SIZE],
+		const u8 basepoint[CURVE25519_POINT_SIZE])
+{
+	if (!curve25519_arch(mypublic, secret, basepoint))
+		curve25519_generic(mypublic, secret, basepoint);
+	return crypto_memneq(mypublic, null_point, CURVE25519_POINT_SIZE);
+}
+EXPORT_SYMBOL(curve25519);
+
+bool curve25519_generate_public(u8 pub[CURVE25519_POINT_SIZE],
+				const u8 secret[CURVE25519_POINT_SIZE])
+{
+	static const u8 basepoint[CURVE25519_POINT_SIZE] __aligned(32) = { 9 };
+
+	if (unlikely(!crypto_memneq(secret, null_point, CURVE25519_POINT_SIZE)))
+		return false;
+
+	if (curve25519_base_arch(pub, secret))
+		return crypto_memneq(pub, null_point, CURVE25519_POINT_SIZE);
+	return curve25519(pub, secret, basepoint);
+}
+EXPORT_SYMBOL(curve25519_generate_public);
+
+void curve25519_generate_secret(u8 secret[CURVE25519_POINT_SIZE])
+{
+	get_random_bytes_wait(secret, CURVE25519_POINT_SIZE);
+	normalize_secret(secret);
+}
+EXPORT_SYMBOL(curve25519_generate_secret);
+
+#include "../selftest/curve25519.h"
diff --git a/lib/zinc/main.c b/lib/zinc/main.c
index b1d650d82edb..8138b6d99754 100644
--- a/lib/zinc/main.c
+++ b/lib/zinc/main.c
@@ -7,6 +7,7 @@
 #include <zinc/chacha20.h>
 #include <zinc/poly1305.h>
 #include <zinc/blake2s.h>
+#include <zinc/curve25519.h>
 
 #include <linux/init.h>
 #include <linux/module.h>
@@ -35,6 +36,10 @@ static int __init mod_init(void)
 #ifdef CONFIG_ZINC_BLAKE2S
 	blake2s_fpu_init();
 	selftest(blake2s);
+#endif
+#ifdef CONFIG_ZINC_CURVE25519
+	curve25519_fpu_init();
+	selftest(curve25519);
 #endif
 	return 0;
 }
diff --git a/lib/zinc/selftest/curve25519.h b/lib/zinc/selftest/curve25519.h
new file mode 100644
index 000000000000..b9ac9897106d
--- /dev/null
+++ b/lib/zinc/selftest/curve25519.h
@@ -0,0 +1,1321 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+struct curve25519_test_vector {
+	u8 private[CURVE25519_POINT_SIZE];
+	u8 public[CURVE25519_POINT_SIZE];
+	u8 result[CURVE25519_POINT_SIZE];
+	bool valid;
+};
+static const struct curve25519_test_vector curve25519_test_vectors[] __initconst = {
+	{
+		.private = { 0x77, 0x07, 0x6d, 0x0a, 0x73, 0x18, 0xa5, 0x7d,
+			     0x3c, 0x16, 0xc1, 0x72, 0x51, 0xb2, 0x66, 0x45,
+			     0xdf, 0x4c, 0x2f, 0x87, 0xeb, 0xc0, 0x99, 0x2a,
+			     0xb1, 0x77, 0xfb, 0xa5, 0x1d, 0xb9, 0x2c, 0x2a },
+		.public = { 0xde, 0x9e, 0xdb, 0x7d, 0x7b, 0x7d, 0xc1, 0xb4,
+			    0xd3, 0x5b, 0x61, 0xc2, 0xec, 0xe4, 0x35, 0x37,
+			    0x3f, 0x83, 0x43, 0xc8, 0x5b, 0x78, 0x67, 0x4d,
+			    0xad, 0xfc, 0x7e, 0x14, 0x6f, 0x88, 0x2b, 0x4f },
+		.result = { 0x4a, 0x5d, 0x9d, 0x5b, 0xa4, 0xce, 0x2d, 0xe1,
+			    0x72, 0x8e, 0x3b, 0xf4, 0x80, 0x35, 0x0f, 0x25,
+			    0xe0, 0x7e, 0x21, 0xc9, 0x47, 0xd1, 0x9e, 0x33,
+			    0x76, 0xf0, 0x9b, 0x3c, 0x1e, 0x16, 0x17, 0x42 },
+		.valid = true
+	},
+	{
+		.private = { 0x5d, 0xab, 0x08, 0x7e, 0x62, 0x4a, 0x8a, 0x4b,
+			     0x79, 0xe1, 0x7f, 0x8b, 0x83, 0x80, 0x0e, 0xe6,
+			     0x6f, 0x3b, 0xb1, 0x29, 0x26, 0x18, 0xb6, 0xfd,
+			     0x1c, 0x2f, 0x8b, 0x27, 0xff, 0x88, 0xe0, 0xeb },
+		.public = { 0x85, 0x20, 0xf0, 0x09, 0x89, 0x30, 0xa7, 0x54,
+			    0x74, 0x8b, 0x7d, 0xdc, 0xb4, 0x3e, 0xf7, 0x5a,
+			    0x0d, 0xbf, 0x3a, 0x0d, 0x26, 0x38, 0x1a, 0xf4,
+			    0xeb, 0xa4, 0xa9, 0x8e, 0xaa, 0x9b, 0x4e, 0x6a },
+		.result = { 0x4a, 0x5d, 0x9d, 0x5b, 0xa4, 0xce, 0x2d, 0xe1,
+			    0x72, 0x8e, 0x3b, 0xf4, 0x80, 0x35, 0x0f, 0x25,
+			    0xe0, 0x7e, 0x21, 0xc9, 0x47, 0xd1, 0x9e, 0x33,
+			    0x76, 0xf0, 0x9b, 0x3c, 0x1e, 0x16, 0x17, 0x42 },
+		.valid = true
+	},
+	{
+		.private = { 1 },
+		.public = { 0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.result = { 0x3c, 0x77, 0x77, 0xca, 0xf9, 0x97, 0xb2, 0x64,
+			    0x41, 0x60, 0x77, 0x66, 0x5b, 0x4e, 0x22, 0x9d,
+			    0x0b, 0x95, 0x48, 0xdc, 0x0c, 0xd8, 0x19, 0x98,
+			    0xdd, 0xcd, 0xc5, 0xc8, 0x53, 0x3c, 0x79, 0x7f },
+		.valid = true
+	},
+	{
+		.private = { 1 },
+		.public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0xb3, 0x2d, 0x13, 0x62, 0xc2, 0x48, 0xd6, 0x2f,
+			    0xe6, 0x26, 0x19, 0xcf, 0xf0, 0x4d, 0xd4, 0x3d,
+			    0xb7, 0x3f, 0xfc, 0x1b, 0x63, 0x08, 0xed, 0xe3,
+			    0x0b, 0x78, 0xd8, 0x73, 0x80, 0xf1, 0xe8, 0x34 },
+		.valid = true
+	},
+	{
+		.private = { 0xa5, 0x46, 0xe3, 0x6b, 0xf0, 0x52, 0x7c, 0x9d,
+			     0x3b, 0x16, 0x15, 0x4b, 0x82, 0x46, 0x5e, 0xdd,
+			     0x62, 0x14, 0x4c, 0x0a, 0xc1, 0xfc, 0x5a, 0x18,
+			     0x50, 0x6a, 0x22, 0x44, 0xba, 0x44, 0x9a, 0xc4 },
+		.public = { 0xe6, 0xdb, 0x68, 0x67, 0x58, 0x30, 0x30, 0xdb,
+			    0x35, 0x94, 0xc1, 0xa4, 0x24, 0xb1, 0x5f, 0x7c,
+			    0x72, 0x66, 0x24, 0xec, 0x26, 0xb3, 0x35, 0x3b,
+			    0x10, 0xa9, 0x03, 0xa6, 0xd0, 0xab, 0x1c, 0x4c },
+		.result = { 0xc3, 0xda, 0x55, 0x37, 0x9d, 0xe9, 0xc6, 0x90,
+			    0x8e, 0x94, 0xea, 0x4d, 0xf2, 0x8d, 0x08, 0x4f,
+			    0x32, 0xec, 0xcf, 0x03, 0x49, 0x1c, 0x71, 0xf7,
+			    0x54, 0xb4, 0x07, 0x55, 0x77, 0xa2, 0x85, 0x52 },
+		.valid = true
+	},
+	{
+		.private = { 1, 2, 3, 4 },
+		.public = { 0 },
+		.result = { 0 },
+		.valid = false
+	},
+	{
+		.private = { 2, 4, 6, 8 },
+		.public = { 0xe0, 0xeb, 0x7a, 0x7c, 0x3b, 0x41, 0xb8, 0xae,
+			    0x16, 0x56, 0xe3, 0xfa, 0xf1, 0x9f, 0xc4, 0x6a,
+			    0xda, 0x09, 0x8d, 0xeb, 0x9c, 0x32, 0xb1, 0xfd,
+			    0x86, 0x62, 0x05, 0x16, 0x5f, 0x49, 0xb8 },
+		.result = { 0 },
+		.valid = false
+	},
+	{
+		.private = { 0xff, 0xff, 0xff, 0xff, 0x0a, 0xff, 0xff, 0xff,
+			     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0x0a, 0x00, 0xfb, 0x9f },
+		.result = { 0x77, 0x52, 0xb6, 0x18, 0xc1, 0x2d, 0x48, 0xd2,
+			    0xc6, 0x93, 0x46, 0x83, 0x81, 0x7c, 0xc6, 0x57,
+			    0xf3, 0x31, 0x03, 0x19, 0x49, 0x48, 0x20, 0x05,
+			    0x42, 0x2b, 0x4e, 0xae, 0x8d, 0x1d, 0x43, 0x23 },
+		.valid = true
+	},
+	{
+		.private = { 0x8e, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			     0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			     0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			     0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.public = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8e, 0x06 },
+		.result = { 0x5a, 0xdf, 0xaa, 0x25, 0x86, 0x8e, 0x32, 0x3d,
+			    0xae, 0x49, 0x62, 0xc1, 0x01, 0x5c, 0xb3, 0x12,
+			    0xe1, 0xc5, 0xc7, 0x9e, 0x95, 0x3f, 0x03, 0x99,
+			    0xb0, 0xba, 0x16, 0x22, 0xf3, 0xb6, 0xf7, 0x0c },
+		.valid = true
+	},
+	/* wycheproof - normal case */
+	{
+		.private = { 0x48, 0x52, 0x83, 0x4d, 0x9d, 0x6b, 0x77, 0xda,
+			     0xde, 0xab, 0xaa, 0xf2, 0xe1, 0x1d, 0xca, 0x66,
+			     0xd1, 0x9f, 0xe7, 0x49, 0x93, 0xa7, 0xbe, 0xc3,
+			     0x6c, 0x6e, 0x16, 0xa0, 0x98, 0x3f, 0xea, 0xba },
+		.public = { 0x9c, 0x64, 0x7d, 0x9a, 0xe5, 0x89, 0xb9, 0xf5,
+			    0x8f, 0xdc, 0x3c, 0xa4, 0x94, 0x7e, 0xfb, 0xc9,
+			    0x15, 0xc4, 0xb2, 0xe0, 0x8e, 0x74, 0x4a, 0x0e,
+			    0xdf, 0x46, 0x9d, 0xac, 0x59, 0xc8, 0xf8, 0x5a },
+		.result = { 0x87, 0xb7, 0xf2, 0x12, 0xb6, 0x27, 0xf7, 0xa5,
+			    0x4c, 0xa5, 0xe0, 0xbc, 0xda, 0xdd, 0xd5, 0x38,
+			    0x9d, 0x9d, 0xe6, 0x15, 0x6c, 0xdb, 0xcf, 0x8e,
+			    0xbe, 0x14, 0xff, 0xbc, 0xfb, 0x43, 0x65, 0x51 },
+		.valid = true
+	},
+	/* wycheproof - public key on twist */
+	{
+		.private = { 0x58, 0x8c, 0x06, 0x1a, 0x50, 0x80, 0x4a, 0xc4,
+			     0x88, 0xad, 0x77, 0x4a, 0xc7, 0x16, 0xc3, 0xf5,
+			     0xba, 0x71, 0x4b, 0x27, 0x12, 0xe0, 0x48, 0x49,
+			     0x13, 0x79, 0xa5, 0x00, 0x21, 0x19, 0x98, 0xa8 },
+		.public = { 0x63, 0xaa, 0x40, 0xc6, 0xe3, 0x83, 0x46, 0xc5,
+			    0xca, 0xf2, 0x3a, 0x6d, 0xf0, 0xa5, 0xe6, 0xc8,
+			    0x08, 0x89, 0xa0, 0x86, 0x47, 0xe5, 0x51, 0xb3,
+			    0x56, 0x34, 0x49, 0xbe, 0xfc, 0xfc, 0x97, 0x33 },
+		.result = { 0xb1, 0xa7, 0x07, 0x51, 0x94, 0x95, 0xff, 0xff,
+			    0xb2, 0x98, 0xff, 0x94, 0x17, 0x16, 0xb0, 0x6d,
+			    0xfa, 0xb8, 0x7c, 0xf8, 0xd9, 0x11, 0x23, 0xfe,
+			    0x2b, 0xe9, 0xa2, 0x33, 0xdd, 0xa2, 0x22, 0x12 },
+		.valid = true
+	},
+	/* wycheproof - public key on twist */
+	{
+		.private = { 0xb0, 0x5b, 0xfd, 0x32, 0xe5, 0x53, 0x25, 0xd9,
+			     0xfd, 0x64, 0x8c, 0xb3, 0x02, 0x84, 0x80, 0x39,
+			     0x00, 0x0b, 0x39, 0x0e, 0x44, 0xd5, 0x21, 0xe5,
+			     0x8a, 0xab, 0x3b, 0x29, 0xa6, 0x96, 0x0b, 0xa8 },
+		.public = { 0x0f, 0x83, 0xc3, 0x6f, 0xde, 0xd9, 0xd3, 0x2f,
+			    0xad, 0xf4, 0xef, 0xa3, 0xae, 0x93, 0xa9, 0x0b,
+			    0xb5, 0xcf, 0xa6, 0x68, 0x93, 0xbc, 0x41, 0x2c,
+			    0x43, 0xfa, 0x72, 0x87, 0xdb, 0xb9, 0x97, 0x79 },
+		.result = { 0x67, 0xdd, 0x4a, 0x6e, 0x16, 0x55, 0x33, 0x53,
+			    0x4c, 0x0e, 0x3f, 0x17, 0x2e, 0x4a, 0xb8, 0x57,
+			    0x6b, 0xca, 0x92, 0x3a, 0x5f, 0x07, 0xb2, 0xc0,
+			    0x69, 0xb4, 0xc3, 0x10, 0xff, 0x2e, 0x93, 0x5b },
+		.valid = true
+	},
+	/* wycheproof - public key on twist */
+	{
+		.private = { 0x70, 0xe3, 0x4b, 0xcb, 0xe1, 0xf4, 0x7f, 0xbc,
+			     0x0f, 0xdd, 0xfd, 0x7c, 0x1e, 0x1a, 0xa5, 0x3d,
+			     0x57, 0xbf, 0xe0, 0xf6, 0x6d, 0x24, 0x30, 0x67,
+			     0xb4, 0x24, 0xbb, 0x62, 0x10, 0xbe, 0xd1, 0x9c },
+		.public = { 0x0b, 0x82, 0x11, 0xa2, 0xb6, 0x04, 0x90, 0x97,
+			    0xf6, 0x87, 0x1c, 0x6c, 0x05, 0x2d, 0x3c, 0x5f,
+			    0xc1, 0xba, 0x17, 0xda, 0x9e, 0x32, 0xae, 0x45,
+			    0x84, 0x03, 0xb0, 0x5b, 0xb2, 0x83, 0x09, 0x2a },
+		.result = { 0x4a, 0x06, 0x38, 0xcf, 0xaa, 0x9e, 0xf1, 0x93,
+			    0x3b, 0x47, 0xf8, 0x93, 0x92, 0x96, 0xa6, 0xb2,
+			    0x5b, 0xe5, 0x41, 0xef, 0x7f, 0x70, 0xe8, 0x44,
+			    0xc0, 0xbc, 0xc0, 0x0b, 0x13, 0x4d, 0xe6, 0x4a },
+		.valid = true
+	},
+	/* wycheproof - public key on twist */
+	{
+		.private = { 0x68, 0xc1, 0xf3, 0xa6, 0x53, 0xa4, 0xcd, 0xb1,
+			     0xd3, 0x7b, 0xba, 0x94, 0x73, 0x8f, 0x8b, 0x95,
+			     0x7a, 0x57, 0xbe, 0xb2, 0x4d, 0x64, 0x6e, 0x99,
+			     0x4d, 0xc2, 0x9a, 0x27, 0x6a, 0xad, 0x45, 0x8d },
+		.public = { 0x34, 0x3a, 0xc2, 0x0a, 0x3b, 0x9c, 0x6a, 0x27,
+			    0xb1, 0x00, 0x81, 0x76, 0x50, 0x9a, 0xd3, 0x07,
+			    0x35, 0x85, 0x6e, 0xc1, 0xc8, 0xd8, 0xfc, 0xae,
+			    0x13, 0x91, 0x2d, 0x08, 0xd1, 0x52, 0xf4, 0x6c },
+		.result = { 0x39, 0x94, 0x91, 0xfc, 0xe8, 0xdf, 0xab, 0x73,
+			    0xb4, 0xf9, 0xf6, 0x11, 0xde, 0x8e, 0xa0, 0xb2,
+			    0x7b, 0x28, 0xf8, 0x59, 0x94, 0x25, 0x0b, 0x0f,
+			    0x47, 0x5d, 0x58, 0x5d, 0x04, 0x2a, 0xc2, 0x07 },
+		.valid = true
+	},
+	/* wycheproof - public key on twist */
+	{
+		.private = { 0xd8, 0x77, 0xb2, 0x6d, 0x06, 0xdf, 0xf9, 0xd9,
+			     0xf7, 0xfd, 0x4c, 0x5b, 0x37, 0x69, 0xf8, 0xcd,
+			     0xd5, 0xb3, 0x05, 0x16, 0xa5, 0xab, 0x80, 0x6b,
+			     0xe3, 0x24, 0xff, 0x3e, 0xb6, 0x9e, 0xa0, 0xb2 },
+		.public = { 0xfa, 0x69, 0x5f, 0xc7, 0xbe, 0x8d, 0x1b, 0xe5,
+			    0xbf, 0x70, 0x48, 0x98, 0xf3, 0x88, 0xc4, 0x52,
+			    0xba, 0xfd, 0xd3, 0xb8, 0xea, 0xe8, 0x05, 0xf8,
+			    0x68, 0x1a, 0x8d, 0x15, 0xc2, 0xd4, 0xe1, 0x42 },
+		.result = { 0x2c, 0x4f, 0xe1, 0x1d, 0x49, 0x0a, 0x53, 0x86,
+			    0x17, 0x76, 0xb1, 0x3b, 0x43, 0x54, 0xab, 0xd4,
+			    0xcf, 0x5a, 0x97, 0x69, 0x9d, 0xb6, 0xe6, 0xc6,
+			    0x8c, 0x16, 0x26, 0xd0, 0x76, 0x62, 0xf7, 0x58 },
+		.valid = true
+	},
+	/* wycheproof - public key = 0 */
+	{
+		.private = { 0x20, 0x74, 0x94, 0x03, 0x8f, 0x2b, 0xb8, 0x11,
+			     0xd4, 0x78, 0x05, 0xbc, 0xdf, 0x04, 0xa2, 0xac,
+			     0x58, 0x5a, 0xda, 0x7f, 0x2f, 0x23, 0x38, 0x9b,
+			     0xfd, 0x46, 0x58, 0xf9, 0xdd, 0xd4, 0xde, 0xbc },
+		.public = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key = 1 */
+	{
+		.private = { 0x20, 0x2e, 0x89, 0x72, 0xb6, 0x1c, 0x7e, 0x61,
+			     0x93, 0x0e, 0xb9, 0x45, 0x0b, 0x50, 0x70, 0xea,
+			     0xe1, 0xc6, 0x70, 0x47, 0x56, 0x85, 0x54, 0x1f,
+			     0x04, 0x76, 0x21, 0x7e, 0x48, 0x18, 0xcf, 0xab },
+		.public = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - edge case on twist */
+	{
+		.private = { 0x38, 0xdd, 0xe9, 0xf3, 0xe7, 0xb7, 0x99, 0x04,
+			     0x5f, 0x9a, 0xc3, 0x79, 0x3d, 0x4a, 0x92, 0x77,
+			     0xda, 0xde, 0xad, 0xc4, 0x1b, 0xec, 0x02, 0x90,
+			     0xf8, 0x1f, 0x74, 0x4f, 0x73, 0x77, 0x5f, 0x84 },
+		.public = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.result = { 0x9a, 0x2c, 0xfe, 0x84, 0xff, 0x9c, 0x4a, 0x97,
+			    0x39, 0x62, 0x5c, 0xae, 0x4a, 0x3b, 0x82, 0xa9,
+			    0x06, 0x87, 0x7a, 0x44, 0x19, 0x46, 0xf8, 0xd7,
+			    0xb3, 0xd7, 0x95, 0xfe, 0x8f, 0x5d, 0x16, 0x39 },
+		.valid = true
+	},
+	/* wycheproof - edge case on twist */
+	{
+		.private = { 0x98, 0x57, 0xa9, 0x14, 0xe3, 0xc2, 0x90, 0x36,
+			     0xfd, 0x9a, 0x44, 0x2b, 0xa5, 0x26, 0xb5, 0xcd,
+			     0xcd, 0xf2, 0x82, 0x16, 0x15, 0x3e, 0x63, 0x6c,
+			     0x10, 0x67, 0x7a, 0xca, 0xb6, 0xbd, 0x6a, 0xa5 },
+		.public = { 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.result = { 0x4d, 0xa4, 0xe0, 0xaa, 0x07, 0x2c, 0x23, 0x2e,
+			    0xe2, 0xf0, 0xfa, 0x4e, 0x51, 0x9a, 0xe5, 0x0b,
+			    0x52, 0xc1, 0xed, 0xd0, 0x8a, 0x53, 0x4d, 0x4e,
+			    0xf3, 0x46, 0xc2, 0xe1, 0x06, 0xd2, 0x1d, 0x60 },
+		.valid = true
+	},
+	/* wycheproof - edge case on twist */
+	{
+		.private = { 0x48, 0xe2, 0x13, 0x0d, 0x72, 0x33, 0x05, 0xed,
+			     0x05, 0xe6, 0xe5, 0x89, 0x4d, 0x39, 0x8a, 0x5e,
+			     0x33, 0x36, 0x7a, 0x8c, 0x6a, 0xac, 0x8f, 0xcd,
+			     0xf0, 0xa8, 0x8e, 0x4b, 0x42, 0x82, 0x0d, 0xb7 },
+		.public = { 0xff, 0xff, 0xff, 0x03, 0x00, 0x00, 0xf8, 0xff,
+			    0xff, 0x1f, 0x00, 0x00, 0xc0, 0xff, 0xff, 0xff,
+			    0x00, 0x00, 0x00, 0xfe, 0xff, 0xff, 0x07, 0x00,
+			    0x00, 0xf0, 0xff, 0xff, 0x3f, 0x00, 0x00, 0x00 },
+		.result = { 0x9e, 0xd1, 0x0c, 0x53, 0x74, 0x7f, 0x64, 0x7f,
+			    0x82, 0xf4, 0x51, 0x25, 0xd3, 0xde, 0x15, 0xa1,
+			    0xe6, 0xb8, 0x24, 0x49, 0x6a, 0xb4, 0x04, 0x10,
+			    0xff, 0xcc, 0x3c, 0xfe, 0x95, 0x76, 0x0f, 0x3b },
+		.valid = true
+	},
+	/* wycheproof - edge case on twist */
+	{
+		.private = { 0x28, 0xf4, 0x10, 0x11, 0x69, 0x18, 0x51, 0xb3,
+			     0xa6, 0x2b, 0x64, 0x15, 0x53, 0xb3, 0x0d, 0x0d,
+			     0xfd, 0xdc, 0xb8, 0xff, 0xfc, 0xf5, 0x37, 0x00,
+			     0xa7, 0xbe, 0x2f, 0x6a, 0x87, 0x2e, 0x9f, 0xb0 },
+		.public = { 0x00, 0x00, 0x00, 0xfc, 0xff, 0xff, 0x07, 0x00,
+			    0x00, 0xe0, 0xff, 0xff, 0x3f, 0x00, 0x00, 0x00,
+			    0xff, 0xff, 0xff, 0x01, 0x00, 0x00, 0xf8, 0xff,
+			    0xff, 0x0f, 0x00, 0x00, 0xc0, 0xff, 0xff, 0x7f },
+		.result = { 0xcf, 0x72, 0xb4, 0xaa, 0x6a, 0xa1, 0xc9, 0xf8,
+			    0x94, 0xf4, 0x16, 0x5b, 0x86, 0x10, 0x9a, 0xa4,
+			    0x68, 0x51, 0x76, 0x48, 0xe1, 0xf0, 0xcc, 0x70,
+			    0xe1, 0xab, 0x08, 0x46, 0x01, 0x76, 0x50, 0x6b },
+		.valid = true
+	},
+	/* wycheproof - edge case on twist */
+	{
+		.private = { 0x18, 0xa9, 0x3b, 0x64, 0x99, 0xb9, 0xf6, 0xb3,
+			     0x22, 0x5c, 0xa0, 0x2f, 0xef, 0x41, 0x0e, 0x0a,
+			     0xde, 0xc2, 0x35, 0x32, 0x32, 0x1d, 0x2d, 0x8e,
+			     0xf1, 0xa6, 0xd6, 0x02, 0xa8, 0xc6, 0x5b, 0x83 },
+		.public = { 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+			    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+			    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff,
+			    0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0x5d, 0x50, 0xb6, 0x28, 0x36, 0xbb, 0x69, 0x57,
+			    0x94, 0x10, 0x38, 0x6c, 0xf7, 0xbb, 0x81, 0x1c,
+			    0x14, 0xbf, 0x85, 0xb1, 0xc7, 0xb1, 0x7e, 0x59,
+			    0x24, 0xc7, 0xff, 0xea, 0x91, 0xef, 0x9e, 0x12 },
+		.valid = true
+	},
+	/* wycheproof - edge case on twist */
+	{
+		.private = { 0xc0, 0x1d, 0x13, 0x05, 0xa1, 0x33, 0x8a, 0x1f,
+			     0xca, 0xc2, 0xba, 0x7e, 0x2e, 0x03, 0x2b, 0x42,
+			     0x7e, 0x0b, 0x04, 0x90, 0x31, 0x65, 0xac, 0xa9,
+			     0x57, 0xd8, 0xd0, 0x55, 0x3d, 0x87, 0x17, 0xb0 },
+		.public = { 0xea, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0x19, 0x23, 0x0e, 0xb1, 0x48, 0xd5, 0xd6, 0x7c,
+			    0x3c, 0x22, 0xab, 0x1d, 0xae, 0xff, 0x80, 0xa5,
+			    0x7e, 0xae, 0x42, 0x65, 0xce, 0x28, 0x72, 0x65,
+			    0x7b, 0x2c, 0x80, 0x99, 0xfc, 0x69, 0x8e, 0x50 },
+		.valid = true
+	},
+	/* wycheproof - edge case for public key */
+	{
+		.private = { 0x38, 0x6f, 0x7f, 0x16, 0xc5, 0x07, 0x31, 0xd6,
+			     0x4f, 0x82, 0xe6, 0xa1, 0x70, 0xb1, 0x42, 0xa4,
+			     0xe3, 0x4f, 0x31, 0xfd, 0x77, 0x68, 0xfc, 0xb8,
+			     0x90, 0x29, 0x25, 0xe7, 0xd1, 0xe2, 0x1a, 0xbe },
+		.public = { 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.result = { 0x0f, 0xca, 0xb5, 0xd8, 0x42, 0xa0, 0x78, 0xd7,
+			    0xa7, 0x1f, 0xc5, 0x9b, 0x57, 0xbf, 0xb4, 0xca,
+			    0x0b, 0xe6, 0x87, 0x3b, 0x49, 0xdc, 0xdb, 0x9f,
+			    0x44, 0xe1, 0x4a, 0xe8, 0xfb, 0xdf, 0xa5, 0x42 },
+		.valid = true
+	},
+	/* wycheproof - edge case for public key */
+	{
+		.private = { 0xe0, 0x23, 0xa2, 0x89, 0xbd, 0x5e, 0x90, 0xfa,
+			     0x28, 0x04, 0xdd, 0xc0, 0x19, 0xa0, 0x5e, 0xf3,
+			     0xe7, 0x9d, 0x43, 0x4b, 0xb6, 0xea, 0x2f, 0x52,
+			     0x2e, 0xcb, 0x64, 0x3a, 0x75, 0x29, 0x6e, 0x95 },
+		.public = { 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+			    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+			    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+			    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 },
+		.result = { 0x54, 0xce, 0x8f, 0x22, 0x75, 0xc0, 0x77, 0xe3,
+			    0xb1, 0x30, 0x6a, 0x39, 0x39, 0xc5, 0xe0, 0x3e,
+			    0xef, 0x6b, 0xbb, 0x88, 0x06, 0x05, 0x44, 0x75,
+			    0x8d, 0x9f, 0xef, 0x59, 0xb0, 0xbc, 0x3e, 0x4f },
+		.valid = true
+	},
+	/* wycheproof - edge case for public key */
+	{
+		.private = { 0x68, 0xf0, 0x10, 0xd6, 0x2e, 0xe8, 0xd9, 0x26,
+			     0x05, 0x3a, 0x36, 0x1c, 0x3a, 0x75, 0xc6, 0xea,
+			     0x4e, 0xbd, 0xc8, 0x60, 0x6a, 0xb2, 0x85, 0x00,
+			     0x3a, 0x6f, 0x8f, 0x40, 0x76, 0xb0, 0x1e, 0x83 },
+		.public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x03 },
+		.result = { 0xf1, 0x36, 0x77, 0x5c, 0x5b, 0xeb, 0x0a, 0xf8,
+			    0x11, 0x0a, 0xf1, 0x0b, 0x20, 0x37, 0x23, 0x32,
+			    0x04, 0x3c, 0xab, 0x75, 0x24, 0x19, 0x67, 0x87,
+			    0x75, 0xa2, 0x23, 0xdf, 0x57, 0xc9, 0xd3, 0x0d },
+		.valid = true
+	},
+	/* wycheproof - edge case for public key */
+	{
+		.private = { 0x58, 0xeb, 0xcb, 0x35, 0xb0, 0xf8, 0x84, 0x5c,
+			     0xaf, 0x1e, 0xc6, 0x30, 0xf9, 0x65, 0x76, 0xb6,
+			     0x2c, 0x4b, 0x7b, 0x6c, 0x36, 0xb2, 0x9d, 0xeb,
+			     0x2c, 0xb0, 0x08, 0x46, 0x51, 0x75, 0x5c, 0x96 },
+		.public = { 0xff, 0xff, 0xff, 0xfb, 0xff, 0xff, 0xfb, 0xff,
+			    0xff, 0xdf, 0xff, 0xff, 0xdf, 0xff, 0xff, 0xff,
+			    0xfe, 0xff, 0xff, 0xfe, 0xff, 0xff, 0xf7, 0xff,
+			    0xff, 0xf7, 0xff, 0xff, 0xbf, 0xff, 0xff, 0x3f },
+		.result = { 0xbf, 0x9a, 0xff, 0xd0, 0x6b, 0x84, 0x40, 0x85,
+			    0x58, 0x64, 0x60, 0x96, 0x2e, 0xf2, 0x14, 0x6f,
+			    0xf3, 0xd4, 0x53, 0x3d, 0x94, 0x44, 0xaa, 0xb0,
+			    0x06, 0xeb, 0x88, 0xcc, 0x30, 0x54, 0x40, 0x7d },
+		.valid = true
+	},
+	/* wycheproof - edge case for public key */
+	{
+		.private = { 0x18, 0x8c, 0x4b, 0xc5, 0xb9, 0xc4, 0x4b, 0x38,
+			     0xbb, 0x65, 0x8b, 0x9b, 0x2a, 0xe8, 0x2d, 0x5b,
+			     0x01, 0x01, 0x5e, 0x09, 0x31, 0x84, 0xb1, 0x7c,
+			     0xb7, 0x86, 0x35, 0x03, 0xa7, 0x83, 0xe1, 0xbb },
+		.public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f },
+		.result = { 0xd4, 0x80, 0xde, 0x04, 0xf6, 0x99, 0xcb, 0x3b,
+			    0xe0, 0x68, 0x4a, 0x9c, 0xc2, 0xe3, 0x12, 0x81,
+			    0xea, 0x0b, 0xc5, 0xa9, 0xdc, 0xc1, 0x57, 0xd3,
+			    0xd2, 0x01, 0x58, 0xd4, 0x6c, 0xa5, 0x24, 0x6d },
+		.valid = true
+	},
+	/* wycheproof - edge case for public key */
+	{
+		.private = { 0xe0, 0x6c, 0x11, 0xbb, 0x2e, 0x13, 0xce, 0x3d,
+			     0xc7, 0x67, 0x3f, 0x67, 0xf5, 0x48, 0x22, 0x42,
+			     0x90, 0x94, 0x23, 0xa9, 0xae, 0x95, 0xee, 0x98,
+			     0x6a, 0x98, 0x8d, 0x98, 0xfa, 0xee, 0x23, 0xa2 },
+		.public = { 0xff, 0xff, 0xff, 0xff, 0xfe, 0xff, 0xff, 0x7f,
+			    0xff, 0xff, 0xff, 0xff, 0xfe, 0xff, 0xff, 0x7f,
+			    0xff, 0xff, 0xff, 0xff, 0xfe, 0xff, 0xff, 0x7f,
+			    0xff, 0xff, 0xff, 0xff, 0xfe, 0xff, 0xff, 0x7f },
+		.result = { 0x4c, 0x44, 0x01, 0xcc, 0xe6, 0xb5, 0x1e, 0x4c,
+			    0xb1, 0x8f, 0x27, 0x90, 0x24, 0x6c, 0x9b, 0xf9,
+			    0x14, 0xdb, 0x66, 0x77, 0x50, 0xa1, 0xcb, 0x89,
+			    0x06, 0x90, 0x92, 0xaf, 0x07, 0x29, 0x22, 0x76 },
+		.valid = true
+	},
+	/* wycheproof - edge case for public key */
+	{
+		.private = { 0xc0, 0x65, 0x8c, 0x46, 0xdd, 0xe1, 0x81, 0x29,
+			     0x29, 0x38, 0x77, 0x53, 0x5b, 0x11, 0x62, 0xb6,
+			     0xf9, 0xf5, 0x41, 0x4a, 0x23, 0xcf, 0x4d, 0x2c,
+			     0xbc, 0x14, 0x0a, 0x4d, 0x99, 0xda, 0x2b, 0x8f },
+		.public = { 0xeb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0x57, 0x8b, 0xa8, 0xcc, 0x2d, 0xbd, 0xc5, 0x75,
+			    0xaf, 0xcf, 0x9d, 0xf2, 0xb3, 0xee, 0x61, 0x89,
+			    0xf5, 0x33, 0x7d, 0x68, 0x54, 0xc7, 0x9b, 0x4c,
+			    0xe1, 0x65, 0xea, 0x12, 0x29, 0x3b, 0x3a, 0x0f },
+		.valid = true
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0x10, 0x25, 0x5c, 0x92, 0x30, 0xa9, 0x7a, 0x30,
+			     0xa4, 0x58, 0xca, 0x28, 0x4a, 0x62, 0x96, 0x69,
+			     0x29, 0x3a, 0x31, 0x89, 0x0c, 0xda, 0x9d, 0x14,
+			     0x7f, 0xeb, 0xc7, 0xd1, 0xe2, 0x2d, 0x6b, 0xb1 },
+		.public = { 0xe0, 0xeb, 0x7a, 0x7c, 0x3b, 0x41, 0xb8, 0xae,
+			    0x16, 0x56, 0xe3, 0xfa, 0xf1, 0x9f, 0xc4, 0x6a,
+			    0xda, 0x09, 0x8d, 0xeb, 0x9c, 0x32, 0xb1, 0xfd,
+			    0x86, 0x62, 0x05, 0x16, 0x5f, 0x49, 0xb8, 0x00 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0x78, 0xf1, 0xe8, 0xed, 0xf1, 0x44, 0x81, 0xb3,
+			     0x89, 0x44, 0x8d, 0xac, 0x8f, 0x59, 0xc7, 0x0b,
+			     0x03, 0x8e, 0x7c, 0xf9, 0x2e, 0xf2, 0xc7, 0xef,
+			     0xf5, 0x7a, 0x72, 0x46, 0x6e, 0x11, 0x52, 0x96 },
+		.public = { 0x5f, 0x9c, 0x95, 0xbc, 0xa3, 0x50, 0x8c, 0x24,
+			    0xb1, 0xd0, 0xb1, 0x55, 0x9c, 0x83, 0xef, 0x5b,
+			    0x04, 0x44, 0x5c, 0xc4, 0x58, 0x1c, 0x8e, 0x86,
+			    0xd8, 0x22, 0x4e, 0xdd, 0xd0, 0x9f, 0x11, 0x57 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0xa0, 0xa0, 0x5a, 0x3e, 0x8f, 0x9f, 0x44, 0x20,
+			     0x4d, 0x5f, 0x80, 0x59, 0xa9, 0x4a, 0xc7, 0xdf,
+			     0xc3, 0x9a, 0x49, 0xac, 0x01, 0x6d, 0xd7, 0x43,
+			     0xdb, 0xfa, 0x43, 0xc5, 0xd6, 0x71, 0xfd, 0x88 },
+		.public = { 0xec, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0xd0, 0xdb, 0xb3, 0xed, 0x19, 0x06, 0x66, 0x3f,
+			     0x15, 0x42, 0x0a, 0xf3, 0x1f, 0x4e, 0xaf, 0x65,
+			     0x09, 0xd9, 0xa9, 0x94, 0x97, 0x23, 0x50, 0x06,
+			     0x05, 0xad, 0x7c, 0x1c, 0x6e, 0x74, 0x50, 0xa9 },
+		.public = { 0xed, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0xc0, 0xb1, 0xd0, 0xeb, 0x22, 0xb2, 0x44, 0xfe,
+			     0x32, 0x91, 0x14, 0x00, 0x72, 0xcd, 0xd9, 0xd9,
+			     0x89, 0xb5, 0xf0, 0xec, 0xd9, 0x6c, 0x10, 0x0f,
+			     0xeb, 0x5b, 0xca, 0x24, 0x1c, 0x1d, 0x9f, 0x8f },
+		.public = { 0xee, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0x48, 0x0b, 0xf4, 0x5f, 0x59, 0x49, 0x42, 0xa8,
+			     0xbc, 0x0f, 0x33, 0x53, 0xc6, 0xe8, 0xb8, 0x85,
+			     0x3d, 0x77, 0xf3, 0x51, 0xf1, 0xc2, 0xca, 0x6c,
+			     0x2d, 0x1a, 0xbf, 0x8a, 0x00, 0xb4, 0x22, 0x9c },
+		.public = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0x30, 0xf9, 0x93, 0xfc, 0xf8, 0x51, 0x4f, 0xc8,
+			     0x9b, 0xd8, 0xdb, 0x14, 0xcd, 0x43, 0xba, 0x0d,
+			     0x4b, 0x25, 0x30, 0xe7, 0x3c, 0x42, 0x76, 0xa0,
+			     0x5e, 0x1b, 0x14, 0x5d, 0x42, 0x0c, 0xed, 0xb4 },
+		.public = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0xc0, 0x49, 0x74, 0xb7, 0x58, 0x38, 0x0e, 0x2a,
+			     0x5b, 0x5d, 0xf6, 0xeb, 0x09, 0xbb, 0x2f, 0x6b,
+			     0x34, 0x34, 0xf9, 0x82, 0x72, 0x2a, 0x8e, 0x67,
+			     0x6d, 0x3d, 0xa2, 0x51, 0xd1, 0xb3, 0xde, 0x83 },
+		.public = { 0xe0, 0xeb, 0x7a, 0x7c, 0x3b, 0x41, 0xb8, 0xae,
+			    0x16, 0x56, 0xe3, 0xfa, 0xf1, 0x9f, 0xc4, 0x6a,
+			    0xda, 0x09, 0x8d, 0xeb, 0x9c, 0x32, 0xb1, 0xfd,
+			    0x86, 0x62, 0x05, 0x16, 0x5f, 0x49, 0xb8, 0x80 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0x50, 0x2a, 0x31, 0x37, 0x3d, 0xb3, 0x24, 0x46,
+			     0x84, 0x2f, 0xe5, 0xad, 0xd3, 0xe0, 0x24, 0x02,
+			     0x2e, 0xa5, 0x4f, 0x27, 0x41, 0x82, 0xaf, 0xc3,
+			     0xd9, 0xf1, 0xbb, 0x3d, 0x39, 0x53, 0x4e, 0xb5 },
+		.public = { 0x5f, 0x9c, 0x95, 0xbc, 0xa3, 0x50, 0x8c, 0x24,
+			    0xb1, 0xd0, 0xb1, 0x55, 0x9c, 0x83, 0xef, 0x5b,
+			    0x04, 0x44, 0x5c, 0xc4, 0x58, 0x1c, 0x8e, 0x86,
+			    0xd8, 0x22, 0x4e, 0xdd, 0xd0, 0x9f, 0x11, 0xd7 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0x90, 0xfa, 0x64, 0x17, 0xb0, 0xe3, 0x70, 0x30,
+			     0xfd, 0x6e, 0x43, 0xef, 0xf2, 0xab, 0xae, 0xf1,
+			     0x4c, 0x67, 0x93, 0x11, 0x7a, 0x03, 0x9c, 0xf6,
+			     0x21, 0x31, 0x8b, 0xa9, 0x0f, 0x4e, 0x98, 0xbe },
+		.public = { 0xec, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0x78, 0xad, 0x3f, 0x26, 0x02, 0x7f, 0x1c, 0x9f,
+			     0xdd, 0x97, 0x5a, 0x16, 0x13, 0xb9, 0x47, 0x77,
+			     0x9b, 0xad, 0x2c, 0xf2, 0xb7, 0x41, 0xad, 0xe0,
+			     0x18, 0x40, 0x88, 0x5a, 0x30, 0xbb, 0x97, 0x9c },
+		.public = { 0xed, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key with low order */
+	{
+		.private = { 0x98, 0xe2, 0x3d, 0xe7, 0xb1, 0xe0, 0x92, 0x6e,
+			     0xd9, 0xc8, 0x7e, 0x7b, 0x14, 0xba, 0xf5, 0x5f,
+			     0x49, 0x7a, 0x1d, 0x70, 0x96, 0xf9, 0x39, 0x77,
+			     0x68, 0x0e, 0x44, 0xdc, 0x1c, 0x7b, 0x7b, 0x8b },
+		.public = { 0xee, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = false
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0xf0, 0x1e, 0x48, 0xda, 0xfa, 0xc9, 0xd7, 0xbc,
+			     0xf5, 0x89, 0xcb, 0xc3, 0x82, 0xc8, 0x78, 0xd1,
+			     0x8b, 0xda, 0x35, 0x50, 0x58, 0x9f, 0xfb, 0x5d,
+			     0x50, 0xb5, 0x23, 0xbe, 0xbe, 0x32, 0x9d, 0xae },
+		.public = { 0xef, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0xbd, 0x36, 0xa0, 0x79, 0x0e, 0xb8, 0x83, 0x09,
+			    0x8c, 0x98, 0x8b, 0x21, 0x78, 0x67, 0x73, 0xde,
+			    0x0b, 0x3a, 0x4d, 0xf1, 0x62, 0x28, 0x2c, 0xf1,
+			    0x10, 0xde, 0x18, 0xdd, 0x48, 0x4c, 0xe7, 0x4b },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x28, 0x87, 0x96, 0xbc, 0x5a, 0xff, 0x4b, 0x81,
+			     0xa3, 0x75, 0x01, 0x75, 0x7b, 0xc0, 0x75, 0x3a,
+			     0x3c, 0x21, 0x96, 0x47, 0x90, 0xd3, 0x86, 0x99,
+			     0x30, 0x8d, 0xeb, 0xc1, 0x7a, 0x6e, 0xaf, 0x8d },
+		.public = { 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0xb4, 0xe0, 0xdd, 0x76, 0xda, 0x7b, 0x07, 0x17,
+			    0x28, 0xb6, 0x1f, 0x85, 0x67, 0x71, 0xaa, 0x35,
+			    0x6e, 0x57, 0xed, 0xa7, 0x8a, 0x5b, 0x16, 0x55,
+			    0xcc, 0x38, 0x20, 0xfb, 0x5f, 0x85, 0x4c, 0x5c },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x98, 0xdf, 0x84, 0x5f, 0x66, 0x51, 0xbf, 0x11,
+			     0x38, 0x22, 0x1f, 0x11, 0x90, 0x41, 0xf7, 0x2b,
+			     0x6d, 0xbc, 0x3c, 0x4a, 0xce, 0x71, 0x43, 0xd9,
+			     0x9f, 0xd5, 0x5a, 0xd8, 0x67, 0x48, 0x0d, 0xa8 },
+		.public = { 0xf1, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0x6f, 0xdf, 0x6c, 0x37, 0x61, 0x1d, 0xbd, 0x53,
+			    0x04, 0xdc, 0x0f, 0x2e, 0xb7, 0xc9, 0x51, 0x7e,
+			    0xb3, 0xc5, 0x0e, 0x12, 0xfd, 0x05, 0x0a, 0xc6,
+			    0xde, 0xc2, 0x70, 0x71, 0xd4, 0xbf, 0xc0, 0x34 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0xf0, 0x94, 0x98, 0xe4, 0x6f, 0x02, 0xf8, 0x78,
+			     0x82, 0x9e, 0x78, 0xb8, 0x03, 0xd3, 0x16, 0xa2,
+			     0xed, 0x69, 0x5d, 0x04, 0x98, 0xa0, 0x8a, 0xbd,
+			     0xf8, 0x27, 0x69, 0x30, 0xe2, 0x4e, 0xdc, 0xb0 },
+		.public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.result = { 0x4c, 0x8f, 0xc4, 0xb1, 0xc6, 0xab, 0x88, 0xfb,
+			    0x21, 0xf1, 0x8f, 0x6d, 0x4c, 0x81, 0x02, 0x40,
+			    0xd4, 0xe9, 0x46, 0x51, 0xba, 0x44, 0xf7, 0xa2,
+			    0xc8, 0x63, 0xce, 0xc7, 0xdc, 0x56, 0x60, 0x2d },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x18, 0x13, 0xc1, 0x0a, 0x5c, 0x7f, 0x21, 0xf9,
+			     0x6e, 0x17, 0xf2, 0x88, 0xc0, 0xcc, 0x37, 0x60,
+			     0x7c, 0x04, 0xc5, 0xf5, 0xae, 0xa2, 0xdb, 0x13,
+			     0x4f, 0x9e, 0x2f, 0xfc, 0x66, 0xbd, 0x9d, 0xb8 },
+		.public = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 },
+		.result = { 0x1c, 0xd0, 0xb2, 0x82, 0x67, 0xdc, 0x54, 0x1c,
+			    0x64, 0x2d, 0x6d, 0x7d, 0xca, 0x44, 0xa8, 0xb3,
+			    0x8a, 0x63, 0x73, 0x6e, 0xef, 0x5c, 0x4e, 0x65,
+			    0x01, 0xff, 0xbb, 0xb1, 0x78, 0x0c, 0x03, 0x3c },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x78, 0x57, 0xfb, 0x80, 0x86, 0x53, 0x64, 0x5a,
+			     0x0b, 0xeb, 0x13, 0x8a, 0x64, 0xf5, 0xf4, 0xd7,
+			     0x33, 0xa4, 0x5e, 0xa8, 0x4c, 0x3c, 0xda, 0x11,
+			     0xa9, 0xc0, 0x6f, 0x7e, 0x71, 0x39, 0x14, 0x9e },
+		.public = { 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 },
+		.result = { 0x87, 0x55, 0xbe, 0x01, 0xc6, 0x0a, 0x7e, 0x82,
+			    0x5c, 0xff, 0x3e, 0x0e, 0x78, 0xcb, 0x3a, 0xa4,
+			    0x33, 0x38, 0x61, 0x51, 0x6a, 0xa5, 0x9b, 0x1c,
+			    0x51, 0xa8, 0xb2, 0xa5, 0x43, 0xdf, 0xa8, 0x22 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0xe0, 0x3a, 0xa8, 0x42, 0xe2, 0xab, 0xc5, 0x6e,
+			     0x81, 0xe8, 0x7b, 0x8b, 0x9f, 0x41, 0x7b, 0x2a,
+			     0x1e, 0x59, 0x13, 0xc7, 0x23, 0xee, 0xd2, 0x8d,
+			     0x75, 0x2f, 0x8d, 0x47, 0xa5, 0x9f, 0x49, 0x8f },
+		.public = { 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 },
+		.result = { 0x54, 0xc9, 0xa1, 0xed, 0x95, 0xe5, 0x46, 0xd2,
+			    0x78, 0x22, 0xa3, 0x60, 0x93, 0x1d, 0xda, 0x60,
+			    0xa1, 0xdf, 0x04, 0x9d, 0xa6, 0xf9, 0x04, 0x25,
+			    0x3c, 0x06, 0x12, 0xbb, 0xdc, 0x08, 0x74, 0x76 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0xf8, 0xf7, 0x07, 0xb7, 0x99, 0x9b, 0x18, 0xcb,
+			     0x0d, 0x6b, 0x96, 0x12, 0x4f, 0x20, 0x45, 0x97,
+			     0x2c, 0xa2, 0x74, 0xbf, 0xc1, 0x54, 0xad, 0x0c,
+			     0x87, 0x03, 0x8c, 0x24, 0xc6, 0xd0, 0xd4, 0xb2 },
+		.public = { 0xda, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0xcc, 0x1f, 0x40, 0xd7, 0x43, 0xcd, 0xc2, 0x23,
+			    0x0e, 0x10, 0x43, 0xda, 0xba, 0x8b, 0x75, 0xe8,
+			    0x10, 0xf1, 0xfb, 0xab, 0x7f, 0x25, 0x52, 0x69,
+			    0xbd, 0x9e, 0xbb, 0x29, 0xe6, 0xbf, 0x49, 0x4f },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0xa0, 0x34, 0xf6, 0x84, 0xfa, 0x63, 0x1e, 0x1a,
+			     0x34, 0x81, 0x18, 0xc1, 0xce, 0x4c, 0x98, 0x23,
+			     0x1f, 0x2d, 0x9e, 0xec, 0x9b, 0xa5, 0x36, 0x5b,
+			     0x4a, 0x05, 0xd6, 0x9a, 0x78, 0x5b, 0x07, 0x96 },
+		.public = { 0xdb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x54, 0x99, 0x8e, 0xe4, 0x3a, 0x5b, 0x00, 0x7b,
+			    0xf4, 0x99, 0xf0, 0x78, 0xe7, 0x36, 0x52, 0x44,
+			    0x00, 0xa8, 0xb5, 0xc7, 0xe9, 0xb9, 0xb4, 0x37,
+			    0x71, 0x74, 0x8c, 0x7c, 0xdf, 0x88, 0x04, 0x12 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x30, 0xb6, 0xc6, 0xa0, 0xf2, 0xff, 0xa6, 0x80,
+			     0x76, 0x8f, 0x99, 0x2b, 0xa8, 0x9e, 0x15, 0x2d,
+			     0x5b, 0xc9, 0x89, 0x3d, 0x38, 0xc9, 0x11, 0x9b,
+			     0xe4, 0xf7, 0x67, 0xbf, 0xab, 0x6e, 0x0c, 0xa5 },
+		.public = { 0xdc, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0xea, 0xd9, 0xb3, 0x8e, 0xfd, 0xd7, 0x23, 0x63,
+			    0x79, 0x34, 0xe5, 0x5a, 0xb7, 0x17, 0xa7, 0xae,
+			    0x09, 0xeb, 0x86, 0xa2, 0x1d, 0xc3, 0x6a, 0x3f,
+			    0xee, 0xb8, 0x8b, 0x75, 0x9e, 0x39, 0x1e, 0x09 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x90, 0x1b, 0x9d, 0xcf, 0x88, 0x1e, 0x01, 0xe0,
+			     0x27, 0x57, 0x50, 0x35, 0xd4, 0x0b, 0x43, 0xbd,
+			     0xc1, 0xc5, 0x24, 0x2e, 0x03, 0x08, 0x47, 0x49,
+			     0x5b, 0x0c, 0x72, 0x86, 0x46, 0x9b, 0x65, 0x91 },
+		.public = { 0xea, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x60, 0x2f, 0xf4, 0x07, 0x89, 0xb5, 0x4b, 0x41,
+			    0x80, 0x59, 0x15, 0xfe, 0x2a, 0x62, 0x21, 0xf0,
+			    0x7a, 0x50, 0xff, 0xc2, 0xc3, 0xfc, 0x94, 0xcf,
+			    0x61, 0xf1, 0x3d, 0x79, 0x04, 0xe8, 0x8e, 0x0e },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x80, 0x46, 0x67, 0x7c, 0x28, 0xfd, 0x82, 0xc9,
+			     0xa1, 0xbd, 0xb7, 0x1a, 0x1a, 0x1a, 0x34, 0xfa,
+			     0xba, 0x12, 0x25, 0xe2, 0x50, 0x7f, 0xe3, 0xf5,
+			     0x4d, 0x10, 0xbd, 0x5b, 0x0d, 0x86, 0x5f, 0x8e },
+		.public = { 0xeb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0xe0, 0x0a, 0xe8, 0xb1, 0x43, 0x47, 0x12, 0x47,
+			    0xba, 0x24, 0xf1, 0x2c, 0x88, 0x55, 0x36, 0xc3,
+			    0xcb, 0x98, 0x1b, 0x58, 0xe1, 0xe5, 0x6b, 0x2b,
+			    0xaf, 0x35, 0xc1, 0x2a, 0xe1, 0xf7, 0x9c, 0x26 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x60, 0x2f, 0x7e, 0x2f, 0x68, 0xa8, 0x46, 0xb8,
+			     0x2c, 0xc2, 0x69, 0xb1, 0xd4, 0x8e, 0x93, 0x98,
+			     0x86, 0xae, 0x54, 0xfd, 0x63, 0x6c, 0x1f, 0xe0,
+			     0x74, 0xd7, 0x10, 0x12, 0x7d, 0x47, 0x24, 0x91 },
+		.public = { 0xef, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x98, 0xcb, 0x9b, 0x50, 0xdd, 0x3f, 0xc2, 0xb0,
+			    0xd4, 0xf2, 0xd2, 0xbf, 0x7c, 0x5c, 0xfd, 0xd1,
+			    0x0c, 0x8f, 0xcd, 0x31, 0xfc, 0x40, 0xaf, 0x1a,
+			    0xd4, 0x4f, 0x47, 0xc1, 0x31, 0x37, 0x63, 0x62 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x60, 0x88, 0x7b, 0x3d, 0xc7, 0x24, 0x43, 0x02,
+			     0x6e, 0xbe, 0xdb, 0xbb, 0xb7, 0x06, 0x65, 0xf4,
+			     0x2b, 0x87, 0xad, 0xd1, 0x44, 0x0e, 0x77, 0x68,
+			     0xfb, 0xd7, 0xe8, 0xe2, 0xce, 0x5f, 0x63, 0x9d },
+		.public = { 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x38, 0xd6, 0x30, 0x4c, 0x4a, 0x7e, 0x6d, 0x9f,
+			    0x79, 0x59, 0x33, 0x4f, 0xb5, 0x24, 0x5b, 0xd2,
+			    0xc7, 0x54, 0x52, 0x5d, 0x4c, 0x91, 0xdb, 0x95,
+			    0x02, 0x06, 0x92, 0x62, 0x34, 0xc1, 0xf6, 0x33 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0x78, 0xd3, 0x1d, 0xfa, 0x85, 0x44, 0x97, 0xd7,
+			     0x2d, 0x8d, 0xef, 0x8a, 0x1b, 0x7f, 0xb0, 0x06,
+			     0xce, 0xc2, 0xd8, 0xc4, 0x92, 0x46, 0x47, 0xc9,
+			     0x38, 0x14, 0xae, 0x56, 0xfa, 0xed, 0xa4, 0x95 },
+		.public = { 0xf1, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x78, 0x6c, 0xd5, 0x49, 0x96, 0xf0, 0x14, 0xa5,
+			    0xa0, 0x31, 0xec, 0x14, 0xdb, 0x81, 0x2e, 0xd0,
+			    0x83, 0x55, 0x06, 0x1f, 0xdb, 0x5d, 0xe6, 0x80,
+			    0xa8, 0x00, 0xac, 0x52, 0x1f, 0x31, 0x8e, 0x23 },
+		.valid = true
+	},
+	/* wycheproof - public key >= p */
+	{
+		.private = { 0xc0, 0x4c, 0x5b, 0xae, 0xfa, 0x83, 0x02, 0xdd,
+			     0xde, 0xd6, 0xa4, 0xbb, 0x95, 0x77, 0x61, 0xb4,
+			     0xeb, 0x97, 0xae, 0xfa, 0x4f, 0xc3, 0xb8, 0x04,
+			     0x30, 0x85, 0xf9, 0x6a, 0x56, 0x59, 0xb3, 0xa5 },
+		.public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff },
+		.result = { 0x29, 0xae, 0x8b, 0xc7, 0x3e, 0x9b, 0x10, 0xa0,
+			    0x8b, 0x4f, 0x68, 0x1c, 0x43, 0xc3, 0xe0, 0xac,
+			    0x1a, 0x17, 0x1d, 0x31, 0xb3, 0x8f, 0x1a, 0x48,
+			    0xef, 0xba, 0x29, 0xae, 0x63, 0x9e, 0xa1, 0x34 },
+		.valid = true
+	},
+	/* wycheproof - RFC 7748 */
+	{
+		.private = { 0xa0, 0x46, 0xe3, 0x6b, 0xf0, 0x52, 0x7c, 0x9d,
+			     0x3b, 0x16, 0x15, 0x4b, 0x82, 0x46, 0x5e, 0xdd,
+			     0x62, 0x14, 0x4c, 0x0a, 0xc1, 0xfc, 0x5a, 0x18,
+			     0x50, 0x6a, 0x22, 0x44, 0xba, 0x44, 0x9a, 0x44 },
+		.public = { 0xe6, 0xdb, 0x68, 0x67, 0x58, 0x30, 0x30, 0xdb,
+			    0x35, 0x94, 0xc1, 0xa4, 0x24, 0xb1, 0x5f, 0x7c,
+			    0x72, 0x66, 0x24, 0xec, 0x26, 0xb3, 0x35, 0x3b,
+			    0x10, 0xa9, 0x03, 0xa6, 0xd0, 0xab, 0x1c, 0x4c },
+		.result = { 0xc3, 0xda, 0x55, 0x37, 0x9d, 0xe9, 0xc6, 0x90,
+			    0x8e, 0x94, 0xea, 0x4d, 0xf2, 0x8d, 0x08, 0x4f,
+			    0x32, 0xec, 0xcf, 0x03, 0x49, 0x1c, 0x71, 0xf7,
+			    0x54, 0xb4, 0x07, 0x55, 0x77, 0xa2, 0x85, 0x52 },
+		.valid = true
+	},
+	/* wycheproof - RFC 7748 */
+	{
+		.private = { 0x48, 0x66, 0xe9, 0xd4, 0xd1, 0xb4, 0x67, 0x3c,
+			     0x5a, 0xd2, 0x26, 0x91, 0x95, 0x7d, 0x6a, 0xf5,
+			     0xc1, 0x1b, 0x64, 0x21, 0xe0, 0xea, 0x01, 0xd4,
+			     0x2c, 0xa4, 0x16, 0x9e, 0x79, 0x18, 0xba, 0x4d },
+		.public = { 0xe5, 0x21, 0x0f, 0x12, 0x78, 0x68, 0x11, 0xd3,
+			    0xf4, 0xb7, 0x95, 0x9d, 0x05, 0x38, 0xae, 0x2c,
+			    0x31, 0xdb, 0xe7, 0x10, 0x6f, 0xc0, 0x3c, 0x3e,
+			    0xfc, 0x4c, 0xd5, 0x49, 0xc7, 0x15, 0xa4, 0x13 },
+		.result = { 0x95, 0xcb, 0xde, 0x94, 0x76, 0xe8, 0x90, 0x7d,
+			    0x7a, 0xad, 0xe4, 0x5c, 0xb4, 0xb8, 0x73, 0xf8,
+			    0x8b, 0x59, 0x5a, 0x68, 0x79, 0x9f, 0xa1, 0x52,
+			    0xe6, 0xf8, 0xf7, 0x64, 0x7a, 0xac, 0x79, 0x57 },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x0a, 0xb4, 0xe7, 0x63, 0x80, 0xd8, 0x4d, 0xde,
+			    0x4f, 0x68, 0x33, 0xc5, 0x8f, 0x2a, 0x9f, 0xb8,
+			    0xf8, 0x3b, 0xb0, 0x16, 0x9b, 0x17, 0x2b, 0xe4,
+			    0xb6, 0xe0, 0x59, 0x28, 0x87, 0x74, 0x1a, 0x36 },
+		.result = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x89, 0xe1, 0x0d, 0x57, 0x01, 0xb4, 0x33, 0x7d,
+			    0x2d, 0x03, 0x21, 0x81, 0x53, 0x8b, 0x10, 0x64,
+			    0xbd, 0x40, 0x84, 0x40, 0x1c, 0xec, 0xa1, 0xfd,
+			    0x12, 0x66, 0x3a, 0x19, 0x59, 0x38, 0x80, 0x00 },
+		.result = { 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x2b, 0x55, 0xd3, 0xaa, 0x4a, 0x8f, 0x80, 0xc8,
+			    0xc0, 0xb2, 0xae, 0x5f, 0x93, 0x3e, 0x85, 0xaf,
+			    0x49, 0xbe, 0xac, 0x36, 0xc2, 0xfa, 0x73, 0x94,
+			    0xba, 0xb7, 0x6c, 0x89, 0x33, 0xf8, 0xf8, 0x1d },
+		.result = { 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x63, 0xe5, 0xb1, 0xfe, 0x96, 0x01, 0xfe, 0x84,
+			    0x38, 0x5d, 0x88, 0x66, 0xb0, 0x42, 0x12, 0x62,
+			    0xf7, 0x8f, 0xbf, 0xa5, 0xaf, 0xf9, 0x58, 0x5e,
+			    0x62, 0x66, 0x79, 0xb1, 0x85, 0x47, 0xd9, 0x59 },
+		.result = { 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0xe4, 0x28, 0xf3, 0xda, 0xc1, 0x78, 0x09, 0xf8,
+			    0x27, 0xa5, 0x22, 0xce, 0x32, 0x35, 0x50, 0x58,
+			    0xd0, 0x73, 0x69, 0x36, 0x4a, 0xa7, 0x89, 0x02,
+			    0xee, 0x10, 0x13, 0x9b, 0x9f, 0x9d, 0xd6, 0x53 },
+		.result = { 0xfc, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0xb3, 0xb5, 0x0e, 0x3e, 0xd3, 0xa4, 0x07, 0xb9,
+			    0x5d, 0xe9, 0x42, 0xef, 0x74, 0x57, 0x5b, 0x5a,
+			    0xb8, 0xa1, 0x0c, 0x09, 0xee, 0x10, 0x35, 0x44,
+			    0xd6, 0x0b, 0xdf, 0xed, 0x81, 0x38, 0xab, 0x2b },
+		.result = { 0xf9, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x21, 0x3f, 0xff, 0xe9, 0x3d, 0x5e, 0xa8, 0xcd,
+			    0x24, 0x2e, 0x46, 0x28, 0x44, 0x02, 0x99, 0x22,
+			    0xc4, 0x3c, 0x77, 0xc9, 0xe3, 0xe4, 0x2f, 0x56,
+			    0x2f, 0x48, 0x5d, 0x24, 0xc5, 0x01, 0xa2, 0x0b },
+		.result = { 0xf3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x91, 0xb2, 0x32, 0xa1, 0x78, 0xb3, 0xcd, 0x53,
+			    0x09, 0x32, 0x44, 0x1e, 0x61, 0x39, 0x41, 0x8f,
+			    0x72, 0x17, 0x22, 0x92, 0xf1, 0xda, 0x4c, 0x18,
+			    0x34, 0xfc, 0x5e, 0xbf, 0xef, 0xb5, 0x1e, 0x3f },
+		.result = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x03 },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x04, 0x5c, 0x6e, 0x11, 0xc5, 0xd3, 0x32, 0x55,
+			    0x6c, 0x78, 0x22, 0xfe, 0x94, 0xeb, 0xf8, 0x9b,
+			    0x56, 0xa3, 0x87, 0x8d, 0xc2, 0x7c, 0xa0, 0x79,
+			    0x10, 0x30, 0x58, 0x84, 0x9f, 0xab, 0xcb, 0x4f },
+		.result = { 0xe5, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x1c, 0xa2, 0x19, 0x0b, 0x71, 0x16, 0x35, 0x39,
+			    0x06, 0x3c, 0x35, 0x77, 0x3b, 0xda, 0x0c, 0x9c,
+			    0x92, 0x8e, 0x91, 0x36, 0xf0, 0x62, 0x0a, 0xeb,
+			    0x09, 0x3f, 0x09, 0x91, 0x97, 0xb7, 0xf7, 0x4e },
+		.result = { 0xe3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0xf7, 0x6e, 0x90, 0x10, 0xac, 0x33, 0xc5, 0x04,
+			    0x3b, 0x2d, 0x3b, 0x76, 0xa8, 0x42, 0x17, 0x10,
+			    0x00, 0xc4, 0x91, 0x62, 0x22, 0xe9, 0xe8, 0x58,
+			    0x97, 0xa0, 0xae, 0xc7, 0xf6, 0x35, 0x0b, 0x3c },
+		.result = { 0xdd, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0xbb, 0x72, 0x68, 0x8d, 0x8f, 0x8a, 0xa7, 0xa3,
+			    0x9c, 0xd6, 0x06, 0x0c, 0xd5, 0xc8, 0x09, 0x3c,
+			    0xde, 0xc6, 0xfe, 0x34, 0x19, 0x37, 0xc3, 0x88,
+			    0x6a, 0x99, 0x34, 0x6c, 0xd0, 0x7f, 0xaa, 0x55 },
+		.result = { 0xdb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x88, 0xfd, 0xde, 0xa1, 0x93, 0x39, 0x1c, 0x6a,
+			    0x59, 0x33, 0xef, 0x9b, 0x71, 0x90, 0x15, 0x49,
+			    0x44, 0x72, 0x05, 0xaa, 0xe9, 0xda, 0x92, 0x8a,
+			    0x6b, 0x91, 0xa3, 0x52, 0xba, 0x10, 0xf4, 0x1f },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02 },
+		.valid = true
+	},
+	/* wycheproof - edge case for shared secret */
+	{
+		.private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4,
+			     0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3,
+			     0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc,
+			     0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 },
+		.public = { 0x30, 0x3b, 0x39, 0x2f, 0x15, 0x31, 0x16, 0xca,
+			    0xd9, 0xcc, 0x68, 0x2a, 0x00, 0xcc, 0xc4, 0x4c,
+			    0x95, 0xff, 0x0d, 0x3b, 0xbe, 0x56, 0x8b, 0xeb,
+			    0x6c, 0x4e, 0x73, 0x9b, 0xaf, 0xdc, 0x2c, 0x68 },
+		.result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x00 },
+		.valid = true
+	},
+	/* wycheproof - checking for overflow */
+	{
+		.private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d,
+			     0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d,
+			     0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c,
+			     0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 },
+		.public = { 0xfd, 0x30, 0x0a, 0xeb, 0x40, 0xe1, 0xfa, 0x58,
+			    0x25, 0x18, 0x41, 0x2b, 0x49, 0xb2, 0x08, 0xa7,
+			    0x84, 0x2b, 0x1e, 0x1f, 0x05, 0x6a, 0x04, 0x01,
+			    0x78, 0xea, 0x41, 0x41, 0x53, 0x4f, 0x65, 0x2d },
+		.result = { 0xb7, 0x34, 0x10, 0x5d, 0xc2, 0x57, 0x58, 0x5d,
+			    0x73, 0xb5, 0x66, 0xcc, 0xb7, 0x6f, 0x06, 0x27,
+			    0x95, 0xcc, 0xbe, 0xc8, 0x91, 0x28, 0xe5, 0x2b,
+			    0x02, 0xf3, 0xe5, 0x96, 0x39, 0xf1, 0x3c, 0x46 },
+		.valid = true
+	},
+	/* wycheproof - checking for overflow */
+	{
+		.private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d,
+			     0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d,
+			     0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c,
+			     0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 },
+		.public = { 0xc8, 0xef, 0x79, 0xb5, 0x14, 0xd7, 0x68, 0x26,
+			    0x77, 0xbc, 0x79, 0x31, 0xe0, 0x6e, 0xe5, 0xc2,
+			    0x7c, 0x9b, 0x39, 0x2b, 0x4a, 0xe9, 0x48, 0x44,
+			    0x73, 0xf5, 0x54, 0xe6, 0x67, 0x8e, 0xcc, 0x2e },
+		.result = { 0x64, 0x7a, 0x46, 0xb6, 0xfc, 0x3f, 0x40, 0xd6,
+			    0x21, 0x41, 0xee, 0x3c, 0xee, 0x70, 0x6b, 0x4d,
+			    0x7a, 0x92, 0x71, 0x59, 0x3a, 0x7b, 0x14, 0x3e,
+			    0x8e, 0x2e, 0x22, 0x79, 0x88, 0x3e, 0x45, 0x50 },
+		.valid = true
+	},
+	/* wycheproof - checking for overflow */
+	{
+		.private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d,
+			     0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d,
+			     0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c,
+			     0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 },
+		.public = { 0x64, 0xae, 0xac, 0x25, 0x04, 0x14, 0x48, 0x61,
+			    0x53, 0x2b, 0x7b, 0xbc, 0xb6, 0xc8, 0x7d, 0x67,
+			    0xdd, 0x4c, 0x1f, 0x07, 0xeb, 0xc2, 0xe0, 0x6e,
+			    0xff, 0xb9, 0x5a, 0xec, 0xc6, 0x17, 0x0b, 0x2c },
+		.result = { 0x4f, 0xf0, 0x3d, 0x5f, 0xb4, 0x3c, 0xd8, 0x65,
+			    0x7a, 0x3c, 0xf3, 0x7c, 0x13, 0x8c, 0xad, 0xce,
+			    0xcc, 0xe5, 0x09, 0xe4, 0xeb, 0xa0, 0x89, 0xd0,
+			    0xef, 0x40, 0xb4, 0xe4, 0xfb, 0x94, 0x61, 0x55 },
+		.valid = true
+	},
+	/* wycheproof - checking for overflow */
+	{
+		.private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d,
+			     0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d,
+			     0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c,
+			     0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 },
+		.public = { 0xbf, 0x68, 0xe3, 0x5e, 0x9b, 0xdb, 0x7e, 0xee,
+			    0x1b, 0x50, 0x57, 0x02, 0x21, 0x86, 0x0f, 0x5d,
+			    0xcd, 0xad, 0x8a, 0xcb, 0xab, 0x03, 0x1b, 0x14,
+			    0x97, 0x4c, 0xc4, 0x90, 0x13, 0xc4, 0x98, 0x31 },
+		.result = { 0x21, 0xce, 0xe5, 0x2e, 0xfd, 0xbc, 0x81, 0x2e,
+			    0x1d, 0x02, 0x1a, 0x4a, 0xf1, 0xe1, 0xd8, 0xbc,
+			    0x4d, 0xb3, 0xc4, 0x00, 0xe4, 0xd2, 0xa2, 0xc5,
+			    0x6a, 0x39, 0x26, 0xdb, 0x4d, 0x99, 0xc6, 0x5b },
+		.valid = true
+	},
+	/* wycheproof - checking for overflow */
+	{
+		.private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d,
+			     0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d,
+			     0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c,
+			     0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 },
+		.public = { 0x53, 0x47, 0xc4, 0x91, 0x33, 0x1a, 0x64, 0xb4,
+			    0x3d, 0xdc, 0x68, 0x30, 0x34, 0xe6, 0x77, 0xf5,
+			    0x3d, 0xc3, 0x2b, 0x52, 0xa5, 0x2a, 0x57, 0x7c,
+			    0x15, 0xa8, 0x3b, 0xf2, 0x98, 0xe9, 0x9f, 0x19 },
+		.result = { 0x18, 0xcb, 0x89, 0xe4, 0xe2, 0x0c, 0x0c, 0x2b,
+			    0xd3, 0x24, 0x30, 0x52, 0x45, 0x26, 0x6c, 0x93,
+			    0x27, 0x69, 0x0b, 0xbe, 0x79, 0xac, 0xb8, 0x8f,
+			    0x5b, 0x8f, 0xb3, 0xf7, 0x4e, 0xca, 0x3e, 0x52 },
+		.valid = true
+	},
+	/* wycheproof - private key == -1 (mod order) */
+	{
+		.private = { 0xa0, 0x23, 0xcd, 0xd0, 0x83, 0xef, 0x5b, 0xb8,
+			     0x2f, 0x10, 0xd6, 0x2e, 0x59, 0xe1, 0x5a, 0x68,
+			     0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+			     0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50 },
+		.public = { 0x25, 0x8e, 0x04, 0x52, 0x3b, 0x8d, 0x25, 0x3e,
+			    0xe6, 0x57, 0x19, 0xfc, 0x69, 0x06, 0xc6, 0x57,
+			    0x19, 0x2d, 0x80, 0x71, 0x7e, 0xdc, 0x82, 0x8f,
+			    0xa0, 0xaf, 0x21, 0x68, 0x6e, 0x2f, 0xaa, 0x75 },
+		.result = { 0x25, 0x8e, 0x04, 0x52, 0x3b, 0x8d, 0x25, 0x3e,
+			    0xe6, 0x57, 0x19, 0xfc, 0x69, 0x06, 0xc6, 0x57,
+			    0x19, 0x2d, 0x80, 0x71, 0x7e, 0xdc, 0x82, 0x8f,
+			    0xa0, 0xaf, 0x21, 0x68, 0x6e, 0x2f, 0xaa, 0x75 },
+		.valid = true
+	},
+	/* wycheproof - private key == 1 (mod order) on twist */
+	{
+		.private = { 0x58, 0x08, 0x3d, 0xd2, 0x61, 0xad, 0x91, 0xef,
+			     0xf9, 0x52, 0x32, 0x2e, 0xc8, 0x24, 0xc6, 0x82,
+			     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+			     0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x5f },
+		.public = { 0x2e, 0xae, 0x5e, 0xc3, 0xdd, 0x49, 0x4e, 0x9f,
+			    0x2d, 0x37, 0xd2, 0x58, 0xf8, 0x73, 0xa8, 0xe6,
+			    0xe9, 0xd0, 0xdb, 0xd1, 0xe3, 0x83, 0xef, 0x64,
+			    0xd9, 0x8b, 0xb9, 0x1b, 0x3e, 0x0b, 0xe0, 0x35 },
+		.result = { 0x2e, 0xae, 0x5e, 0xc3, 0xdd, 0x49, 0x4e, 0x9f,
+			    0x2d, 0x37, 0xd2, 0x58, 0xf8, 0x73, 0xa8, 0xe6,
+			    0xe9, 0xd0, 0xdb, 0xd1, 0xe3, 0x83, 0xef, 0x64,
+			    0xd9, 0x8b, 0xb9, 0x1b, 0x3e, 0x0b, 0xe0, 0x35 },
+		.valid = true
+	}
+};
+
+bool __init curve25519_selftest(void)
+{
+	bool success = true, ret, ret2;
+	size_t i = 0, j;
+	u8 in[CURVE25519_POINT_SIZE];
+	u8 out[CURVE25519_POINT_SIZE], out2[CURVE25519_POINT_SIZE];
+
+	for (i = 0; i < ARRAY_SIZE(curve25519_test_vectors); ++i) {
+		memset(out, 0, CURVE25519_POINT_SIZE);
+		ret = curve25519(out, curve25519_test_vectors[i].private,
+				 curve25519_test_vectors[i].public);
+		if (ret != curve25519_test_vectors[i].valid ||
+		    memcmp(out, curve25519_test_vectors[i].result,
+			   CURVE25519_POINT_SIZE)) {
+			pr_info("curve25519 self-test %zu: FAIL\n", i + 1);
+			success = false;
+			break;
+		}
+	}
+
+	for (i = 0; i < 5; ++i) {
+		get_random_bytes(in, sizeof(in));
+		ret = curve25519_generate_public(out, in);
+		ret2 = curve25519(out2, in, (u8[CURVE25519_POINT_SIZE]){ 9 });
+		if (ret != ret2 || memcmp(out, out2, CURVE25519_POINT_SIZE)) {
+			pr_info("curve25519 basepoint self-test %zu: FAIL: input - 0x",
+				i + 1);
+			for (j = CURVE25519_POINT_SIZE; j-- > 0;)
+				printk(KERN_CONT "%02x", in[j]);
+			printk(KERN_CONT "\n");
+			success = false;
+			break;
+		}
+	}
+
+	if (success)
+		pr_info("curve25519 self-tests: pass\n");
+	return success;
+}
+#endif
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 15/20] zinc: Curve25519 ARM implementation
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (13 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 14/20] zinc: Curve25519 generic C implementations and selftest Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 16/20] zinc: Curve25519 x86_64 implementation Jason A. Donenfeld
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Russell King, linux-arm-kernel

This comes from Dan Bernstein and Peter Schwabe's public domain NEON
code, and has been modified to be friendly for kernel space, as well as
removing some qhasm strangeness to be more idiomatic.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
---
 lib/zinc/Makefile                         |    4 +
 lib/zinc/curve25519/curve25519-arm-glue.h |   46 +
 lib/zinc/curve25519/curve25519-arm.S      | 2095 +++++++++++++++++++++
 3 files changed, 2145 insertions(+)
 create mode 100644 lib/zinc/curve25519/curve25519-arm-glue.h
 create mode 100644 lib/zinc/curve25519/curve25519-arm.S

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 25826d3eb74a..0a9d97146c70 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -53,6 +53,10 @@ endif
 
 ifeq ($(CONFIG_ZINC_CURVE25519),y)
 zinc-y += curve25519/curve25519.o
+ifeq ($(CONFIG_ZINC_ARCH_ARM)$(CONFIG_KERNEL_MODE_NEON),yy)
+zinc-y += curve25519/curve25519-arm.o
+CFLAGS_curve25519.o += -include $(srctree)/$(src)/curve25519/curve25519-arm-glue.h
+endif
 endif
 
 ifeq ($(CONFIG_ZINC_BLAKE2S),y)
diff --git a/lib/zinc/curve25519/curve25519-arm-glue.h b/lib/zinc/curve25519/curve25519-arm-glue.h
new file mode 100644
index 000000000000..36e8002e2477
--- /dev/null
+++ b/lib/zinc/curve25519/curve25519-arm-glue.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/curve25519.h>
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+
+#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && __LINUX_ARM_ARCH__ == 7
+#define ARM_USE_NEON
+asmlinkage void curve25519_neon(u8 mypublic[CURVE25519_POINT_SIZE],
+				const u8 secret[CURVE25519_POINT_SIZE],
+				const u8 basepoint[CURVE25519_POINT_SIZE]);
+#endif
+
+static bool curve25519_use_neon __ro_after_init;
+
+void __init curve25519_fpu_init(void)
+{
+	curve25519_use_neon = elf_hwcap & HWCAP_NEON;
+}
+
+static inline bool curve25519_arch(u8 mypublic[CURVE25519_POINT_SIZE],
+				   const u8 secret[CURVE25519_POINT_SIZE],
+				   const u8 basepoint[CURVE25519_POINT_SIZE])
+{
+#ifdef ARM_USE_NEON
+	if (curve25519_use_neon && may_use_simd()) {
+		kernel_neon_begin();
+		curve25519_neon(mypublic, secret, basepoint);
+		kernel_neon_end();
+		return true;
+	}
+#endif
+	return false;
+}
+
+static inline bool curve25519_base_arch(u8 pub[CURVE25519_POINT_SIZE],
+					const u8 secret[CURVE25519_POINT_SIZE])
+{
+	return false;
+}
+
+#define HAVE_CURVE25519_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/curve25519/curve25519-arm.S b/lib/zinc/curve25519/curve25519-arm.S
new file mode 100644
index 000000000000..ad6690b8ffd7
--- /dev/null
+++ b/lib/zinc/curve25519/curve25519-arm.S
@@ -0,0 +1,2095 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * Based on public domain code from Daniel J. Bernstein and Peter Schwabe. This
+ * has been built from SUPERCOP's curve25519/neon2/scalarmult.pq using qhasm,
+ * but has subsequently been manually reworked for use in kernel space.
+ */
+
+#if __LINUX_ARM_ARCH__ == 7
+#include <linux/linkage.h>
+
+.text
+.fpu neon
+.align 4
+
+ENTRY(curve25519_neon)
+	push		{r4-r11, lr}
+	mov		ip, sp
+	sub		r3, sp, #704
+	and		r3, r3, #0xfffffff0
+	mov		sp, r3
+	movw		r4, #0
+	movw		r5, #254
+	vmov.i32	q0, #1
+	vshr.u64	q1, q0, #7
+	vshr.u64	q0, q0, #8
+	vmov.i32	d4, #19
+	vmov.i32	d5, #38
+	add		r6, sp, #480
+	vst1.8		{d2-d3}, [r6, : 128]
+	add		r6, sp, #496
+	vst1.8		{d0-d1}, [r6, : 128]
+	add		r6, sp, #512
+	vst1.8		{d4-d5}, [r6, : 128]
+	add		r6, r3, #0
+	vmov.i32	q2, #0
+	vst1.8		{d4-d5}, [r6, : 128]!
+	vst1.8		{d4-d5}, [r6, : 128]!
+	vst1.8		d4, [r6, : 64]
+	add		r6, r3, #0
+	movw		r7, #960
+	sub		r7, r7, #2
+	neg		r7, r7
+	sub		r7, r7, r7, LSL #7
+	str		r7, [r6]
+	add		r6, sp, #672
+	vld1.8		{d4-d5}, [r1]!
+	vld1.8		{d6-d7}, [r1]
+	vst1.8		{d4-d5}, [r6, : 128]!
+	vst1.8		{d6-d7}, [r6, : 128]
+	sub		r1, r6, #16
+	ldrb		r6, [r1]
+	and		r6, r6, #248
+	strb		r6, [r1]
+	ldrb		r6, [r1, #31]
+	and		r6, r6, #127
+	orr		r6, r6, #64
+	strb		r6, [r1, #31]
+	vmov.i64	q2, #0xffffffff
+	vshr.u64	q3, q2, #7
+	vshr.u64	q2, q2, #6
+	vld1.8		{d8}, [r2]
+	vld1.8		{d10}, [r2]
+	add		r2, r2, #6
+	vld1.8		{d12}, [r2]
+	vld1.8		{d14}, [r2]
+	add		r2, r2, #6
+	vld1.8		{d16}, [r2]
+	add		r2, r2, #4
+	vld1.8		{d18}, [r2]
+	vld1.8		{d20}, [r2]
+	add		r2, r2, #6
+	vld1.8		{d22}, [r2]
+	add		r2, r2, #2
+	vld1.8		{d24}, [r2]
+	vld1.8		{d26}, [r2]
+	vshr.u64	q5, q5, #26
+	vshr.u64	q6, q6, #3
+	vshr.u64	q7, q7, #29
+	vshr.u64	q8, q8, #6
+	vshr.u64	q10, q10, #25
+	vshr.u64	q11, q11, #3
+	vshr.u64	q12, q12, #12
+	vshr.u64	q13, q13, #38
+	vand		q4, q4, q2
+	vand		q6, q6, q2
+	vand		q8, q8, q2
+	vand		q10, q10, q2
+	vand		q2, q12, q2
+	vand		q5, q5, q3
+	vand		q7, q7, q3
+	vand		q9, q9, q3
+	vand		q11, q11, q3
+	vand		q3, q13, q3
+	add		r2, r3, #48
+	vadd.i64	q12, q4, q1
+	vadd.i64	q13, q10, q1
+	vshr.s64	q12, q12, #26
+	vshr.s64	q13, q13, #26
+	vadd.i64	q5, q5, q12
+	vshl.i64	q12, q12, #26
+	vadd.i64	q14, q5, q0
+	vadd.i64	q11, q11, q13
+	vshl.i64	q13, q13, #26
+	vadd.i64	q15, q11, q0
+	vsub.i64	q4, q4, q12
+	vshr.s64	q12, q14, #25
+	vsub.i64	q10, q10, q13
+	vshr.s64	q13, q15, #25
+	vadd.i64	q6, q6, q12
+	vshl.i64	q12, q12, #25
+	vadd.i64	q14, q6, q1
+	vadd.i64	q2, q2, q13
+	vsub.i64	q5, q5, q12
+	vshr.s64	q12, q14, #26
+	vshl.i64	q13, q13, #25
+	vadd.i64	q14, q2, q1
+	vadd.i64	q7, q7, q12
+	vshl.i64	q12, q12, #26
+	vadd.i64	q15, q7, q0
+	vsub.i64	q11, q11, q13
+	vshr.s64	q13, q14, #26
+	vsub.i64	q6, q6, q12
+	vshr.s64	q12, q15, #25
+	vadd.i64	q3, q3, q13
+	vshl.i64	q13, q13, #26
+	vadd.i64	q14, q3, q0
+	vadd.i64	q8, q8, q12
+	vshl.i64	q12, q12, #25
+	vadd.i64	q15, q8, q1
+	add		r2, r2, #8
+	vsub.i64	q2, q2, q13
+	vshr.s64	q13, q14, #25
+	vsub.i64	q7, q7, q12
+	vshr.s64	q12, q15, #26
+	vadd.i64	q14, q13, q13
+	vadd.i64	q9, q9, q12
+	vtrn.32		d12, d14
+	vshl.i64	q12, q12, #26
+	vtrn.32		d13, d15
+	vadd.i64	q0, q9, q0
+	vadd.i64	q4, q4, q14
+	vst1.8		d12, [r2, : 64]!
+	vshl.i64	q6, q13, #4
+	vsub.i64	q7, q8, q12
+	vshr.s64	q0, q0, #25
+	vadd.i64	q4, q4, q6
+	vadd.i64	q6, q10, q0
+	vshl.i64	q0, q0, #25
+	vadd.i64	q8, q6, q1
+	vadd.i64	q4, q4, q13
+	vshl.i64	q10, q13, #25
+	vadd.i64	q1, q4, q1
+	vsub.i64	q0, q9, q0
+	vshr.s64	q8, q8, #26
+	vsub.i64	q3, q3, q10
+	vtrn.32		d14, d0
+	vshr.s64	q1, q1, #26
+	vtrn.32		d15, d1
+	vadd.i64	q0, q11, q8
+	vst1.8		d14, [r2, : 64]
+	vshl.i64	q7, q8, #26
+	vadd.i64	q5, q5, q1
+	vtrn.32		d4, d6
+	vshl.i64	q1, q1, #26
+	vtrn.32		d5, d7
+	vsub.i64	q3, q6, q7
+	add		r2, r2, #16
+	vsub.i64	q1, q4, q1
+	vst1.8		d4, [r2, : 64]
+	vtrn.32		d6, d0
+	vtrn.32		d7, d1
+	sub		r2, r2, #8
+	vtrn.32		d2, d10
+	vtrn.32		d3, d11
+	vst1.8		d6, [r2, : 64]
+	sub		r2, r2, #24
+	vst1.8		d2, [r2, : 64]
+	add		r2, r3, #96
+	vmov.i32	q0, #0
+	vmov.i64	d2, #0xff
+	vmov.i64	d3, #0
+	vshr.u32	q1, q1, #7
+	vst1.8		{d2-d3}, [r2, : 128]!
+	vst1.8		{d0-d1}, [r2, : 128]!
+	vst1.8		d0, [r2, : 64]
+	add		r2, r3, #144
+	vmov.i32	q0, #0
+	vst1.8		{d0-d1}, [r2, : 128]!
+	vst1.8		{d0-d1}, [r2, : 128]!
+	vst1.8		d0, [r2, : 64]
+	add		r2, r3, #240
+	vmov.i32	q0, #0
+	vmov.i64	d2, #0xff
+	vmov.i64	d3, #0
+	vshr.u32	q1, q1, #7
+	vst1.8		{d2-d3}, [r2, : 128]!
+	vst1.8		{d0-d1}, [r2, : 128]!
+	vst1.8		d0, [r2, : 64]
+	add		r2, r3, #48
+	add		r6, r3, #192
+	vld1.8		{d0-d1}, [r2, : 128]!
+	vld1.8		{d2-d3}, [r2, : 128]!
+	vld1.8		{d4}, [r2, : 64]
+	vst1.8		{d0-d1}, [r6, : 128]!
+	vst1.8		{d2-d3}, [r6, : 128]!
+	vst1.8		d4, [r6, : 64]
+.Lmainloop:
+	mov		r2, r5, LSR #3
+	and		r6, r5, #7
+	ldrb		r2, [r1, r2]
+	mov		r2, r2, LSR r6
+	and		r2, r2, #1
+	str		r5, [sp, #456]
+	eor		r4, r4, r2
+	str		r2, [sp, #460]
+	neg		r2, r4
+	add		r4, r3, #96
+	add		r5, r3, #192
+	add		r6, r3, #144
+	vld1.8		{d8-d9}, [r4, : 128]!
+	add		r7, r3, #240
+	vld1.8		{d10-d11}, [r5, : 128]!
+	veor		q6, q4, q5
+	vld1.8		{d14-d15}, [r6, : 128]!
+	vdup.i32	q8, r2
+	vld1.8		{d18-d19}, [r7, : 128]!
+	veor		q10, q7, q9
+	vld1.8		{d22-d23}, [r4, : 128]!
+	vand		q6, q6, q8
+	vld1.8		{d24-d25}, [r5, : 128]!
+	vand		q10, q10, q8
+	vld1.8		{d26-d27}, [r6, : 128]!
+	veor		q4, q4, q6
+	vld1.8		{d28-d29}, [r7, : 128]!
+	veor		q5, q5, q6
+	vld1.8		{d0}, [r4, : 64]
+	veor		q6, q7, q10
+	vld1.8		{d2}, [r5, : 64]
+	veor		q7, q9, q10
+	vld1.8		{d4}, [r6, : 64]
+	veor		q9, q11, q12
+	vld1.8		{d6}, [r7, : 64]
+	veor		q10, q0, q1
+	sub		r2, r4, #32
+	vand		q9, q9, q8
+	sub		r4, r5, #32
+	vand		q10, q10, q8
+	sub		r5, r6, #32
+	veor		q11, q11, q9
+	sub		r6, r7, #32
+	veor		q0, q0, q10
+	veor		q9, q12, q9
+	veor		q1, q1, q10
+	veor		q10, q13, q14
+	veor		q12, q2, q3
+	vand		q10, q10, q8
+	vand		q8, q12, q8
+	veor		q12, q13, q10
+	veor		q2, q2, q8
+	veor		q10, q14, q10
+	veor		q3, q3, q8
+	vadd.i32	q8, q4, q6
+	vsub.i32	q4, q4, q6
+	vst1.8		{d16-d17}, [r2, : 128]!
+	vadd.i32	q6, q11, q12
+	vst1.8		{d8-d9}, [r5, : 128]!
+	vsub.i32	q4, q11, q12
+	vst1.8		{d12-d13}, [r2, : 128]!
+	vadd.i32	q6, q0, q2
+	vst1.8		{d8-d9}, [r5, : 128]!
+	vsub.i32	q0, q0, q2
+	vst1.8		d12, [r2, : 64]
+	vadd.i32	q2, q5, q7
+	vst1.8		d0, [r5, : 64]
+	vsub.i32	q0, q5, q7
+	vst1.8		{d4-d5}, [r4, : 128]!
+	vadd.i32	q2, q9, q10
+	vst1.8		{d0-d1}, [r6, : 128]!
+	vsub.i32	q0, q9, q10
+	vst1.8		{d4-d5}, [r4, : 128]!
+	vadd.i32	q2, q1, q3
+	vst1.8		{d0-d1}, [r6, : 128]!
+	vsub.i32	q0, q1, q3
+	vst1.8		d4, [r4, : 64]
+	vst1.8		d0, [r6, : 64]
+	add		r2, sp, #512
+	add		r4, r3, #96
+	add		r5, r3, #144
+	vld1.8		{d0-d1}, [r2, : 128]
+	vld1.8		{d2-d3}, [r4, : 128]!
+	vld1.8		{d4-d5}, [r5, : 128]!
+	vzip.i32	q1, q2
+	vld1.8		{d6-d7}, [r4, : 128]!
+	vld1.8		{d8-d9}, [r5, : 128]!
+	vshl.i32	q5, q1, #1
+	vzip.i32	q3, q4
+	vshl.i32	q6, q2, #1
+	vld1.8		{d14}, [r4, : 64]
+	vshl.i32	q8, q3, #1
+	vld1.8		{d15}, [r5, : 64]
+	vshl.i32	q9, q4, #1
+	vmul.i32	d21, d7, d1
+	vtrn.32		d14, d15
+	vmul.i32	q11, q4, q0
+	vmul.i32	q0, q7, q0
+	vmull.s32	q12, d2, d2
+	vmlal.s32	q12, d11, d1
+	vmlal.s32	q12, d12, d0
+	vmlal.s32	q12, d13, d23
+	vmlal.s32	q12, d16, d22
+	vmlal.s32	q12, d7, d21
+	vmull.s32	q10, d2, d11
+	vmlal.s32	q10, d4, d1
+	vmlal.s32	q10, d13, d0
+	vmlal.s32	q10, d6, d23
+	vmlal.s32	q10, d17, d22
+	vmull.s32	q13, d10, d4
+	vmlal.s32	q13, d11, d3
+	vmlal.s32	q13, d13, d1
+	vmlal.s32	q13, d16, d0
+	vmlal.s32	q13, d17, d23
+	vmlal.s32	q13, d8, d22
+	vmull.s32	q1, d10, d5
+	vmlal.s32	q1, d11, d4
+	vmlal.s32	q1, d6, d1
+	vmlal.s32	q1, d17, d0
+	vmlal.s32	q1, d8, d23
+	vmull.s32	q14, d10, d6
+	vmlal.s32	q14, d11, d13
+	vmlal.s32	q14, d4, d4
+	vmlal.s32	q14, d17, d1
+	vmlal.s32	q14, d18, d0
+	vmlal.s32	q14, d9, d23
+	vmull.s32	q11, d10, d7
+	vmlal.s32	q11, d11, d6
+	vmlal.s32	q11, d12, d5
+	vmlal.s32	q11, d8, d1
+	vmlal.s32	q11, d19, d0
+	vmull.s32	q15, d10, d8
+	vmlal.s32	q15, d11, d17
+	vmlal.s32	q15, d12, d6
+	vmlal.s32	q15, d13, d5
+	vmlal.s32	q15, d19, d1
+	vmlal.s32	q15, d14, d0
+	vmull.s32	q2, d10, d9
+	vmlal.s32	q2, d11, d8
+	vmlal.s32	q2, d12, d7
+	vmlal.s32	q2, d13, d6
+	vmlal.s32	q2, d14, d1
+	vmull.s32	q0, d15, d1
+	vmlal.s32	q0, d10, d14
+	vmlal.s32	q0, d11, d19
+	vmlal.s32	q0, d12, d8
+	vmlal.s32	q0, d13, d17
+	vmlal.s32	q0, d6, d6
+	add		r2, sp, #480
+	vld1.8		{d18-d19}, [r2, : 128]
+	vmull.s32	q3, d16, d7
+	vmlal.s32	q3, d10, d15
+	vmlal.s32	q3, d11, d14
+	vmlal.s32	q3, d12, d9
+	vmlal.s32	q3, d13, d8
+	add		r2, sp, #496
+	vld1.8		{d8-d9}, [r2, : 128]
+	vadd.i64	q5, q12, q9
+	vadd.i64	q6, q15, q9
+	vshr.s64	q5, q5, #26
+	vshr.s64	q6, q6, #26
+	vadd.i64	q7, q10, q5
+	vshl.i64	q5, q5, #26
+	vadd.i64	q8, q7, q4
+	vadd.i64	q2, q2, q6
+	vshl.i64	q6, q6, #26
+	vadd.i64	q10, q2, q4
+	vsub.i64	q5, q12, q5
+	vshr.s64	q8, q8, #25
+	vsub.i64	q6, q15, q6
+	vshr.s64	q10, q10, #25
+	vadd.i64	q12, q13, q8
+	vshl.i64	q8, q8, #25
+	vadd.i64	q13, q12, q9
+	vadd.i64	q0, q0, q10
+	vsub.i64	q7, q7, q8
+	vshr.s64	q8, q13, #26
+	vshl.i64	q10, q10, #25
+	vadd.i64	q13, q0, q9
+	vadd.i64	q1, q1, q8
+	vshl.i64	q8, q8, #26
+	vadd.i64	q15, q1, q4
+	vsub.i64	q2, q2, q10
+	vshr.s64	q10, q13, #26
+	vsub.i64	q8, q12, q8
+	vshr.s64	q12, q15, #25
+	vadd.i64	q3, q3, q10
+	vshl.i64	q10, q10, #26
+	vadd.i64	q13, q3, q4
+	vadd.i64	q14, q14, q12
+	add		r2, r3, #288
+	vshl.i64	q12, q12, #25
+	add		r4, r3, #336
+	vadd.i64	q15, q14, q9
+	add		r2, r2, #8
+	vsub.i64	q0, q0, q10
+	add		r4, r4, #8
+	vshr.s64	q10, q13, #25
+	vsub.i64	q1, q1, q12
+	vshr.s64	q12, q15, #26
+	vadd.i64	q13, q10, q10
+	vadd.i64	q11, q11, q12
+	vtrn.32		d16, d2
+	vshl.i64	q12, q12, #26
+	vtrn.32		d17, d3
+	vadd.i64	q1, q11, q4
+	vadd.i64	q4, q5, q13
+	vst1.8		d16, [r2, : 64]!
+	vshl.i64	q5, q10, #4
+	vst1.8		d17, [r4, : 64]!
+	vsub.i64	q8, q14, q12
+	vshr.s64	q1, q1, #25
+	vadd.i64	q4, q4, q5
+	vadd.i64	q5, q6, q1
+	vshl.i64	q1, q1, #25
+	vadd.i64	q6, q5, q9
+	vadd.i64	q4, q4, q10
+	vshl.i64	q10, q10, #25
+	vadd.i64	q9, q4, q9
+	vsub.i64	q1, q11, q1
+	vshr.s64	q6, q6, #26
+	vsub.i64	q3, q3, q10
+	vtrn.32		d16, d2
+	vshr.s64	q9, q9, #26
+	vtrn.32		d17, d3
+	vadd.i64	q1, q2, q6
+	vst1.8		d16, [r2, : 64]
+	vshl.i64	q2, q6, #26
+	vst1.8		d17, [r4, : 64]
+	vadd.i64	q6, q7, q9
+	vtrn.32		d0, d6
+	vshl.i64	q7, q9, #26
+	vtrn.32		d1, d7
+	vsub.i64	q2, q5, q2
+	add		r2, r2, #16
+	vsub.i64	q3, q4, q7
+	vst1.8		d0, [r2, : 64]
+	add		r4, r4, #16
+	vst1.8		d1, [r4, : 64]
+	vtrn.32		d4, d2
+	vtrn.32		d5, d3
+	sub		r2, r2, #8
+	sub		r4, r4, #8
+	vtrn.32		d6, d12
+	vtrn.32		d7, d13
+	vst1.8		d4, [r2, : 64]
+	vst1.8		d5, [r4, : 64]
+	sub		r2, r2, #24
+	sub		r4, r4, #24
+	vst1.8		d6, [r2, : 64]
+	vst1.8		d7, [r4, : 64]
+	add		r2, r3, #240
+	add		r4, r3, #96
+	vld1.8		{d0-d1}, [r4, : 128]!
+	vld1.8		{d2-d3}, [r4, : 128]!
+	vld1.8		{d4}, [r4, : 64]
+	add		r4, r3, #144
+	vld1.8		{d6-d7}, [r4, : 128]!
+	vtrn.32		q0, q3
+	vld1.8		{d8-d9}, [r4, : 128]!
+	vshl.i32	q5, q0, #4
+	vtrn.32		q1, q4
+	vshl.i32	q6, q3, #4
+	vadd.i32	q5, q5, q0
+	vadd.i32	q6, q6, q3
+	vshl.i32	q7, q1, #4
+	vld1.8		{d5}, [r4, : 64]
+	vshl.i32	q8, q4, #4
+	vtrn.32		d4, d5
+	vadd.i32	q7, q7, q1
+	vadd.i32	q8, q8, q4
+	vld1.8		{d18-d19}, [r2, : 128]!
+	vshl.i32	q10, q2, #4
+	vld1.8		{d22-d23}, [r2, : 128]!
+	vadd.i32	q10, q10, q2
+	vld1.8		{d24}, [r2, : 64]
+	vadd.i32	q5, q5, q0
+	add		r2, r3, #192
+	vld1.8		{d26-d27}, [r2, : 128]!
+	vadd.i32	q6, q6, q3
+	vld1.8		{d28-d29}, [r2, : 128]!
+	vadd.i32	q8, q8, q4
+	vld1.8		{d25}, [r2, : 64]
+	vadd.i32	q10, q10, q2
+	vtrn.32		q9, q13
+	vadd.i32	q7, q7, q1
+	vadd.i32	q5, q5, q0
+	vtrn.32		q11, q14
+	vadd.i32	q6, q6, q3
+	add		r2, sp, #528
+	vadd.i32	q10, q10, q2
+	vtrn.32		d24, d25
+	vst1.8		{d12-d13}, [r2, : 128]
+	vshl.i32	q6, q13, #1
+	add		r2, sp, #544
+	vst1.8		{d20-d21}, [r2, : 128]
+	vshl.i32	q10, q14, #1
+	add		r2, sp, #560
+	vst1.8		{d12-d13}, [r2, : 128]
+	vshl.i32	q15, q12, #1
+	vadd.i32	q8, q8, q4
+	vext.32		d10, d31, d30, #0
+	vadd.i32	q7, q7, q1
+	add		r2, sp, #576
+	vst1.8		{d16-d17}, [r2, : 128]
+	vmull.s32	q8, d18, d5
+	vmlal.s32	q8, d26, d4
+	vmlal.s32	q8, d19, d9
+	vmlal.s32	q8, d27, d3
+	vmlal.s32	q8, d22, d8
+	vmlal.s32	q8, d28, d2
+	vmlal.s32	q8, d23, d7
+	vmlal.s32	q8, d29, d1
+	vmlal.s32	q8, d24, d6
+	vmlal.s32	q8, d25, d0
+	add		r2, sp, #592
+	vst1.8		{d14-d15}, [r2, : 128]
+	vmull.s32	q2, d18, d4
+	vmlal.s32	q2, d12, d9
+	vmlal.s32	q2, d13, d8
+	vmlal.s32	q2, d19, d3
+	vmlal.s32	q2, d22, d2
+	vmlal.s32	q2, d23, d1
+	vmlal.s32	q2, d24, d0
+	add		r2, sp, #608
+	vst1.8		{d20-d21}, [r2, : 128]
+	vmull.s32	q7, d18, d9
+	vmlal.s32	q7, d26, d3
+	vmlal.s32	q7, d19, d8
+	vmlal.s32	q7, d27, d2
+	vmlal.s32	q7, d22, d7
+	vmlal.s32	q7, d28, d1
+	vmlal.s32	q7, d23, d6
+	vmlal.s32	q7, d29, d0
+	add		r2, sp, #624
+	vst1.8		{d10-d11}, [r2, : 128]
+	vmull.s32	q5, d18, d3
+	vmlal.s32	q5, d19, d2
+	vmlal.s32	q5, d22, d1
+	vmlal.s32	q5, d23, d0
+	vmlal.s32	q5, d12, d8
+	add		r2, sp, #640
+	vst1.8		{d16-d17}, [r2, : 128]
+	vmull.s32	q4, d18, d8
+	vmlal.s32	q4, d26, d2
+	vmlal.s32	q4, d19, d7
+	vmlal.s32	q4, d27, d1
+	vmlal.s32	q4, d22, d6
+	vmlal.s32	q4, d28, d0
+	vmull.s32	q8, d18, d7
+	vmlal.s32	q8, d26, d1
+	vmlal.s32	q8, d19, d6
+	vmlal.s32	q8, d27, d0
+	add		r2, sp, #544
+	vld1.8		{d20-d21}, [r2, : 128]
+	vmlal.s32	q7, d24, d21
+	vmlal.s32	q7, d25, d20
+	vmlal.s32	q4, d23, d21
+	vmlal.s32	q4, d29, d20
+	vmlal.s32	q8, d22, d21
+	vmlal.s32	q8, d28, d20
+	vmlal.s32	q5, d24, d20
+	add		r2, sp, #544
+	vst1.8		{d14-d15}, [r2, : 128]
+	vmull.s32	q7, d18, d6
+	vmlal.s32	q7, d26, d0
+	add		r2, sp, #624
+	vld1.8		{d30-d31}, [r2, : 128]
+	vmlal.s32	q2, d30, d21
+	vmlal.s32	q7, d19, d21
+	vmlal.s32	q7, d27, d20
+	add		r2, sp, #592
+	vld1.8		{d26-d27}, [r2, : 128]
+	vmlal.s32	q4, d25, d27
+	vmlal.s32	q8, d29, d27
+	vmlal.s32	q8, d25, d26
+	vmlal.s32	q7, d28, d27
+	vmlal.s32	q7, d29, d26
+	add		r2, sp, #576
+	vld1.8		{d28-d29}, [r2, : 128]
+	vmlal.s32	q4, d24, d29
+	vmlal.s32	q8, d23, d29
+	vmlal.s32	q8, d24, d28
+	vmlal.s32	q7, d22, d29
+	vmlal.s32	q7, d23, d28
+	add		r2, sp, #576
+	vst1.8		{d8-d9}, [r2, : 128]
+	add		r2, sp, #528
+	vld1.8		{d8-d9}, [r2, : 128]
+	vmlal.s32	q7, d24, d9
+	vmlal.s32	q7, d25, d31
+	vmull.s32	q1, d18, d2
+	vmlal.s32	q1, d19, d1
+	vmlal.s32	q1, d22, d0
+	vmlal.s32	q1, d24, d27
+	vmlal.s32	q1, d23, d20
+	vmlal.s32	q1, d12, d7
+	vmlal.s32	q1, d13, d6
+	vmull.s32	q6, d18, d1
+	vmlal.s32	q6, d19, d0
+	vmlal.s32	q6, d23, d27
+	vmlal.s32	q6, d22, d20
+	vmlal.s32	q6, d24, d26
+	vmull.s32	q0, d18, d0
+	vmlal.s32	q0, d22, d27
+	vmlal.s32	q0, d23, d26
+	vmlal.s32	q0, d24, d31
+	vmlal.s32	q0, d19, d20
+	add		r2, sp, #608
+	vld1.8		{d18-d19}, [r2, : 128]
+	vmlal.s32	q2, d18, d7
+	vmlal.s32	q2, d19, d6
+	vmlal.s32	q5, d18, d6
+	vmlal.s32	q5, d19, d21
+	vmlal.s32	q1, d18, d21
+	vmlal.s32	q1, d19, d29
+	vmlal.s32	q0, d18, d28
+	vmlal.s32	q0, d19, d9
+	vmlal.s32	q6, d18, d29
+	vmlal.s32	q6, d19, d28
+	add		r2, sp, #560
+	vld1.8		{d18-d19}, [r2, : 128]
+	add		r2, sp, #480
+	vld1.8		{d22-d23}, [r2, : 128]
+	vmlal.s32	q5, d19, d7
+	vmlal.s32	q0, d18, d21
+	vmlal.s32	q0, d19, d29
+	vmlal.s32	q6, d18, d6
+	add		r2, sp, #496
+	vld1.8		{d6-d7}, [r2, : 128]
+	vmlal.s32	q6, d19, d21
+	add		r2, sp, #544
+	vld1.8		{d18-d19}, [r2, : 128]
+	vmlal.s32	q0, d30, d8
+	add		r2, sp, #640
+	vld1.8		{d20-d21}, [r2, : 128]
+	vmlal.s32	q5, d30, d29
+	add		r2, sp, #576
+	vld1.8		{d24-d25}, [r2, : 128]
+	vmlal.s32	q1, d30, d28
+	vadd.i64	q13, q0, q11
+	vadd.i64	q14, q5, q11
+	vmlal.s32	q6, d30, d9
+	vshr.s64	q4, q13, #26
+	vshr.s64	q13, q14, #26
+	vadd.i64	q7, q7, q4
+	vshl.i64	q4, q4, #26
+	vadd.i64	q14, q7, q3
+	vadd.i64	q9, q9, q13
+	vshl.i64	q13, q13, #26
+	vadd.i64	q15, q9, q3
+	vsub.i64	q0, q0, q4
+	vshr.s64	q4, q14, #25
+	vsub.i64	q5, q5, q13
+	vshr.s64	q13, q15, #25
+	vadd.i64	q6, q6, q4
+	vshl.i64	q4, q4, #25
+	vadd.i64	q14, q6, q11
+	vadd.i64	q2, q2, q13
+	vsub.i64	q4, q7, q4
+	vshr.s64	q7, q14, #26
+	vshl.i64	q13, q13, #25
+	vadd.i64	q14, q2, q11
+	vadd.i64	q8, q8, q7
+	vshl.i64	q7, q7, #26
+	vadd.i64	q15, q8, q3
+	vsub.i64	q9, q9, q13
+	vshr.s64	q13, q14, #26
+	vsub.i64	q6, q6, q7
+	vshr.s64	q7, q15, #25
+	vadd.i64	q10, q10, q13
+	vshl.i64	q13, q13, #26
+	vadd.i64	q14, q10, q3
+	vadd.i64	q1, q1, q7
+	add		r2, r3, #144
+	vshl.i64	q7, q7, #25
+	add		r4, r3, #96
+	vadd.i64	q15, q1, q11
+	add		r2, r2, #8
+	vsub.i64	q2, q2, q13
+	add		r4, r4, #8
+	vshr.s64	q13, q14, #25
+	vsub.i64	q7, q8, q7
+	vshr.s64	q8, q15, #26
+	vadd.i64	q14, q13, q13
+	vadd.i64	q12, q12, q8
+	vtrn.32		d12, d14
+	vshl.i64	q8, q8, #26
+	vtrn.32		d13, d15
+	vadd.i64	q3, q12, q3
+	vadd.i64	q0, q0, q14
+	vst1.8		d12, [r2, : 64]!
+	vshl.i64	q7, q13, #4
+	vst1.8		d13, [r4, : 64]!
+	vsub.i64	q1, q1, q8
+	vshr.s64	q3, q3, #25
+	vadd.i64	q0, q0, q7
+	vadd.i64	q5, q5, q3
+	vshl.i64	q3, q3, #25
+	vadd.i64	q6, q5, q11
+	vadd.i64	q0, q0, q13
+	vshl.i64	q7, q13, #25
+	vadd.i64	q8, q0, q11
+	vsub.i64	q3, q12, q3
+	vshr.s64	q6, q6, #26
+	vsub.i64	q7, q10, q7
+	vtrn.32		d2, d6
+	vshr.s64	q8, q8, #26
+	vtrn.32		d3, d7
+	vadd.i64	q3, q9, q6
+	vst1.8		d2, [r2, : 64]
+	vshl.i64	q6, q6, #26
+	vst1.8		d3, [r4, : 64]
+	vadd.i64	q1, q4, q8
+	vtrn.32		d4, d14
+	vshl.i64	q4, q8, #26
+	vtrn.32		d5, d15
+	vsub.i64	q5, q5, q6
+	add		r2, r2, #16
+	vsub.i64	q0, q0, q4
+	vst1.8		d4, [r2, : 64]
+	add		r4, r4, #16
+	vst1.8		d5, [r4, : 64]
+	vtrn.32		d10, d6
+	vtrn.32		d11, d7
+	sub		r2, r2, #8
+	sub		r4, r4, #8
+	vtrn.32		d0, d2
+	vtrn.32		d1, d3
+	vst1.8		d10, [r2, : 64]
+	vst1.8		d11, [r4, : 64]
+	sub		r2, r2, #24
+	sub		r4, r4, #24
+	vst1.8		d0, [r2, : 64]
+	vst1.8		d1, [r4, : 64]
+	add		r2, r3, #288
+	add		r4, r3, #336
+	vld1.8		{d0-d1}, [r2, : 128]!
+	vld1.8		{d2-d3}, [r4, : 128]!
+	vsub.i32	q0, q0, q1
+	vld1.8		{d2-d3}, [r2, : 128]!
+	vld1.8		{d4-d5}, [r4, : 128]!
+	vsub.i32	q1, q1, q2
+	add		r5, r3, #240
+	vld1.8		{d4}, [r2, : 64]
+	vld1.8		{d6}, [r4, : 64]
+	vsub.i32	q2, q2, q3
+	vst1.8		{d0-d1}, [r5, : 128]!
+	vst1.8		{d2-d3}, [r5, : 128]!
+	vst1.8		d4, [r5, : 64]
+	add		r2, r3, #144
+	add		r4, r3, #96
+	add		r5, r3, #144
+	add		r6, r3, #192
+	vld1.8		{d0-d1}, [r2, : 128]!
+	vld1.8		{d2-d3}, [r4, : 128]!
+	vsub.i32	q2, q0, q1
+	vadd.i32	q0, q0, q1
+	vld1.8		{d2-d3}, [r2, : 128]!
+	vld1.8		{d6-d7}, [r4, : 128]!
+	vsub.i32	q4, q1, q3
+	vadd.i32	q1, q1, q3
+	vld1.8		{d6}, [r2, : 64]
+	vld1.8		{d10}, [r4, : 64]
+	vsub.i32	q6, q3, q5
+	vadd.i32	q3, q3, q5
+	vst1.8		{d4-d5}, [r5, : 128]!
+	vst1.8		{d0-d1}, [r6, : 128]!
+	vst1.8		{d8-d9}, [r5, : 128]!
+	vst1.8		{d2-d3}, [r6, : 128]!
+	vst1.8		d12, [r5, : 64]
+	vst1.8		d6, [r6, : 64]
+	add		r2, r3, #0
+	add		r4, r3, #240
+	vld1.8		{d0-d1}, [r4, : 128]!
+	vld1.8		{d2-d3}, [r4, : 128]!
+	vld1.8		{d4}, [r4, : 64]
+	add		r4, r3, #336
+	vld1.8		{d6-d7}, [r4, : 128]!
+	vtrn.32		q0, q3
+	vld1.8		{d8-d9}, [r4, : 128]!
+	vshl.i32	q5, q0, #4
+	vtrn.32		q1, q4
+	vshl.i32	q6, q3, #4
+	vadd.i32	q5, q5, q0
+	vadd.i32	q6, q6, q3
+	vshl.i32	q7, q1, #4
+	vld1.8		{d5}, [r4, : 64]
+	vshl.i32	q8, q4, #4
+	vtrn.32		d4, d5
+	vadd.i32	q7, q7, q1
+	vadd.i32	q8, q8, q4
+	vld1.8		{d18-d19}, [r2, : 128]!
+	vshl.i32	q10, q2, #4
+	vld1.8		{d22-d23}, [r2, : 128]!
+	vadd.i32	q10, q10, q2
+	vld1.8		{d24}, [r2, : 64]
+	vadd.i32	q5, q5, q0
+	add		r2, r3, #288
+	vld1.8		{d26-d27}, [r2, : 128]!
+	vadd.i32	q6, q6, q3
+	vld1.8		{d28-d29}, [r2, : 128]!
+	vadd.i32	q8, q8, q4
+	vld1.8		{d25}, [r2, : 64]
+	vadd.i32	q10, q10, q2
+	vtrn.32		q9, q13
+	vadd.i32	q7, q7, q1
+	vadd.i32	q5, q5, q0
+	vtrn.32		q11, q14
+	vadd.i32	q6, q6, q3
+	add		r2, sp, #528
+	vadd.i32	q10, q10, q2
+	vtrn.32		d24, d25
+	vst1.8		{d12-d13}, [r2, : 128]
+	vshl.i32	q6, q13, #1
+	add		r2, sp, #544
+	vst1.8		{d20-d21}, [r2, : 128]
+	vshl.i32	q10, q14, #1
+	add		r2, sp, #560
+	vst1.8		{d12-d13}, [r2, : 128]
+	vshl.i32	q15, q12, #1
+	vadd.i32	q8, q8, q4
+	vext.32		d10, d31, d30, #0
+	vadd.i32	q7, q7, q1
+	add		r2, sp, #576
+	vst1.8		{d16-d17}, [r2, : 128]
+	vmull.s32	q8, d18, d5
+	vmlal.s32	q8, d26, d4
+	vmlal.s32	q8, d19, d9
+	vmlal.s32	q8, d27, d3
+	vmlal.s32	q8, d22, d8
+	vmlal.s32	q8, d28, d2
+	vmlal.s32	q8, d23, d7
+	vmlal.s32	q8, d29, d1
+	vmlal.s32	q8, d24, d6
+	vmlal.s32	q8, d25, d0
+	add		r2, sp, #592
+	vst1.8		{d14-d15}, [r2, : 128]
+	vmull.s32	q2, d18, d4
+	vmlal.s32	q2, d12, d9
+	vmlal.s32	q2, d13, d8
+	vmlal.s32	q2, d19, d3
+	vmlal.s32	q2, d22, d2
+	vmlal.s32	q2, d23, d1
+	vmlal.s32	q2, d24, d0
+	add		r2, sp, #608
+	vst1.8		{d20-d21}, [r2, : 128]
+	vmull.s32	q7, d18, d9
+	vmlal.s32	q7, d26, d3
+	vmlal.s32	q7, d19, d8
+	vmlal.s32	q7, d27, d2
+	vmlal.s32	q7, d22, d7
+	vmlal.s32	q7, d28, d1
+	vmlal.s32	q7, d23, d6
+	vmlal.s32	q7, d29, d0
+	add		r2, sp, #624
+	vst1.8		{d10-d11}, [r2, : 128]
+	vmull.s32	q5, d18, d3
+	vmlal.s32	q5, d19, d2
+	vmlal.s32	q5, d22, d1
+	vmlal.s32	q5, d23, d0
+	vmlal.s32	q5, d12, d8
+	add		r2, sp, #640
+	vst1.8		{d16-d17}, [r2, : 128]
+	vmull.s32	q4, d18, d8
+	vmlal.s32	q4, d26, d2
+	vmlal.s32	q4, d19, d7
+	vmlal.s32	q4, d27, d1
+	vmlal.s32	q4, d22, d6
+	vmlal.s32	q4, d28, d0
+	vmull.s32	q8, d18, d7
+	vmlal.s32	q8, d26, d1
+	vmlal.s32	q8, d19, d6
+	vmlal.s32	q8, d27, d0
+	add		r2, sp, #544
+	vld1.8		{d20-d21}, [r2, : 128]
+	vmlal.s32	q7, d24, d21
+	vmlal.s32	q7, d25, d20
+	vmlal.s32	q4, d23, d21
+	vmlal.s32	q4, d29, d20
+	vmlal.s32	q8, d22, d21
+	vmlal.s32	q8, d28, d20
+	vmlal.s32	q5, d24, d20
+	add		r2, sp, #544
+	vst1.8		{d14-d15}, [r2, : 128]
+	vmull.s32	q7, d18, d6
+	vmlal.s32	q7, d26, d0
+	add		r2, sp, #624
+	vld1.8		{d30-d31}, [r2, : 128]
+	vmlal.s32	q2, d30, d21
+	vmlal.s32	q7, d19, d21
+	vmlal.s32	q7, d27, d20
+	add		r2, sp, #592
+	vld1.8		{d26-d27}, [r2, : 128]
+	vmlal.s32	q4, d25, d27
+	vmlal.s32	q8, d29, d27
+	vmlal.s32	q8, d25, d26
+	vmlal.s32	q7, d28, d27
+	vmlal.s32	q7, d29, d26
+	add		r2, sp, #576
+	vld1.8		{d28-d29}, [r2, : 128]
+	vmlal.s32	q4, d24, d29
+	vmlal.s32	q8, d23, d29
+	vmlal.s32	q8, d24, d28
+	vmlal.s32	q7, d22, d29
+	vmlal.s32	q7, d23, d28
+	add		r2, sp, #576
+	vst1.8		{d8-d9}, [r2, : 128]
+	add		r2, sp, #528
+	vld1.8		{d8-d9}, [r2, : 128]
+	vmlal.s32	q7, d24, d9
+	vmlal.s32	q7, d25, d31
+	vmull.s32	q1, d18, d2
+	vmlal.s32	q1, d19, d1
+	vmlal.s32	q1, d22, d0
+	vmlal.s32	q1, d24, d27
+	vmlal.s32	q1, d23, d20
+	vmlal.s32	q1, d12, d7
+	vmlal.s32	q1, d13, d6
+	vmull.s32	q6, d18, d1
+	vmlal.s32	q6, d19, d0
+	vmlal.s32	q6, d23, d27
+	vmlal.s32	q6, d22, d20
+	vmlal.s32	q6, d24, d26
+	vmull.s32	q0, d18, d0
+	vmlal.s32	q0, d22, d27
+	vmlal.s32	q0, d23, d26
+	vmlal.s32	q0, d24, d31
+	vmlal.s32	q0, d19, d20
+	add		r2, sp, #608
+	vld1.8		{d18-d19}, [r2, : 128]
+	vmlal.s32	q2, d18, d7
+	vmlal.s32	q2, d19, d6
+	vmlal.s32	q5, d18, d6
+	vmlal.s32	q5, d19, d21
+	vmlal.s32	q1, d18, d21
+	vmlal.s32	q1, d19, d29
+	vmlal.s32	q0, d18, d28
+	vmlal.s32	q0, d19, d9
+	vmlal.s32	q6, d18, d29
+	vmlal.s32	q6, d19, d28
+	add		r2, sp, #560
+	vld1.8		{d18-d19}, [r2, : 128]
+	add		r2, sp, #480
+	vld1.8		{d22-d23}, [r2, : 128]
+	vmlal.s32	q5, d19, d7
+	vmlal.s32	q0, d18, d21
+	vmlal.s32	q0, d19, d29
+	vmlal.s32	q6, d18, d6
+	add		r2, sp, #496
+	vld1.8		{d6-d7}, [r2, : 128]
+	vmlal.s32	q6, d19, d21
+	add		r2, sp, #544
+	vld1.8		{d18-d19}, [r2, : 128]
+	vmlal.s32	q0, d30, d8
+	add		r2, sp, #640
+	vld1.8		{d20-d21}, [r2, : 128]
+	vmlal.s32	q5, d30, d29
+	add		r2, sp, #576
+	vld1.8		{d24-d25}, [r2, : 128]
+	vmlal.s32	q1, d30, d28
+	vadd.i64	q13, q0, q11
+	vadd.i64	q14, q5, q11
+	vmlal.s32	q6, d30, d9
+	vshr.s64	q4, q13, #26
+	vshr.s64	q13, q14, #26
+	vadd.i64	q7, q7, q4
+	vshl.i64	q4, q4, #26
+	vadd.i64	q14, q7, q3
+	vadd.i64	q9, q9, q13
+	vshl.i64	q13, q13, #26
+	vadd.i64	q15, q9, q3
+	vsub.i64	q0, q0, q4
+	vshr.s64	q4, q14, #25
+	vsub.i64	q5, q5, q13
+	vshr.s64	q13, q15, #25
+	vadd.i64	q6, q6, q4
+	vshl.i64	q4, q4, #25
+	vadd.i64	q14, q6, q11
+	vadd.i64	q2, q2, q13
+	vsub.i64	q4, q7, q4
+	vshr.s64	q7, q14, #26
+	vshl.i64	q13, q13, #25
+	vadd.i64	q14, q2, q11
+	vadd.i64	q8, q8, q7
+	vshl.i64	q7, q7, #26
+	vadd.i64	q15, q8, q3
+	vsub.i64	q9, q9, q13
+	vshr.s64	q13, q14, #26
+	vsub.i64	q6, q6, q7
+	vshr.s64	q7, q15, #25
+	vadd.i64	q10, q10, q13
+	vshl.i64	q13, q13, #26
+	vadd.i64	q14, q10, q3
+	vadd.i64	q1, q1, q7
+	add		r2, r3, #288
+	vshl.i64	q7, q7, #25
+	add		r4, r3, #96
+	vadd.i64	q15, q1, q11
+	add		r2, r2, #8
+	vsub.i64	q2, q2, q13
+	add		r4, r4, #8
+	vshr.s64	q13, q14, #25
+	vsub.i64	q7, q8, q7
+	vshr.s64	q8, q15, #26
+	vadd.i64	q14, q13, q13
+	vadd.i64	q12, q12, q8
+	vtrn.32		d12, d14
+	vshl.i64	q8, q8, #26
+	vtrn.32		d13, d15
+	vadd.i64	q3, q12, q3
+	vadd.i64	q0, q0, q14
+	vst1.8		d12, [r2, : 64]!
+	vshl.i64	q7, q13, #4
+	vst1.8		d13, [r4, : 64]!
+	vsub.i64	q1, q1, q8
+	vshr.s64	q3, q3, #25
+	vadd.i64	q0, q0, q7
+	vadd.i64	q5, q5, q3
+	vshl.i64	q3, q3, #25
+	vadd.i64	q6, q5, q11
+	vadd.i64	q0, q0, q13
+	vshl.i64	q7, q13, #25
+	vadd.i64	q8, q0, q11
+	vsub.i64	q3, q12, q3
+	vshr.s64	q6, q6, #26
+	vsub.i64	q7, q10, q7
+	vtrn.32		d2, d6
+	vshr.s64	q8, q8, #26
+	vtrn.32		d3, d7
+	vadd.i64	q3, q9, q6
+	vst1.8		d2, [r2, : 64]
+	vshl.i64	q6, q6, #26
+	vst1.8		d3, [r4, : 64]
+	vadd.i64	q1, q4, q8
+	vtrn.32		d4, d14
+	vshl.i64	q4, q8, #26
+	vtrn.32		d5, d15
+	vsub.i64	q5, q5, q6
+	add		r2, r2, #16
+	vsub.i64	q0, q0, q4
+	vst1.8		d4, [r2, : 64]
+	add		r4, r4, #16
+	vst1.8		d5, [r4, : 64]
+	vtrn.32		d10, d6
+	vtrn.32		d11, d7
+	sub		r2, r2, #8
+	sub		r4, r4, #8
+	vtrn.32		d0, d2
+	vtrn.32		d1, d3
+	vst1.8		d10, [r2, : 64]
+	vst1.8		d11, [r4, : 64]
+	sub		r2, r2, #24
+	sub		r4, r4, #24
+	vst1.8		d0, [r2, : 64]
+	vst1.8		d1, [r4, : 64]
+	add		r2, sp, #512
+	add		r4, r3, #144
+	add		r5, r3, #192
+	vld1.8		{d0-d1}, [r2, : 128]
+	vld1.8		{d2-d3}, [r4, : 128]!
+	vld1.8		{d4-d5}, [r5, : 128]!
+	vzip.i32	q1, q2
+	vld1.8		{d6-d7}, [r4, : 128]!
+	vld1.8		{d8-d9}, [r5, : 128]!
+	vshl.i32	q5, q1, #1
+	vzip.i32	q3, q4
+	vshl.i32	q6, q2, #1
+	vld1.8		{d14}, [r4, : 64]
+	vshl.i32	q8, q3, #1
+	vld1.8		{d15}, [r5, : 64]
+	vshl.i32	q9, q4, #1
+	vmul.i32	d21, d7, d1
+	vtrn.32		d14, d15
+	vmul.i32	q11, q4, q0
+	vmul.i32	q0, q7, q0
+	vmull.s32	q12, d2, d2
+	vmlal.s32	q12, d11, d1
+	vmlal.s32	q12, d12, d0
+	vmlal.s32	q12, d13, d23
+	vmlal.s32	q12, d16, d22
+	vmlal.s32	q12, d7, d21
+	vmull.s32	q10, d2, d11
+	vmlal.s32	q10, d4, d1
+	vmlal.s32	q10, d13, d0
+	vmlal.s32	q10, d6, d23
+	vmlal.s32	q10, d17, d22
+	vmull.s32	q13, d10, d4
+	vmlal.s32	q13, d11, d3
+	vmlal.s32	q13, d13, d1
+	vmlal.s32	q13, d16, d0
+	vmlal.s32	q13, d17, d23
+	vmlal.s32	q13, d8, d22
+	vmull.s32	q1, d10, d5
+	vmlal.s32	q1, d11, d4
+	vmlal.s32	q1, d6, d1
+	vmlal.s32	q1, d17, d0
+	vmlal.s32	q1, d8, d23
+	vmull.s32	q14, d10, d6
+	vmlal.s32	q14, d11, d13
+	vmlal.s32	q14, d4, d4
+	vmlal.s32	q14, d17, d1
+	vmlal.s32	q14, d18, d0
+	vmlal.s32	q14, d9, d23
+	vmull.s32	q11, d10, d7
+	vmlal.s32	q11, d11, d6
+	vmlal.s32	q11, d12, d5
+	vmlal.s32	q11, d8, d1
+	vmlal.s32	q11, d19, d0
+	vmull.s32	q15, d10, d8
+	vmlal.s32	q15, d11, d17
+	vmlal.s32	q15, d12, d6
+	vmlal.s32	q15, d13, d5
+	vmlal.s32	q15, d19, d1
+	vmlal.s32	q15, d14, d0
+	vmull.s32	q2, d10, d9
+	vmlal.s32	q2, d11, d8
+	vmlal.s32	q2, d12, d7
+	vmlal.s32	q2, d13, d6
+	vmlal.s32	q2, d14, d1
+	vmull.s32	q0, d15, d1
+	vmlal.s32	q0, d10, d14
+	vmlal.s32	q0, d11, d19
+	vmlal.s32	q0, d12, d8
+	vmlal.s32	q0, d13, d17
+	vmlal.s32	q0, d6, d6
+	add		r2, sp, #480
+	vld1.8		{d18-d19}, [r2, : 128]
+	vmull.s32	q3, d16, d7
+	vmlal.s32	q3, d10, d15
+	vmlal.s32	q3, d11, d14
+	vmlal.s32	q3, d12, d9
+	vmlal.s32	q3, d13, d8
+	add		r2, sp, #496
+	vld1.8		{d8-d9}, [r2, : 128]
+	vadd.i64	q5, q12, q9
+	vadd.i64	q6, q15, q9
+	vshr.s64	q5, q5, #26
+	vshr.s64	q6, q6, #26
+	vadd.i64	q7, q10, q5
+	vshl.i64	q5, q5, #26
+	vadd.i64	q8, q7, q4
+	vadd.i64	q2, q2, q6
+	vshl.i64	q6, q6, #26
+	vadd.i64	q10, q2, q4
+	vsub.i64	q5, q12, q5
+	vshr.s64	q8, q8, #25
+	vsub.i64	q6, q15, q6
+	vshr.s64	q10, q10, #25
+	vadd.i64	q12, q13, q8
+	vshl.i64	q8, q8, #25
+	vadd.i64	q13, q12, q9
+	vadd.i64	q0, q0, q10
+	vsub.i64	q7, q7, q8
+	vshr.s64	q8, q13, #26
+	vshl.i64	q10, q10, #25
+	vadd.i64	q13, q0, q9
+	vadd.i64	q1, q1, q8
+	vshl.i64	q8, q8, #26
+	vadd.i64	q15, q1, q4
+	vsub.i64	q2, q2, q10
+	vshr.s64	q10, q13, #26
+	vsub.i64	q8, q12, q8
+	vshr.s64	q12, q15, #25
+	vadd.i64	q3, q3, q10
+	vshl.i64	q10, q10, #26
+	vadd.i64	q13, q3, q4
+	vadd.i64	q14, q14, q12
+	add		r2, r3, #144
+	vshl.i64	q12, q12, #25
+	add		r4, r3, #192
+	vadd.i64	q15, q14, q9
+	add		r2, r2, #8
+	vsub.i64	q0, q0, q10
+	add		r4, r4, #8
+	vshr.s64	q10, q13, #25
+	vsub.i64	q1, q1, q12
+	vshr.s64	q12, q15, #26
+	vadd.i64	q13, q10, q10
+	vadd.i64	q11, q11, q12
+	vtrn.32		d16, d2
+	vshl.i64	q12, q12, #26
+	vtrn.32		d17, d3
+	vadd.i64	q1, q11, q4
+	vadd.i64	q4, q5, q13
+	vst1.8		d16, [r2, : 64]!
+	vshl.i64	q5, q10, #4
+	vst1.8		d17, [r4, : 64]!
+	vsub.i64	q8, q14, q12
+	vshr.s64	q1, q1, #25
+	vadd.i64	q4, q4, q5
+	vadd.i64	q5, q6, q1
+	vshl.i64	q1, q1, #25
+	vadd.i64	q6, q5, q9
+	vadd.i64	q4, q4, q10
+	vshl.i64	q10, q10, #25
+	vadd.i64	q9, q4, q9
+	vsub.i64	q1, q11, q1
+	vshr.s64	q6, q6, #26
+	vsub.i64	q3, q3, q10
+	vtrn.32		d16, d2
+	vshr.s64	q9, q9, #26
+	vtrn.32		d17, d3
+	vadd.i64	q1, q2, q6
+	vst1.8		d16, [r2, : 64]
+	vshl.i64	q2, q6, #26
+	vst1.8		d17, [r4, : 64]
+	vadd.i64	q6, q7, q9
+	vtrn.32		d0, d6
+	vshl.i64	q7, q9, #26
+	vtrn.32		d1, d7
+	vsub.i64	q2, q5, q2
+	add		r2, r2, #16
+	vsub.i64	q3, q4, q7
+	vst1.8		d0, [r2, : 64]
+	add		r4, r4, #16
+	vst1.8		d1, [r4, : 64]
+	vtrn.32		d4, d2
+	vtrn.32		d5, d3
+	sub		r2, r2, #8
+	sub		r4, r4, #8
+	vtrn.32		d6, d12
+	vtrn.32		d7, d13
+	vst1.8		d4, [r2, : 64]
+	vst1.8		d5, [r4, : 64]
+	sub		r2, r2, #24
+	sub		r4, r4, #24
+	vst1.8		d6, [r2, : 64]
+	vst1.8		d7, [r4, : 64]
+	add		r2, r3, #336
+	add		r4, r3, #288
+	vld1.8		{d0-d1}, [r2, : 128]!
+	vld1.8		{d2-d3}, [r4, : 128]!
+	vadd.i32	q0, q0, q1
+	vld1.8		{d2-d3}, [r2, : 128]!
+	vld1.8		{d4-d5}, [r4, : 128]!
+	vadd.i32	q1, q1, q2
+	add		r5, r3, #288
+	vld1.8		{d4}, [r2, : 64]
+	vld1.8		{d6}, [r4, : 64]
+	vadd.i32	q2, q2, q3
+	vst1.8		{d0-d1}, [r5, : 128]!
+	vst1.8		{d2-d3}, [r5, : 128]!
+	vst1.8		d4, [r5, : 64]
+	add		r2, r3, #48
+	add		r4, r3, #144
+	vld1.8		{d0-d1}, [r4, : 128]!
+	vld1.8		{d2-d3}, [r4, : 128]!
+	vld1.8		{d4}, [r4, : 64]
+	add		r4, r3, #288
+	vld1.8		{d6-d7}, [r4, : 128]!
+	vtrn.32		q0, q3
+	vld1.8		{d8-d9}, [r4, : 128]!
+	vshl.i32	q5, q0, #4
+	vtrn.32		q1, q4
+	vshl.i32	q6, q3, #4
+	vadd.i32	q5, q5, q0
+	vadd.i32	q6, q6, q3
+	vshl.i32	q7, q1, #4
+	vld1.8		{d5}, [r4, : 64]
+	vshl.i32	q8, q4, #4
+	vtrn.32		d4, d5
+	vadd.i32	q7, q7, q1
+	vadd.i32	q8, q8, q4
+	vld1.8		{d18-d19}, [r2, : 128]!
+	vshl.i32	q10, q2, #4
+	vld1.8		{d22-d23}, [r2, : 128]!
+	vadd.i32	q10, q10, q2
+	vld1.8		{d24}, [r2, : 64]
+	vadd.i32	q5, q5, q0
+	add		r2, r3, #240
+	vld1.8		{d26-d27}, [r2, : 128]!
+	vadd.i32	q6, q6, q3
+	vld1.8		{d28-d29}, [r2, : 128]!
+	vadd.i32	q8, q8, q4
+	vld1.8		{d25}, [r2, : 64]
+	vadd.i32	q10, q10, q2
+	vtrn.32		q9, q13
+	vadd.i32	q7, q7, q1
+	vadd.i32	q5, q5, q0
+	vtrn.32		q11, q14
+	vadd.i32	q6, q6, q3
+	add		r2, sp, #528
+	vadd.i32	q10, q10, q2
+	vtrn.32		d24, d25
+	vst1.8		{d12-d13}, [r2, : 128]
+	vshl.i32	q6, q13, #1
+	add		r2, sp, #544
+	vst1.8		{d20-d21}, [r2, : 128]
+	vshl.i32	q10, q14, #1
+	add		r2, sp, #560
+	vst1.8		{d12-d13}, [r2, : 128]
+	vshl.i32	q15, q12, #1
+	vadd.i32	q8, q8, q4
+	vext.32		d10, d31, d30, #0
+	vadd.i32	q7, q7, q1
+	add		r2, sp, #576
+	vst1.8		{d16-d17}, [r2, : 128]
+	vmull.s32	q8, d18, d5
+	vmlal.s32	q8, d26, d4
+	vmlal.s32	q8, d19, d9
+	vmlal.s32	q8, d27, d3
+	vmlal.s32	q8, d22, d8
+	vmlal.s32	q8, d28, d2
+	vmlal.s32	q8, d23, d7
+	vmlal.s32	q8, d29, d1
+	vmlal.s32	q8, d24, d6
+	vmlal.s32	q8, d25, d0
+	add		r2, sp, #592
+	vst1.8		{d14-d15}, [r2, : 128]
+	vmull.s32	q2, d18, d4
+	vmlal.s32	q2, d12, d9
+	vmlal.s32	q2, d13, d8
+	vmlal.s32	q2, d19, d3
+	vmlal.s32	q2, d22, d2
+	vmlal.s32	q2, d23, d1
+	vmlal.s32	q2, d24, d0
+	add		r2, sp, #608
+	vst1.8		{d20-d21}, [r2, : 128]
+	vmull.s32	q7, d18, d9
+	vmlal.s32	q7, d26, d3
+	vmlal.s32	q7, d19, d8
+	vmlal.s32	q7, d27, d2
+	vmlal.s32	q7, d22, d7
+	vmlal.s32	q7, d28, d1
+	vmlal.s32	q7, d23, d6
+	vmlal.s32	q7, d29, d0
+	add		r2, sp, #624
+	vst1.8		{d10-d11}, [r2, : 128]
+	vmull.s32	q5, d18, d3
+	vmlal.s32	q5, d19, d2
+	vmlal.s32	q5, d22, d1
+	vmlal.s32	q5, d23, d0
+	vmlal.s32	q5, d12, d8
+	add		r2, sp, #640
+	vst1.8		{d16-d17}, [r2, : 128]
+	vmull.s32	q4, d18, d8
+	vmlal.s32	q4, d26, d2
+	vmlal.s32	q4, d19, d7
+	vmlal.s32	q4, d27, d1
+	vmlal.s32	q4, d22, d6
+	vmlal.s32	q4, d28, d0
+	vmull.s32	q8, d18, d7
+	vmlal.s32	q8, d26, d1
+	vmlal.s32	q8, d19, d6
+	vmlal.s32	q8, d27, d0
+	add		r2, sp, #544
+	vld1.8		{d20-d21}, [r2, : 128]
+	vmlal.s32	q7, d24, d21
+	vmlal.s32	q7, d25, d20
+	vmlal.s32	q4, d23, d21
+	vmlal.s32	q4, d29, d20
+	vmlal.s32	q8, d22, d21
+	vmlal.s32	q8, d28, d20
+	vmlal.s32	q5, d24, d20
+	add		r2, sp, #544
+	vst1.8		{d14-d15}, [r2, : 128]
+	vmull.s32	q7, d18, d6
+	vmlal.s32	q7, d26, d0
+	add		r2, sp, #624
+	vld1.8		{d30-d31}, [r2, : 128]
+	vmlal.s32	q2, d30, d21
+	vmlal.s32	q7, d19, d21
+	vmlal.s32	q7, d27, d20
+	add		r2, sp, #592
+	vld1.8		{d26-d27}, [r2, : 128]
+	vmlal.s32	q4, d25, d27
+	vmlal.s32	q8, d29, d27
+	vmlal.s32	q8, d25, d26
+	vmlal.s32	q7, d28, d27
+	vmlal.s32	q7, d29, d26
+	add		r2, sp, #576
+	vld1.8		{d28-d29}, [r2, : 128]
+	vmlal.s32	q4, d24, d29
+	vmlal.s32	q8, d23, d29
+	vmlal.s32	q8, d24, d28
+	vmlal.s32	q7, d22, d29
+	vmlal.s32	q7, d23, d28
+	add		r2, sp, #576
+	vst1.8		{d8-d9}, [r2, : 128]
+	add		r2, sp, #528
+	vld1.8		{d8-d9}, [r2, : 128]
+	vmlal.s32	q7, d24, d9
+	vmlal.s32	q7, d25, d31
+	vmull.s32	q1, d18, d2
+	vmlal.s32	q1, d19, d1
+	vmlal.s32	q1, d22, d0
+	vmlal.s32	q1, d24, d27
+	vmlal.s32	q1, d23, d20
+	vmlal.s32	q1, d12, d7
+	vmlal.s32	q1, d13, d6
+	vmull.s32	q6, d18, d1
+	vmlal.s32	q6, d19, d0
+	vmlal.s32	q6, d23, d27
+	vmlal.s32	q6, d22, d20
+	vmlal.s32	q6, d24, d26
+	vmull.s32	q0, d18, d0
+	vmlal.s32	q0, d22, d27
+	vmlal.s32	q0, d23, d26
+	vmlal.s32	q0, d24, d31
+	vmlal.s32	q0, d19, d20
+	add		r2, sp, #608
+	vld1.8		{d18-d19}, [r2, : 128]
+	vmlal.s32	q2, d18, d7
+	vmlal.s32	q2, d19, d6
+	vmlal.s32	q5, d18, d6
+	vmlal.s32	q5, d19, d21
+	vmlal.s32	q1, d18, d21
+	vmlal.s32	q1, d19, d29
+	vmlal.s32	q0, d18, d28
+	vmlal.s32	q0, d19, d9
+	vmlal.s32	q6, d18, d29
+	vmlal.s32	q6, d19, d28
+	add		r2, sp, #560
+	vld1.8		{d18-d19}, [r2, : 128]
+	add		r2, sp, #480
+	vld1.8		{d22-d23}, [r2, : 128]
+	vmlal.s32	q5, d19, d7
+	vmlal.s32	q0, d18, d21
+	vmlal.s32	q0, d19, d29
+	vmlal.s32	q6, d18, d6
+	add		r2, sp, #496
+	vld1.8		{d6-d7}, [r2, : 128]
+	vmlal.s32	q6, d19, d21
+	add		r2, sp, #544
+	vld1.8		{d18-d19}, [r2, : 128]
+	vmlal.s32	q0, d30, d8
+	add		r2, sp, #640
+	vld1.8		{d20-d21}, [r2, : 128]
+	vmlal.s32	q5, d30, d29
+	add		r2, sp, #576
+	vld1.8		{d24-d25}, [r2, : 128]
+	vmlal.s32	q1, d30, d28
+	vadd.i64	q13, q0, q11
+	vadd.i64	q14, q5, q11
+	vmlal.s32	q6, d30, d9
+	vshr.s64	q4, q13, #26
+	vshr.s64	q13, q14, #26
+	vadd.i64	q7, q7, q4
+	vshl.i64	q4, q4, #26
+	vadd.i64	q14, q7, q3
+	vadd.i64	q9, q9, q13
+	vshl.i64	q13, q13, #26
+	vadd.i64	q15, q9, q3
+	vsub.i64	q0, q0, q4
+	vshr.s64	q4, q14, #25
+	vsub.i64	q5, q5, q13
+	vshr.s64	q13, q15, #25
+	vadd.i64	q6, q6, q4
+	vshl.i64	q4, q4, #25
+	vadd.i64	q14, q6, q11
+	vadd.i64	q2, q2, q13
+	vsub.i64	q4, q7, q4
+	vshr.s64	q7, q14, #26
+	vshl.i64	q13, q13, #25
+	vadd.i64	q14, q2, q11
+	vadd.i64	q8, q8, q7
+	vshl.i64	q7, q7, #26
+	vadd.i64	q15, q8, q3
+	vsub.i64	q9, q9, q13
+	vshr.s64	q13, q14, #26
+	vsub.i64	q6, q6, q7
+	vshr.s64	q7, q15, #25
+	vadd.i64	q10, q10, q13
+	vshl.i64	q13, q13, #26
+	vadd.i64	q14, q10, q3
+	vadd.i64	q1, q1, q7
+	add		r2, r3, #240
+	vshl.i64	q7, q7, #25
+	add		r4, r3, #144
+	vadd.i64	q15, q1, q11
+	add		r2, r2, #8
+	vsub.i64	q2, q2, q13
+	add		r4, r4, #8
+	vshr.s64	q13, q14, #25
+	vsub.i64	q7, q8, q7
+	vshr.s64	q8, q15, #26
+	vadd.i64	q14, q13, q13
+	vadd.i64	q12, q12, q8
+	vtrn.32		d12, d14
+	vshl.i64	q8, q8, #26
+	vtrn.32		d13, d15
+	vadd.i64	q3, q12, q3
+	vadd.i64	q0, q0, q14
+	vst1.8		d12, [r2, : 64]!
+	vshl.i64	q7, q13, #4
+	vst1.8		d13, [r4, : 64]!
+	vsub.i64	q1, q1, q8
+	vshr.s64	q3, q3, #25
+	vadd.i64	q0, q0, q7
+	vadd.i64	q5, q5, q3
+	vshl.i64	q3, q3, #25
+	vadd.i64	q6, q5, q11
+	vadd.i64	q0, q0, q13
+	vshl.i64	q7, q13, #25
+	vadd.i64	q8, q0, q11
+	vsub.i64	q3, q12, q3
+	vshr.s64	q6, q6, #26
+	vsub.i64	q7, q10, q7
+	vtrn.32		d2, d6
+	vshr.s64	q8, q8, #26
+	vtrn.32		d3, d7
+	vadd.i64	q3, q9, q6
+	vst1.8		d2, [r2, : 64]
+	vshl.i64	q6, q6, #26
+	vst1.8		d3, [r4, : 64]
+	vadd.i64	q1, q4, q8
+	vtrn.32		d4, d14
+	vshl.i64	q4, q8, #26
+	vtrn.32		d5, d15
+	vsub.i64	q5, q5, q6
+	add		r2, r2, #16
+	vsub.i64	q0, q0, q4
+	vst1.8		d4, [r2, : 64]
+	add		r4, r4, #16
+	vst1.8		d5, [r4, : 64]
+	vtrn.32		d10, d6
+	vtrn.32		d11, d7
+	sub		r2, r2, #8
+	sub		r4, r4, #8
+	vtrn.32		d0, d2
+	vtrn.32		d1, d3
+	vst1.8		d10, [r2, : 64]
+	vst1.8		d11, [r4, : 64]
+	sub		r2, r2, #24
+	sub		r4, r4, #24
+	vst1.8		d0, [r2, : 64]
+	vst1.8		d1, [r4, : 64]
+	ldr		r2, [sp, #456]
+	ldr		r4, [sp, #460]
+	subs		r5, r2, #1
+	bge		.Lmainloop
+	add		r1, r3, #144
+	add		r2, r3, #336
+	vld1.8		{d0-d1}, [r1, : 128]!
+	vld1.8		{d2-d3}, [r1, : 128]!
+	vld1.8		{d4}, [r1, : 64]
+	vst1.8		{d0-d1}, [r2, : 128]!
+	vst1.8		{d2-d3}, [r2, : 128]!
+	vst1.8		d4, [r2, : 64]
+	movw		r1, #0
+.Linvertloop:
+	add		r2, r3, #144
+	movw		r4, #0
+	movw		r5, #2
+	cmp		r1, #1
+	moveq		r5, #1
+	addeq		r2, r3, #336
+	addeq		r4, r3, #48
+	cmp		r1, #2
+	moveq		r5, #1
+	addeq		r2, r3, #48
+	cmp		r1, #3
+	moveq		r5, #5
+	addeq		r4, r3, #336
+	cmp		r1, #4
+	moveq		r5, #10
+	cmp		r1, #5
+	moveq		r5, #20
+	cmp		r1, #6
+	moveq		r5, #10
+	addeq		r2, r3, #336
+	addeq		r4, r3, #336
+	cmp		r1, #7
+	moveq		r5, #50
+	cmp		r1, #8
+	moveq		r5, #100
+	cmp		r1, #9
+	moveq		r5, #50
+	addeq		r2, r3, #336
+	cmp		r1, #10
+	moveq		r5, #5
+	addeq		r2, r3, #48
+	cmp		r1, #11
+	moveq		r5, #0
+	addeq		r2, r3, #96
+	add		r6, r3, #144
+	add		r7, r3, #288
+	vld1.8		{d0-d1}, [r6, : 128]!
+	vld1.8		{d2-d3}, [r6, : 128]!
+	vld1.8		{d4}, [r6, : 64]
+	vst1.8		{d0-d1}, [r7, : 128]!
+	vst1.8		{d2-d3}, [r7, : 128]!
+	vst1.8		d4, [r7, : 64]
+	cmp		r5, #0
+	beq		.Lskipsquaringloop
+.Lsquaringloop:
+	add		r6, r3, #288
+	add		r7, r3, #288
+	add		r8, r3, #288
+	vmov.i32	q0, #19
+	vmov.i32	q1, #0
+	vmov.i32	q2, #1
+	vzip.i32	q1, q2
+	vld1.8		{d4-d5}, [r7, : 128]!
+	vld1.8		{d6-d7}, [r7, : 128]!
+	vld1.8		{d9}, [r7, : 64]
+	vld1.8		{d10-d11}, [r6, : 128]!
+	add		r7, sp, #384
+	vld1.8		{d12-d13}, [r6, : 128]!
+	vmul.i32	q7, q2, q0
+	vld1.8		{d8}, [r6, : 64]
+	vext.32		d17, d11, d10, #1
+	vmul.i32	q9, q3, q0
+	vext.32		d16, d10, d8, #1
+	vshl.u32	q10, q5, q1
+	vext.32		d22, d14, d4, #1
+	vext.32		d24, d18, d6, #1
+	vshl.u32	q13, q6, q1
+	vshl.u32	d28, d8, d2
+	vrev64.i32	d22, d22
+	vmul.i32	d1, d9, d1
+	vrev64.i32	d24, d24
+	vext.32		d29, d8, d13, #1
+	vext.32		d0, d1, d9, #1
+	vrev64.i32	d0, d0
+	vext.32		d2, d9, d1, #1
+	vext.32		d23, d15, d5, #1
+	vmull.s32	q4, d20, d4
+	vrev64.i32	d23, d23
+	vmlal.s32	q4, d21, d1
+	vrev64.i32	d2, d2
+	vmlal.s32	q4, d26, d19
+	vext.32		d3, d5, d15, #1
+	vmlal.s32	q4, d27, d18
+	vrev64.i32	d3, d3
+	vmlal.s32	q4, d28, d15
+	vext.32		d14, d12, d11, #1
+	vmull.s32	q5, d16, d23
+	vext.32		d15, d13, d12, #1
+	vmlal.s32	q5, d17, d4
+	vst1.8		d8, [r7, : 64]!
+	vmlal.s32	q5, d14, d1
+	vext.32		d12, d9, d8, #0
+	vmlal.s32	q5, d15, d19
+	vmov.i64	d13, #0
+	vmlal.s32	q5, d29, d18
+	vext.32		d25, d19, d7, #1
+	vmlal.s32	q6, d20, d5
+	vrev64.i32	d25, d25
+	vmlal.s32	q6, d21, d4
+	vst1.8		d11, [r7, : 64]!
+	vmlal.s32	q6, d26, d1
+	vext.32		d9, d10, d10, #0
+	vmlal.s32	q6, d27, d19
+	vmov.i64	d8, #0
+	vmlal.s32	q6, d28, d18
+	vmlal.s32	q4, d16, d24
+	vmlal.s32	q4, d17, d5
+	vmlal.s32	q4, d14, d4
+	vst1.8		d12, [r7, : 64]!
+	vmlal.s32	q4, d15, d1
+	vext.32		d10, d13, d12, #0
+	vmlal.s32	q4, d29, d19
+	vmov.i64	d11, #0
+	vmlal.s32	q5, d20, d6
+	vmlal.s32	q5, d21, d5
+	vmlal.s32	q5, d26, d4
+	vext.32		d13, d8, d8, #0
+	vmlal.s32	q5, d27, d1
+	vmov.i64	d12, #0
+	vmlal.s32	q5, d28, d19
+	vst1.8		d9, [r7, : 64]!
+	vmlal.s32	q6, d16, d25
+	vmlal.s32	q6, d17, d6
+	vst1.8		d10, [r7, : 64]
+	vmlal.s32	q6, d14, d5
+	vext.32		d8, d11, d10, #0
+	vmlal.s32	q6, d15, d4
+	vmov.i64	d9, #0
+	vmlal.s32	q6, d29, d1
+	vmlal.s32	q4, d20, d7
+	vmlal.s32	q4, d21, d6
+	vmlal.s32	q4, d26, d5
+	vext.32		d11, d12, d12, #0
+	vmlal.s32	q4, d27, d4
+	vmov.i64	d10, #0
+	vmlal.s32	q4, d28, d1
+	vmlal.s32	q5, d16, d0
+	sub		r6, r7, #32
+	vmlal.s32	q5, d17, d7
+	vmlal.s32	q5, d14, d6
+	vext.32		d30, d9, d8, #0
+	vmlal.s32	q5, d15, d5
+	vld1.8		{d31}, [r6, : 64]!
+	vmlal.s32	q5, d29, d4
+	vmlal.s32	q15, d20, d0
+	vext.32		d0, d6, d18, #1
+	vmlal.s32	q15, d21, d25
+	vrev64.i32	d0, d0
+	vmlal.s32	q15, d26, d24
+	vext.32		d1, d7, d19, #1
+	vext.32		d7, d10, d10, #0
+	vmlal.s32	q15, d27, d23
+	vrev64.i32	d1, d1
+	vld1.8		{d6}, [r6, : 64]
+	vmlal.s32	q15, d28, d22
+	vmlal.s32	q3, d16, d4
+	add		r6, r6, #24
+	vmlal.s32	q3, d17, d2
+	vext.32		d4, d31, d30, #0
+	vmov		d17, d11
+	vmlal.s32	q3, d14, d1
+	vext.32		d11, d13, d13, #0
+	vext.32		d13, d30, d30, #0
+	vmlal.s32	q3, d15, d0
+	vext.32		d1, d8, d8, #0
+	vmlal.s32	q3, d29, d3
+	vld1.8		{d5}, [r6, : 64]
+	sub		r6, r6, #16
+	vext.32		d10, d6, d6, #0
+	vmov.i32	q1, #0xffffffff
+	vshl.i64	q4, q1, #25
+	add		r7, sp, #480
+	vld1.8		{d14-d15}, [r7, : 128]
+	vadd.i64	q9, q2, q7
+	vshl.i64	q1, q1, #26
+	vshr.s64	q10, q9, #26
+	vld1.8		{d0}, [r6, : 64]!
+	vadd.i64	q5, q5, q10
+	vand		q9, q9, q1
+	vld1.8		{d16}, [r6, : 64]!
+	add		r6, sp, #496
+	vld1.8		{d20-d21}, [r6, : 128]
+	vadd.i64	q11, q5, q10
+	vsub.i64	q2, q2, q9
+	vshr.s64	q9, q11, #25
+	vext.32		d12, d5, d4, #0
+	vand		q11, q11, q4
+	vadd.i64	q0, q0, q9
+	vmov		d19, d7
+	vadd.i64	q3, q0, q7
+	vsub.i64	q5, q5, q11
+	vshr.s64	q11, q3, #26
+	vext.32		d18, d11, d10, #0
+	vand		q3, q3, q1
+	vadd.i64	q8, q8, q11
+	vadd.i64	q11, q8, q10
+	vsub.i64	q0, q0, q3
+	vshr.s64	q3, q11, #25
+	vand		q11, q11, q4
+	vadd.i64	q3, q6, q3
+	vadd.i64	q6, q3, q7
+	vsub.i64	q8, q8, q11
+	vshr.s64	q11, q6, #26
+	vand		q6, q6, q1
+	vadd.i64	q9, q9, q11
+	vadd.i64	d25, d19, d21
+	vsub.i64	q3, q3, q6
+	vshr.s64	d23, d25, #25
+	vand		q4, q12, q4
+	vadd.i64	d21, d23, d23
+	vshl.i64	d25, d23, #4
+	vadd.i64	d21, d21, d23
+	vadd.i64	d25, d25, d21
+	vadd.i64	d4, d4, d25
+	vzip.i32	q0, q8
+	vadd.i64	d12, d4, d14
+	add		r6, r8, #8
+	vst1.8		d0, [r6, : 64]
+	vsub.i64	d19, d19, d9
+	add		r6, r6, #16
+	vst1.8		d16, [r6, : 64]
+	vshr.s64	d22, d12, #26
+	vand		q0, q6, q1
+	vadd.i64	d10, d10, d22
+	vzip.i32	q3, q9
+	vsub.i64	d4, d4, d0
+	sub		r6, r6, #8
+	vst1.8		d6, [r6, : 64]
+	add		r6, r6, #16
+	vst1.8		d18, [r6, : 64]
+	vzip.i32	q2, q5
+	sub		r6, r6, #32
+	vst1.8		d4, [r6, : 64]
+	subs		r5, r5, #1
+	bhi		.Lsquaringloop
+.Lskipsquaringloop:
+	mov		r2, r2
+	add		r5, r3, #288
+	add		r6, r3, #144
+	vmov.i32	q0, #19
+	vmov.i32	q1, #0
+	vmov.i32	q2, #1
+	vzip.i32	q1, q2
+	vld1.8		{d4-d5}, [r5, : 128]!
+	vld1.8		{d6-d7}, [r5, : 128]!
+	vld1.8		{d9}, [r5, : 64]
+	vld1.8		{d10-d11}, [r2, : 128]!
+	add		r5, sp, #384
+	vld1.8		{d12-d13}, [r2, : 128]!
+	vmul.i32	q7, q2, q0
+	vld1.8		{d8}, [r2, : 64]
+	vext.32		d17, d11, d10, #1
+	vmul.i32	q9, q3, q0
+	vext.32		d16, d10, d8, #1
+	vshl.u32	q10, q5, q1
+	vext.32		d22, d14, d4, #1
+	vext.32		d24, d18, d6, #1
+	vshl.u32	q13, q6, q1
+	vshl.u32	d28, d8, d2
+	vrev64.i32	d22, d22
+	vmul.i32	d1, d9, d1
+	vrev64.i32	d24, d24
+	vext.32		d29, d8, d13, #1
+	vext.32		d0, d1, d9, #1
+	vrev64.i32	d0, d0
+	vext.32		d2, d9, d1, #1
+	vext.32		d23, d15, d5, #1
+	vmull.s32	q4, d20, d4
+	vrev64.i32	d23, d23
+	vmlal.s32	q4, d21, d1
+	vrev64.i32	d2, d2
+	vmlal.s32	q4, d26, d19
+	vext.32		d3, d5, d15, #1
+	vmlal.s32	q4, d27, d18
+	vrev64.i32	d3, d3
+	vmlal.s32	q4, d28, d15
+	vext.32		d14, d12, d11, #1
+	vmull.s32	q5, d16, d23
+	vext.32		d15, d13, d12, #1
+	vmlal.s32	q5, d17, d4
+	vst1.8		d8, [r5, : 64]!
+	vmlal.s32	q5, d14, d1
+	vext.32		d12, d9, d8, #0
+	vmlal.s32	q5, d15, d19
+	vmov.i64	d13, #0
+	vmlal.s32	q5, d29, d18
+	vext.32		d25, d19, d7, #1
+	vmlal.s32	q6, d20, d5
+	vrev64.i32	d25, d25
+	vmlal.s32	q6, d21, d4
+	vst1.8		d11, [r5, : 64]!
+	vmlal.s32	q6, d26, d1
+	vext.32		d9, d10, d10, #0
+	vmlal.s32	q6, d27, d19
+	vmov.i64	d8, #0
+	vmlal.s32	q6, d28, d18
+	vmlal.s32	q4, d16, d24
+	vmlal.s32	q4, d17, d5
+	vmlal.s32	q4, d14, d4
+	vst1.8		d12, [r5, : 64]!
+	vmlal.s32	q4, d15, d1
+	vext.32		d10, d13, d12, #0
+	vmlal.s32	q4, d29, d19
+	vmov.i64	d11, #0
+	vmlal.s32	q5, d20, d6
+	vmlal.s32	q5, d21, d5
+	vmlal.s32	q5, d26, d4
+	vext.32		d13, d8, d8, #0
+	vmlal.s32	q5, d27, d1
+	vmov.i64	d12, #0
+	vmlal.s32	q5, d28, d19
+	vst1.8		d9, [r5, : 64]!
+	vmlal.s32	q6, d16, d25
+	vmlal.s32	q6, d17, d6
+	vst1.8		d10, [r5, : 64]
+	vmlal.s32	q6, d14, d5
+	vext.32		d8, d11, d10, #0
+	vmlal.s32	q6, d15, d4
+	vmov.i64	d9, #0
+	vmlal.s32	q6, d29, d1
+	vmlal.s32	q4, d20, d7
+	vmlal.s32	q4, d21, d6
+	vmlal.s32	q4, d26, d5
+	vext.32		d11, d12, d12, #0
+	vmlal.s32	q4, d27, d4
+	vmov.i64	d10, #0
+	vmlal.s32	q4, d28, d1
+	vmlal.s32	q5, d16, d0
+	sub		r2, r5, #32
+	vmlal.s32	q5, d17, d7
+	vmlal.s32	q5, d14, d6
+	vext.32		d30, d9, d8, #0
+	vmlal.s32	q5, d15, d5
+	vld1.8		{d31}, [r2, : 64]!
+	vmlal.s32	q5, d29, d4
+	vmlal.s32	q15, d20, d0
+	vext.32		d0, d6, d18, #1
+	vmlal.s32	q15, d21, d25
+	vrev64.i32	d0, d0
+	vmlal.s32	q15, d26, d24
+	vext.32		d1, d7, d19, #1
+	vext.32		d7, d10, d10, #0
+	vmlal.s32	q15, d27, d23
+	vrev64.i32	d1, d1
+	vld1.8		{d6}, [r2, : 64]
+	vmlal.s32	q15, d28, d22
+	vmlal.s32	q3, d16, d4
+	add		r2, r2, #24
+	vmlal.s32	q3, d17, d2
+	vext.32		d4, d31, d30, #0
+	vmov		d17, d11
+	vmlal.s32	q3, d14, d1
+	vext.32		d11, d13, d13, #0
+	vext.32		d13, d30, d30, #0
+	vmlal.s32	q3, d15, d0
+	vext.32		d1, d8, d8, #0
+	vmlal.s32	q3, d29, d3
+	vld1.8		{d5}, [r2, : 64]
+	sub		r2, r2, #16
+	vext.32		d10, d6, d6, #0
+	vmov.i32	q1, #0xffffffff
+	vshl.i64	q4, q1, #25
+	add		r5, sp, #480
+	vld1.8		{d14-d15}, [r5, : 128]
+	vadd.i64	q9, q2, q7
+	vshl.i64	q1, q1, #26
+	vshr.s64	q10, q9, #26
+	vld1.8		{d0}, [r2, : 64]!
+	vadd.i64	q5, q5, q10
+	vand		q9, q9, q1
+	vld1.8		{d16}, [r2, : 64]!
+	add		r2, sp, #496
+	vld1.8		{d20-d21}, [r2, : 128]
+	vadd.i64	q11, q5, q10
+	vsub.i64	q2, q2, q9
+	vshr.s64	q9, q11, #25
+	vext.32		d12, d5, d4, #0
+	vand		q11, q11, q4
+	vadd.i64	q0, q0, q9
+	vmov		d19, d7
+	vadd.i64	q3, q0, q7
+	vsub.i64	q5, q5, q11
+	vshr.s64	q11, q3, #26
+	vext.32		d18, d11, d10, #0
+	vand		q3, q3, q1
+	vadd.i64	q8, q8, q11
+	vadd.i64	q11, q8, q10
+	vsub.i64	q0, q0, q3
+	vshr.s64	q3, q11, #25
+	vand		q11, q11, q4
+	vadd.i64	q3, q6, q3
+	vadd.i64	q6, q3, q7
+	vsub.i64	q8, q8, q11
+	vshr.s64	q11, q6, #26
+	vand		q6, q6, q1
+	vadd.i64	q9, q9, q11
+	vadd.i64	d25, d19, d21
+	vsub.i64	q3, q3, q6
+	vshr.s64	d23, d25, #25
+	vand		q4, q12, q4
+	vadd.i64	d21, d23, d23
+	vshl.i64	d25, d23, #4
+	vadd.i64	d21, d21, d23
+	vadd.i64	d25, d25, d21
+	vadd.i64	d4, d4, d25
+	vzip.i32	q0, q8
+	vadd.i64	d12, d4, d14
+	add		r2, r6, #8
+	vst1.8		d0, [r2, : 64]
+	vsub.i64	d19, d19, d9
+	add		r2, r2, #16
+	vst1.8		d16, [r2, : 64]
+	vshr.s64	d22, d12, #26
+	vand		q0, q6, q1
+	vadd.i64	d10, d10, d22
+	vzip.i32	q3, q9
+	vsub.i64	d4, d4, d0
+	sub		r2, r2, #8
+	vst1.8		d6, [r2, : 64]
+	add		r2, r2, #16
+	vst1.8		d18, [r2, : 64]
+	vzip.i32	q2, q5
+	sub		r2, r2, #32
+	vst1.8		d4, [r2, : 64]
+	cmp		r4, #0
+	beq		.Lskippostcopy
+	add		r2, r3, #144
+	mov		r4, r4
+	vld1.8		{d0-d1}, [r2, : 128]!
+	vld1.8		{d2-d3}, [r2, : 128]!
+	vld1.8		{d4}, [r2, : 64]
+	vst1.8		{d0-d1}, [r4, : 128]!
+	vst1.8		{d2-d3}, [r4, : 128]!
+	vst1.8		d4, [r4, : 64]
+.Lskippostcopy:
+	cmp		r1, #1
+	bne		.Lskipfinalcopy
+	add		r2, r3, #288
+	add		r4, r3, #144
+	vld1.8		{d0-d1}, [r2, : 128]!
+	vld1.8		{d2-d3}, [r2, : 128]!
+	vld1.8		{d4}, [r2, : 64]
+	vst1.8		{d0-d1}, [r4, : 128]!
+	vst1.8		{d2-d3}, [r4, : 128]!
+	vst1.8		d4, [r4, : 64]
+.Lskipfinalcopy:
+	add		r1, r1, #1
+	cmp		r1, #12
+	blo		.Linvertloop
+	add		r1, r3, #144
+	ldr		r2, [r1], #4
+	ldr		r3, [r1], #4
+	ldr		r4, [r1], #4
+	ldr		r5, [r1], #4
+	ldr		r6, [r1], #4
+	ldr		r7, [r1], #4
+	ldr		r8, [r1], #4
+	ldr		r9, [r1], #4
+	ldr		r10, [r1], #4
+	ldr		r1, [r1]
+	add		r11, r1, r1, LSL #4
+	add		r11, r11, r1, LSL #1
+	add		r11, r11, #16777216
+	mov		r11, r11, ASR #25
+	add		r11, r11, r2
+	mov		r11, r11, ASR #26
+	add		r11, r11, r3
+	mov		r11, r11, ASR #25
+	add		r11, r11, r4
+	mov		r11, r11, ASR #26
+	add		r11, r11, r5
+	mov		r11, r11, ASR #25
+	add		r11, r11, r6
+	mov		r11, r11, ASR #26
+	add		r11, r11, r7
+	mov		r11, r11, ASR #25
+	add		r11, r11, r8
+	mov		r11, r11, ASR #26
+	add		r11, r11, r9
+	mov		r11, r11, ASR #25
+	add		r11, r11, r10
+	mov		r11, r11, ASR #26
+	add		r11, r11, r1
+	mov		r11, r11, ASR #25
+	add		r2, r2, r11
+	add		r2, r2, r11, LSL #1
+	add		r2, r2, r11, LSL #4
+	mov		r11, r2, ASR #26
+	add		r3, r3, r11
+	sub		r2, r2, r11, LSL #26
+	mov		r11, r3, ASR #25
+	add		r4, r4, r11
+	sub		r3, r3, r11, LSL #25
+	mov		r11, r4, ASR #26
+	add		r5, r5, r11
+	sub		r4, r4, r11, LSL #26
+	mov		r11, r5, ASR #25
+	add		r6, r6, r11
+	sub		r5, r5, r11, LSL #25
+	mov		r11, r6, ASR #26
+	add		r7, r7, r11
+	sub		r6, r6, r11, LSL #26
+	mov		r11, r7, ASR #25
+	add		r8, r8, r11
+	sub		r7, r7, r11, LSL #25
+	mov		r11, r8, ASR #26
+	add		r9, r9, r11
+	sub		r8, r8, r11, LSL #26
+	mov		r11, r9, ASR #25
+	add		r10, r10, r11
+	sub		r9, r9, r11, LSL #25
+	mov		r11, r10, ASR #26
+	add		r1, r1, r11
+	sub		r10, r10, r11, LSL #26
+	mov		r11, r1, ASR #25
+	sub		r1, r1, r11, LSL #25
+	add		r2, r2, r3, LSL #26
+	mov		r3, r3, LSR #6
+	add		r3, r3, r4, LSL #19
+	mov		r4, r4, LSR #13
+	add		r4, r4, r5, LSL #13
+	mov		r5, r5, LSR #19
+	add		r5, r5, r6, LSL #6
+	add		r6, r7, r8, LSL #25
+	mov		r7, r8, LSR #7
+	add		r7, r7, r9, LSL #19
+	mov		r8, r9, LSR #13
+	add		r8, r8, r10, LSL #12
+	mov		r9, r10, LSR #20
+	add		r1, r9, r1, LSL #6
+	str		r2, [r0]
+	str		r3, [r0, #4]
+	str		r4, [r0, #8]
+	str		r5, [r0, #12]
+	str		r6, [r0, #16]
+	str		r7, [r0, #20]
+	str		r8, [r0, #24]
+	str		r1, [r0, #28]
+	movw		r0, #0
+	mov		sp, ip
+	pop		{r4-r11, pc}
+ENDPROC(curve25519_neon)
+
+#endif
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 16/20] zinc: Curve25519 x86_64 implementation
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (14 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 15/20] zinc: Curve25519 ARM implementation Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc Jason A. Donenfeld
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Armando Faz-Hernández,
	Thomas Gleixner, Ingo Molnar, x86

This implementation is the fastest available x86_64 implementation, and
unlike Sandy2x, it doesn't requie use of the floating point registers at
all. Instead it makes use of BMI2 and ADX, available on recent
microarchitectures. The implementation was written by Armando
Faz-Hernández with contributions (upstream) from Samuel Neves and me,
in addition to further changes in the kernel implementation from us.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Armando Faz-Hernández <armfazh@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: x86@kernel.org
---
 lib/zinc/Makefile                            |    3 +
 lib/zinc/curve25519/curve25519-x86_64-glue.h |   49 +
 lib/zinc/curve25519/curve25519-x86_64.h      | 2333 ++++++++++++++++++
 3 files changed, 2385 insertions(+)
 create mode 100644 lib/zinc/curve25519/curve25519-x86_64-glue.h
 create mode 100644 lib/zinc/curve25519/curve25519-x86_64.h

diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
index 0a9d97146c70..adb3acda630a 100644
--- a/lib/zinc/Makefile
+++ b/lib/zinc/Makefile
@@ -57,6 +57,9 @@ ifeq ($(CONFIG_ZINC_ARCH_ARM)$(CONFIG_KERNEL_MODE_NEON),yy)
 zinc-y += curve25519/curve25519-arm.o
 CFLAGS_curve25519.o += -include $(srctree)/$(src)/curve25519/curve25519-arm-glue.h
 endif
+ifeq ($(CONFIG_ZINC_ARCH_X86_64),y)
+CFLAGS_curve25519.o += -include $(srctree)/$(src)/curve25519/curve25519-x86_64-glue.h
+endif
 endif
 
 ifeq ($(CONFIG_ZINC_BLAKE2S),y)
diff --git a/lib/zinc/curve25519/curve25519-x86_64-glue.h b/lib/zinc/curve25519/curve25519-x86_64-glue.h
new file mode 100644
index 000000000000..7011ad055c33
--- /dev/null
+++ b/lib/zinc/curve25519/curve25519-x86_64-glue.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <zinc/curve25519.h>
+#include <asm/cpufeature.h>
+#include <asm/processor.h>
+
+#include "curve25519-x86_64.h"
+
+static bool curve25519_use_bmi2 __ro_after_init;
+static bool curve25519_use_adx __ro_after_init;
+
+void __init curve25519_fpu_init(void)
+{
+	curve25519_use_bmi2 = boot_cpu_has(X86_FEATURE_BMI2);
+	curve25519_use_adx = boot_cpu_has(X86_FEATURE_BMI2) &&
+			     boot_cpu_has(X86_FEATURE_ADX);
+}
+
+static inline bool curve25519_arch(u8 mypublic[CURVE25519_POINT_SIZE],
+				   const u8 secret[CURVE25519_POINT_SIZE],
+				   const u8 basepoint[CURVE25519_POINT_SIZE])
+{
+	if (curve25519_use_adx) {
+		curve25519_adx(mypublic, secret, basepoint);
+		return true;
+	} else if (curve25519_use_bmi2) {
+		curve25519_bmi2(mypublic, secret, basepoint);
+		return true;
+	}
+	return false;
+}
+
+static inline bool curve25519_base_arch(u8 pub[CURVE25519_POINT_SIZE],
+					const u8 secret[CURVE25519_POINT_SIZE])
+{
+	if (curve25519_use_adx) {
+		curve25519_adx_base(pub, secret);
+		return true;
+	} else if (curve25519_use_bmi2) {
+		curve25519_bmi2_base(pub, secret);
+		return true;
+	}
+	return false;
+}
+
+#define HAVE_CURVE25519_ARCH_IMPLEMENTATION
diff --git a/lib/zinc/curve25519/curve25519-x86_64.h b/lib/zinc/curve25519/curve25519-x86_64.h
new file mode 100644
index 000000000000..15c2e4307583
--- /dev/null
+++ b/lib/zinc/curve25519/curve25519-x86_64.h
@@ -0,0 +1,2333 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2017 Armando Faz <armfazh@ic.unicamp.br>. All Rights Reserved.
+ * Copyright (C) 2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2018 Samuel Neves <sneves@dei.uc.pt>. All Rights Reserved.
+ */
+
+enum { NUM_WORDS_ELTFP25519 = 4 };
+typedef __aligned(32) u64 eltfp25519_1w[NUM_WORDS_ELTFP25519];
+typedef __aligned(32) u64 eltfp25519_1w_buffer[2 * NUM_WORDS_ELTFP25519];
+
+#define mul_eltfp25519_1w_adx(c, a, b) do { \
+	mul_256x256_integer_adx(m.buffer, a, b); \
+	red_eltfp25519_1w_adx(c, m.buffer); \
+} while (0)
+
+#define mul_eltfp25519_1w_bmi2(c, a, b) do { \
+	mul_256x256_integer_bmi2(m.buffer, a, b); \
+	red_eltfp25519_1w_bmi2(c, m.buffer); \
+} while (0)
+
+#define sqr_eltfp25519_1w_adx(a) do { \
+	sqr_256x256_integer_adx(m.buffer, a); \
+	red_eltfp25519_1w_adx(a, m.buffer); \
+} while (0)
+
+#define sqr_eltfp25519_1w_bmi2(a) do { \
+	sqr_256x256_integer_bmi2(m.buffer, a); \
+	red_eltfp25519_1w_bmi2(a, m.buffer); \
+} while (0)
+
+#define mul_eltfp25519_2w_adx(c, a, b) do { \
+	mul2_256x256_integer_adx(m.buffer, a, b); \
+	red_eltfp25519_2w_adx(c, m.buffer); \
+} while (0)
+
+#define mul_eltfp25519_2w_bmi2(c, a, b) do { \
+	mul2_256x256_integer_bmi2(m.buffer, a, b); \
+	red_eltfp25519_2w_bmi2(c, m.buffer); \
+} while (0)
+
+#define sqr_eltfp25519_2w_adx(a) do { \
+	sqr2_256x256_integer_adx(m.buffer, a); \
+	red_eltfp25519_2w_adx(a, m.buffer); \
+} while (0)
+
+#define sqr_eltfp25519_2w_bmi2(a) do { \
+	sqr2_256x256_integer_bmi2(m.buffer, a); \
+	red_eltfp25519_2w_bmi2(a, m.buffer); \
+} while (0)
+
+#define sqrn_eltfp25519_1w_adx(a, times) do { \
+	int ____counter = (times); \
+	while (____counter-- > 0) \
+		sqr_eltfp25519_1w_adx(a); \
+} while (0)
+
+#define sqrn_eltfp25519_1w_bmi2(a, times) do { \
+	int ____counter = (times); \
+	while (____counter-- > 0) \
+		sqr_eltfp25519_1w_bmi2(a); \
+} while (0)
+
+#define copy_eltfp25519_1w(C, A) do { \
+	(C)[0] = (A)[0]; \
+	(C)[1] = (A)[1]; \
+	(C)[2] = (A)[2]; \
+	(C)[3] = (A)[3]; \
+} while (0)
+
+#define setzero_eltfp25519_1w(C) do { \
+	(C)[0] = 0; \
+	(C)[1] = 0; \
+	(C)[2] = 0; \
+	(C)[3] = 0; \
+} while (0)
+
+__aligned(32) static const u64 table_ladder_8k[252 * NUM_WORDS_ELTFP25519] = {
+	/*   1 */ 0xfffffffffffffff3UL, 0xffffffffffffffffUL,
+		  0xffffffffffffffffUL, 0x5fffffffffffffffUL,
+	/*   2 */ 0x6b8220f416aafe96UL, 0x82ebeb2b4f566a34UL,
+		  0xd5a9a5b075a5950fUL, 0x5142b2cf4b2488f4UL,
+	/*   3 */ 0x6aaebc750069680cUL, 0x89cf7820a0f99c41UL,
+		  0x2a58d9183b56d0f4UL, 0x4b5aca80e36011a4UL,
+	/*   4 */ 0x329132348c29745dUL, 0xf4a2e616e1642fd7UL,
+		  0x1e45bb03ff67bc34UL, 0x306912d0f42a9b4aUL,
+	/*   5 */ 0xff886507e6af7154UL, 0x04f50e13dfeec82fUL,
+		  0xaa512fe82abab5ceUL, 0x174e251a68d5f222UL,
+	/*   6 */ 0xcf96700d82028898UL, 0x1743e3370a2c02c5UL,
+		  0x379eec98b4e86eaaUL, 0x0c59888a51e0482eUL,
+	/*   7 */ 0xfbcbf1d699b5d189UL, 0xacaef0d58e9fdc84UL,
+		  0xc1c20d06231f7614UL, 0x2938218da274f972UL,
+	/*   8 */ 0xf6af49beff1d7f18UL, 0xcc541c22387ac9c2UL,
+		  0x96fcc9ef4015c56bUL, 0x69c1627c690913a9UL,
+	/*   9 */ 0x7a86fd2f4733db0eUL, 0xfdb8c4f29e087de9UL,
+		  0x095e4b1a8ea2a229UL, 0x1ad7a7c829b37a79UL,
+	/*  10 */ 0x342d89cad17ea0c0UL, 0x67bedda6cced2051UL,
+		  0x19ca31bf2bb42f74UL, 0x3df7b4c84980acbbUL,
+	/*  11 */ 0xa8c6444dc80ad883UL, 0xb91e440366e3ab85UL,
+		  0xc215cda00164f6d8UL, 0x3d867c6ef247e668UL,
+	/*  12 */ 0xc7dd582bcc3e658cUL, 0xfd2c4748ee0e5528UL,
+		  0xa0fd9b95cc9f4f71UL, 0x7529d871b0675ddfUL,
+	/*  13 */ 0xb8f568b42d3cbd78UL, 0x1233011b91f3da82UL,
+		  0x2dce6ccd4a7c3b62UL, 0x75e7fc8e9e498603UL,
+	/*  14 */ 0x2f4f13f1fcd0b6ecUL, 0xf1a8ca1f29ff7a45UL,
+		  0xc249c1a72981e29bUL, 0x6ebe0dbb8c83b56aUL,
+	/*  15 */ 0x7114fa8d170bb222UL, 0x65a2dcd5bf93935fUL,
+		  0xbdc41f68b59c979aUL, 0x2f0eef79a2ce9289UL,
+	/*  16 */ 0x42ecbf0c083c37ceUL, 0x2930bc09ec496322UL,
+		  0xf294b0c19cfeac0dUL, 0x3780aa4bedfabb80UL,
+	/*  17 */ 0x56c17d3e7cead929UL, 0xe7cb4beb2e5722c5UL,
+		  0x0ce931732dbfe15aUL, 0x41b883c7621052f8UL,
+	/*  18 */ 0xdbf75ca0c3d25350UL, 0x2936be086eb1e351UL,
+		  0xc936e03cb4a9b212UL, 0x1d45bf82322225aaUL,
+	/*  19 */ 0xe81ab1036a024cc5UL, 0xe212201c304c9a72UL,
+		  0xc5d73fba6832b1fcUL, 0x20ffdb5a4d839581UL,
+	/*  20 */ 0xa283d367be5d0fadUL, 0x6c2b25ca8b164475UL,
+		  0x9d4935467caaf22eUL, 0x5166408eee85ff49UL,
+	/*  21 */ 0x3c67baa2fab4e361UL, 0xb3e433c67ef35cefUL,
+		  0x5259729241159b1cUL, 0x6a621892d5b0ab33UL,
+	/*  22 */ 0x20b74a387555cdcbUL, 0x532aa10e1208923fUL,
+		  0xeaa17b7762281dd1UL, 0x61ab3443f05c44bfUL,
+	/*  23 */ 0x257a6c422324def8UL, 0x131c6c1017e3cf7fUL,
+		  0x23758739f630a257UL, 0x295a407a01a78580UL,
+	/*  24 */ 0xf8c443246d5da8d9UL, 0x19d775450c52fa5dUL,
+		  0x2afcfc92731bf83dUL, 0x7d10c8e81b2b4700UL,
+	/*  25 */ 0xc8e0271f70baa20bUL, 0x993748867ca63957UL,
+		  0x5412efb3cb7ed4bbUL, 0x3196d36173e62975UL,
+	/*  26 */ 0xde5bcad141c7dffcUL, 0x47cc8cd2b395c848UL,
+		  0xa34cd942e11af3cbUL, 0x0256dbf2d04ecec2UL,
+	/*  27 */ 0x875ab7e94b0e667fUL, 0xcad4dd83c0850d10UL,
+		  0x47f12e8f4e72c79fUL, 0x5f1a87bb8c85b19bUL,
+	/*  28 */ 0x7ae9d0b6437f51b8UL, 0x12c7ce5518879065UL,
+		  0x2ade09fe5cf77aeeUL, 0x23a05a2f7d2c5627UL,
+	/*  29 */ 0x5908e128f17c169aUL, 0xf77498dd8ad0852dUL,
+		  0x74b4c4ceab102f64UL, 0x183abadd10139845UL,
+	/*  30 */ 0xb165ba8daa92aaacUL, 0xd5c5ef9599386705UL,
+		  0xbe2f8f0cf8fc40d1UL, 0x2701e635ee204514UL,
+	/*  31 */ 0x629fa80020156514UL, 0xf223868764a8c1ceUL,
+		  0x5b894fff0b3f060eUL, 0x60d9944cf708a3faUL,
+	/*  32 */ 0xaeea001a1c7a201fUL, 0xebf16a633ee2ce63UL,
+		  0x6f7709594c7a07e1UL, 0x79b958150d0208cbUL,
+	/*  33 */ 0x24b55e5301d410e7UL, 0xe3a34edff3fdc84dUL,
+		  0xd88768e4904032d8UL, 0x131384427b3aaeecUL,
+	/*  34 */ 0x8405e51286234f14UL, 0x14dc4739adb4c529UL,
+		  0xb8a2b5b250634ffdUL, 0x2fe2a94ad8a7ff93UL,
+	/*  35 */ 0xec5c57efe843faddUL, 0x2843ce40f0bb9918UL,
+		  0xa4b561d6cf3d6305UL, 0x743629bde8fb777eUL,
+	/*  36 */ 0x343edd46bbaf738fUL, 0xed981828b101a651UL,
+		  0xa401760b882c797aUL, 0x1fc223e28dc88730UL,
+	/*  37 */ 0x48604e91fc0fba0eUL, 0xb637f78f052c6fa4UL,
+		  0x91ccac3d09e9239cUL, 0x23f7eed4437a687cUL,
+	/*  38 */ 0x5173b1118d9bd800UL, 0x29d641b63189d4a7UL,
+		  0xfdbf177988bbc586UL, 0x2959894fcad81df5UL,
+	/*  39 */ 0xaebc8ef3b4bbc899UL, 0x4148995ab26992b9UL,
+		  0x24e20b0134f92cfbUL, 0x40d158894a05dee8UL,
+	/*  40 */ 0x46b00b1185af76f6UL, 0x26bac77873187a79UL,
+		  0x3dc0bf95ab8fff5fUL, 0x2a608bd8945524d7UL,
+	/*  41 */ 0x26449588bd446302UL, 0x7c4bc21c0388439cUL,
+		  0x8e98a4f383bd11b2UL, 0x26218d7bc9d876b9UL,
+	/*  42 */ 0xe3081542997c178aUL, 0x3c2d29a86fb6606fUL,
+		  0x5c217736fa279374UL, 0x7dde05734afeb1faUL,
+	/*  43 */ 0x3bf10e3906d42babUL, 0xe4f7803e1980649cUL,
+		  0xe6053bf89595bf7aUL, 0x394faf38da245530UL,
+	/*  44 */ 0x7a8efb58896928f4UL, 0xfbc778e9cc6a113cUL,
+		  0x72670ce330af596fUL, 0x48f222a81d3d6cf7UL,
+	/*  45 */ 0xf01fce410d72caa7UL, 0x5a20ecc7213b5595UL,
+		  0x7bc21165c1fa1483UL, 0x07f89ae31da8a741UL,
+	/*  46 */ 0x05d2c2b4c6830ff9UL, 0xd43e330fc6316293UL,
+		  0xa5a5590a96d3a904UL, 0x705edb91a65333b6UL,
+	/*  47 */ 0x048ee15e0bb9a5f7UL, 0x3240cfca9e0aaf5dUL,
+		  0x8f4b71ceedc4a40bUL, 0x621c0da3de544a6dUL,
+	/*  48 */ 0x92872836a08c4091UL, 0xce8375b010c91445UL,
+		  0x8a72eb524f276394UL, 0x2667fcfa7ec83635UL,
+	/*  49 */ 0x7f4c173345e8752aUL, 0x061b47feee7079a5UL,
+		  0x25dd9afa9f86ff34UL, 0x3780cef5425dc89cUL,
+	/*  50 */ 0x1a46035a513bb4e9UL, 0x3e1ef379ac575adaUL,
+		  0xc78c5f1c5fa24b50UL, 0x321a967634fd9f22UL,
+	/*  51 */ 0x946707b8826e27faUL, 0x3dca84d64c506fd0UL,
+		  0xc189218075e91436UL, 0x6d9284169b3b8484UL,
+	/*  52 */ 0x3a67e840383f2ddfUL, 0x33eec9a30c4f9b75UL,
+		  0x3ec7c86fa783ef47UL, 0x26ec449fbac9fbc4UL,
+	/*  53 */ 0x5c0f38cba09b9e7dUL, 0x81168cc762a3478cUL,
+		  0x3e23b0d306fc121cUL, 0x5a238aa0a5efdcddUL,
+	/*  54 */ 0x1ba26121c4ea43ffUL, 0x36f8c77f7c8832b5UL,
+		  0x88fbea0b0adcf99aUL, 0x5ca9938ec25bebf9UL,
+	/*  55 */ 0xd5436a5e51fccda0UL, 0x1dbc4797c2cd893bUL,
+		  0x19346a65d3224a08UL, 0x0f5034e49b9af466UL,
+	/*  56 */ 0xf23c3967a1e0b96eUL, 0xe58b08fa867a4d88UL,
+		  0xfb2fabc6a7341679UL, 0x2a75381eb6026946UL,
+	/*  57 */ 0xc80a3be4c19420acUL, 0x66b1f6c681f2b6dcUL,
+		  0x7cf7036761e93388UL, 0x25abbbd8a660a4c4UL,
+	/*  58 */ 0x91ea12ba14fd5198UL, 0x684950fc4a3cffa9UL,
+		  0xf826842130f5ad28UL, 0x3ea988f75301a441UL,
+	/*  59 */ 0xc978109a695f8c6fUL, 0x1746eb4a0530c3f3UL,
+		  0x444d6d77b4459995UL, 0x75952b8c054e5cc7UL,
+	/*  60 */ 0xa3703f7915f4d6aaUL, 0x66c346202f2647d8UL,
+		  0xd01469df811d644bUL, 0x77fea47d81a5d71fUL,
+	/*  61 */ 0xc5e9529ef57ca381UL, 0x6eeeb4b9ce2f881aUL,
+		  0xb6e91a28e8009bd6UL, 0x4b80be3e9afc3fecUL,
+	/*  62 */ 0x7e3773c526aed2c5UL, 0x1b4afcb453c9a49dUL,
+		  0xa920bdd7baffb24dUL, 0x7c54699f122d400eUL,
+	/*  63 */ 0xef46c8e14fa94bc8UL, 0xe0b074ce2952ed5eUL,
+		  0xbea450e1dbd885d5UL, 0x61b68649320f712cUL,
+	/*  64 */ 0x8a485f7309ccbdd1UL, 0xbd06320d7d4d1a2dUL,
+		  0x25232973322dbef4UL, 0x445dc4758c17f770UL,
+	/*  65 */ 0xdb0434177cc8933cUL, 0xed6fe82175ea059fUL,
+		  0x1efebefdc053db34UL, 0x4adbe867c65daf99UL,
+	/*  66 */ 0x3acd71a2a90609dfUL, 0xe5e991856dd04050UL,
+		  0x1ec69b688157c23cUL, 0x697427f6885cfe4dUL,
+	/*  67 */ 0xd7be7b9b65e1a851UL, 0xa03d28d522c536ddUL,
+		  0x28399d658fd2b645UL, 0x49e5b7e17c2641e1UL,
+	/*  68 */ 0x6f8c3a98700457a4UL, 0x5078f0a25ebb6778UL,
+		  0xd13c3ccbc382960fUL, 0x2e003258a7df84b1UL,
+	/*  69 */ 0x8ad1f39be6296a1cUL, 0xc1eeaa652a5fbfb2UL,
+		  0x33ee0673fd26f3cbUL, 0x59256173a69d2cccUL,
+	/*  70 */ 0x41ea07aa4e18fc41UL, 0xd9fc19527c87a51eUL,
+		  0xbdaacb805831ca6fUL, 0x445b652dc916694fUL,
+	/*  71 */ 0xce92a3a7f2172315UL, 0x1edc282de11b9964UL,
+		  0xa1823aafe04c314aUL, 0x790a2d94437cf586UL,
+	/*  72 */ 0x71c447fb93f6e009UL, 0x8922a56722845276UL,
+		  0xbf70903b204f5169UL, 0x2f7a89891ba319feUL,
+	/*  73 */ 0x02a08eb577e2140cUL, 0xed9a4ed4427bdcf4UL,
+		  0x5253ec44e4323cd1UL, 0x3e88363c14e9355bUL,
+	/*  74 */ 0xaa66c14277110b8cUL, 0x1ae0391610a23390UL,
+		  0x2030bd12c93fc2a2UL, 0x3ee141579555c7abUL,
+	/*  75 */ 0x9214de3a6d6e7d41UL, 0x3ccdd88607f17efeUL,
+		  0x674f1288f8e11217UL, 0x5682250f329f93d0UL,
+	/*  76 */ 0x6cf00b136d2e396eUL, 0x6e4cf86f1014debfUL,
+		  0x5930b1b5bfcc4e83UL, 0x047069b48aba16b6UL,
+	/*  77 */ 0x0d4ce4ab69b20793UL, 0xb24db91a97d0fb9eUL,
+		  0xcdfa50f54e00d01dUL, 0x221b1085368bddb5UL,
+	/*  78 */ 0xe7e59468b1e3d8d2UL, 0x53c56563bd122f93UL,
+		  0xeee8a903e0663f09UL, 0x61efa662cbbe3d42UL,
+	/*  79 */ 0x2cf8ddddde6eab2aUL, 0x9bf80ad51435f231UL,
+		  0x5deadacec9f04973UL, 0x29275b5d41d29b27UL,
+	/*  80 */ 0xcfde0f0895ebf14fUL, 0xb9aab96b054905a7UL,
+		  0xcae80dd9a1c420fdUL, 0x0a63bf2f1673bbc7UL,
+	/*  81 */ 0x092f6e11958fbc8cUL, 0x672a81e804822fadUL,
+		  0xcac8351560d52517UL, 0x6f3f7722c8f192f8UL,
+	/*  82 */ 0xf8ba90ccc2e894b7UL, 0x2c7557a438ff9f0dUL,
+		  0x894d1d855ae52359UL, 0x68e122157b743d69UL,
+	/*  83 */ 0xd87e5570cfb919f3UL, 0x3f2cdecd95798db9UL,
+		  0x2121154710c0a2ceUL, 0x3c66a115246dc5b2UL,
+	/*  84 */ 0xcbedc562294ecb72UL, 0xba7143c36a280b16UL,
+		  0x9610c2efd4078b67UL, 0x6144735d946a4b1eUL,
+	/*  85 */ 0x536f111ed75b3350UL, 0x0211db8c2041d81bUL,
+		  0xf93cb1000e10413cUL, 0x149dfd3c039e8876UL,
+	/*  86 */ 0xd479dde46b63155bUL, 0xb66e15e93c837976UL,
+		  0xdafde43b1f13e038UL, 0x5fafda1a2e4b0b35UL,
+	/*  87 */ 0x3600bbdf17197581UL, 0x3972050bbe3cd2c2UL,
+		  0x5938906dbdd5be86UL, 0x34fce5e43f9b860fUL,
+	/*  88 */ 0x75a8a4cd42d14d02UL, 0x828dabc53441df65UL,
+		  0x33dcabedd2e131d3UL, 0x3ebad76fb814d25fUL,
+	/*  89 */ 0xd4906f566f70e10fUL, 0x5d12f7aa51690f5aUL,
+		  0x45adb16e76cefcf2UL, 0x01f768aead232999UL,
+	/*  90 */ 0x2b6cc77b6248febdUL, 0x3cd30628ec3aaffdUL,
+		  0xce1c0b80d4ef486aUL, 0x4c3bff2ea6f66c23UL,
+	/*  91 */ 0x3f2ec4094aeaeb5fUL, 0x61b19b286e372ca7UL,
+		  0x5eefa966de2a701dUL, 0x23b20565de55e3efUL,
+	/*  92 */ 0xe301ca5279d58557UL, 0x07b2d4ce27c2874fUL,
+		  0xa532cd8a9dcf1d67UL, 0x2a52fee23f2bff56UL,
+	/*  93 */ 0x8624efb37cd8663dUL, 0xbbc7ac20ffbd7594UL,
+		  0x57b85e9c82d37445UL, 0x7b3052cb86a6ec66UL,
+	/*  94 */ 0x3482f0ad2525e91eUL, 0x2cb68043d28edca0UL,
+		  0xaf4f6d052e1b003aUL, 0x185f8c2529781b0aUL,
+	/*  95 */ 0xaa41de5bd80ce0d6UL, 0x9407b2416853e9d6UL,
+		  0x563ec36e357f4c3aUL, 0x4cc4b8dd0e297bceUL,
+	/*  96 */ 0xa2fc1a52ffb8730eUL, 0x1811f16e67058e37UL,
+		  0x10f9a366cddf4ee1UL, 0x72f4a0c4a0b9f099UL,
+	/*  97 */ 0x8c16c06f663f4ea7UL, 0x693b3af74e970fbaUL,
+		  0x2102e7f1d69ec345UL, 0x0ba53cbc968a8089UL,
+	/*  98 */ 0xca3d9dc7fea15537UL, 0x4c6824bb51536493UL,
+		  0xb9886314844006b1UL, 0x40d2a72ab454cc60UL,
+	/*  99 */ 0x5936a1b712570975UL, 0x91b9d648debda657UL,
+		  0x3344094bb64330eaUL, 0x006ba10d12ee51d0UL,
+	/* 100 */ 0x19228468f5de5d58UL, 0x0eb12f4c38cc05b0UL,
+		  0xa1039f9dd5601990UL, 0x4502d4ce4fff0e0bUL,
+	/* 101 */ 0xeb2054106837c189UL, 0xd0f6544c6dd3b93cUL,
+		  0x40727064c416d74fUL, 0x6e15c6114b502ef0UL,
+	/* 102 */ 0x4df2a398cfb1a76bUL, 0x11256c7419f2f6b1UL,
+		  0x4a497962066e6043UL, 0x705b3aab41355b44UL,
+	/* 103 */ 0x365ef536d797b1d8UL, 0x00076bd622ddf0dbUL,
+		  0x3bbf33b0e0575a88UL, 0x3777aa05c8e4ca4dUL,
+	/* 104 */ 0x392745c85578db5fUL, 0x6fda4149dbae5ae2UL,
+		  0xb1f0b00b8adc9867UL, 0x09963437d36f1da3UL,
+	/* 105 */ 0x7e824e90a5dc3853UL, 0xccb5f6641f135cbdUL,
+		  0x6736d86c87ce8fccUL, 0x625f3ce26604249fUL,
+	/* 106 */ 0xaf8ac8059502f63fUL, 0x0c05e70a2e351469UL,
+		  0x35292e9c764b6305UL, 0x1a394360c7e23ac3UL,
+	/* 107 */ 0xd5c6d53251183264UL, 0x62065abd43c2b74fUL,
+		  0xb5fbf5d03b973f9bUL, 0x13a3da3661206e5eUL,
+	/* 108 */ 0xc6bd5837725d94e5UL, 0x18e30912205016c5UL,
+		  0x2088ce1570033c68UL, 0x7fba1f495c837987UL,
+	/* 109 */ 0x5a8c7423f2f9079dUL, 0x1735157b34023fc5UL,
+		  0xe4f9b49ad2fab351UL, 0x6691ff72c878e33cUL,
+	/* 110 */ 0x122c2adedc5eff3eUL, 0xf8dd4bf1d8956cf4UL,
+		  0xeb86205d9e9e5bdaUL, 0x049b92b9d975c743UL,
+	/* 111 */ 0xa5379730b0f6c05aUL, 0x72a0ffacc6f3a553UL,
+		  0xb0032c34b20dcd6dUL, 0x470e9dbc88d5164aUL,
+	/* 112 */ 0xb19cf10ca237c047UL, 0xb65466711f6c81a2UL,
+		  0xb3321bd16dd80b43UL, 0x48c14f600c5fbe8eUL,
+	/* 113 */ 0x66451c264aa6c803UL, 0xb66e3904a4fa7da6UL,
+		  0xd45f19b0b3128395UL, 0x31602627c3c9bc10UL,
+	/* 114 */ 0x3120dc4832e4e10dUL, 0xeb20c46756c717f7UL,
+		  0x00f52e3f67280294UL, 0x566d4fc14730c509UL,
+	/* 115 */ 0x7e3a5d40fd837206UL, 0xc1e926dc7159547aUL,
+		  0x216730fba68d6095UL, 0x22e8c3843f69cea7UL,
+	/* 116 */ 0x33d074e8930e4b2bUL, 0xb6e4350e84d15816UL,
+		  0x5534c26ad6ba2365UL, 0x7773c12f89f1f3f3UL,
+	/* 117 */ 0x8cba404da57962aaUL, 0x5b9897a81999ce56UL,
+		  0x508e862f121692fcUL, 0x3a81907fa093c291UL,
+	/* 118 */ 0x0dded0ff4725a510UL, 0x10d8cc10673fc503UL,
+		  0x5b9d151c9f1f4e89UL, 0x32a5c1d5cb09a44cUL,
+	/* 119 */ 0x1e0aa442b90541fbUL, 0x5f85eb7cc1b485dbUL,
+		  0xbee595ce8a9df2e5UL, 0x25e496c722422236UL,
+	/* 120 */ 0x5edf3c46cd0fe5b9UL, 0x34e75a7ed2a43388UL,
+		  0xe488de11d761e352UL, 0x0e878a01a085545cUL,
+	/* 121 */ 0xba493c77e021bb04UL, 0x2b4d1843c7df899aUL,
+		  0x9ea37a487ae80d67UL, 0x67a9958011e41794UL,
+	/* 122 */ 0x4b58051a6697b065UL, 0x47e33f7d8d6ba6d4UL,
+		  0xbb4da8d483ca46c1UL, 0x68becaa181c2db0dUL,
+	/* 123 */ 0x8d8980e90b989aa5UL, 0xf95eb14a2c93c99bUL,
+		  0x51c6c7c4796e73a2UL, 0x6e228363b5efb569UL,
+	/* 124 */ 0xc6bbc0b02dd624c8UL, 0x777eb47dec8170eeUL,
+		  0x3cde15a004cfafa9UL, 0x1dc6bc087160bf9bUL,
+	/* 125 */ 0x2e07e043eec34002UL, 0x18e9fc677a68dc7fUL,
+		  0xd8da03188bd15b9aUL, 0x48fbc3bb00568253UL,
+	/* 126 */ 0x57547d4cfb654ce1UL, 0xd3565b82a058e2adUL,
+		  0xf63eaf0bbf154478UL, 0x47531ef114dfbb18UL,
+	/* 127 */ 0xe1ec630a4278c587UL, 0x5507d546ca8e83f3UL,
+		  0x85e135c63adc0c2bUL, 0x0aa7efa85682844eUL,
+	/* 128 */ 0x72691ba8b3e1f615UL, 0x32b4e9701fbe3ffaUL,
+		  0x97b6d92e39bb7868UL, 0x2cfe53dea02e39e8UL,
+	/* 129 */ 0x687392cd85cd52b0UL, 0x27ff66c910e29831UL,
+		  0x97134556a9832d06UL, 0x269bb0360a84f8a0UL,
+	/* 130 */ 0x706e55457643f85cUL, 0x3734a48c9b597d1bUL,
+		  0x7aee91e8c6efa472UL, 0x5cd6abc198a9d9e0UL,
+	/* 131 */ 0x0e04de06cb3ce41aUL, 0xd8c6eb893402e138UL,
+		  0x904659bb686e3772UL, 0x7215c371746ba8c8UL,
+	/* 132 */ 0xfd12a97eeae4a2d9UL, 0x9514b7516394f2c5UL,
+		  0x266fd5809208f294UL, 0x5c847085619a26b9UL,
+	/* 133 */ 0x52985410fed694eaUL, 0x3c905b934a2ed254UL,
+		  0x10bb47692d3be467UL, 0x063b3d2d69e5e9e1UL,
+	/* 134 */ 0x472726eedda57debUL, 0xefb6c4ae10f41891UL,
+		  0x2b1641917b307614UL, 0x117c554fc4f45b7cUL,
+	/* 135 */ 0xc07cf3118f9d8812UL, 0x01dbd82050017939UL,
+		  0xd7e803f4171b2827UL, 0x1015e87487d225eaUL,
+	/* 136 */ 0xc58de3fed23acc4dUL, 0x50db91c294a7be2dUL,
+		  0x0b94d43d1c9cf457UL, 0x6b1640fa6e37524aUL,
+	/* 137 */ 0x692f346c5fda0d09UL, 0x200b1c59fa4d3151UL,
+		  0xb8c46f760777a296UL, 0x4b38395f3ffdfbcfUL,
+	/* 138 */ 0x18d25e00be54d671UL, 0x60d50582bec8aba6UL,
+		  0x87ad8f263b78b982UL, 0x50fdf64e9cda0432UL,
+	/* 139 */ 0x90f567aac578dcf0UL, 0xef1e9b0ef2a3133bUL,
+		  0x0eebba9242d9de71UL, 0x15473c9bf03101c7UL,
+	/* 140 */ 0x7c77e8ae56b78095UL, 0xb678e7666e6f078eUL,
+		  0x2da0b9615348ba1fUL, 0x7cf931c1ff733f0bUL,
+	/* 141 */ 0x26b357f50a0a366cUL, 0xe9708cf42b87d732UL,
+		  0xc13aeea5f91cb2c0UL, 0x35d90c991143bb4cUL,
+	/* 142 */ 0x47c1c404a9a0d9dcUL, 0x659e58451972d251UL,
+		  0x3875a8c473b38c31UL, 0x1fbd9ed379561f24UL,
+	/* 143 */ 0x11fabc6fd41ec28dUL, 0x7ef8dfe3cd2a2dcaUL,
+		  0x72e73b5d8c404595UL, 0x6135fa4954b72f27UL,
+	/* 144 */ 0xccfc32a2de24b69cUL, 0x3f55698c1f095d88UL,
+		  0xbe3350ed5ac3f929UL, 0x5e9bf806ca477eebUL,
+	/* 145 */ 0xe9ce8fb63c309f68UL, 0x5376f63565e1f9f4UL,
+		  0xd1afcfb35a6393f1UL, 0x6632a1ede5623506UL,
+	/* 146 */ 0x0b7d6c390c2ded4cUL, 0x56cb3281df04cb1fUL,
+		  0x66305a1249ecc3c7UL, 0x5d588b60a38ca72aUL,
+	/* 147 */ 0xa6ecbf78e8e5f42dUL, 0x86eeb44b3c8a3eecUL,
+		  0xec219c48fbd21604UL, 0x1aaf1af517c36731UL,
+	/* 148 */ 0xc306a2836769bde7UL, 0x208280622b1e2adbUL,
+		  0x8027f51ffbff94a6UL, 0x76cfa1ce1124f26bUL,
+	/* 149 */ 0x18eb00562422abb6UL, 0xf377c4d58f8c29c3UL,
+		  0x4dbbc207f531561aUL, 0x0253b7f082128a27UL,
+	/* 150 */ 0x3d1f091cb62c17e0UL, 0x4860e1abd64628a9UL,
+		  0x52d17436309d4253UL, 0x356f97e13efae576UL,
+	/* 151 */ 0xd351e11aa150535bUL, 0x3e6b45bb1dd878ccUL,
+		  0x0c776128bed92c98UL, 0x1d34ae93032885b8UL,
+	/* 152 */ 0x4ba0488ca85ba4c3UL, 0x985348c33c9ce6ceUL,
+		  0x66124c6f97bda770UL, 0x0f81a0290654124aUL,
+	/* 153 */ 0x9ed09ca6569b86fdUL, 0x811009fd18af9a2dUL,
+		  0xff08d03f93d8c20aUL, 0x52a148199faef26bUL,
+	/* 154 */ 0x3e03f9dc2d8d1b73UL, 0x4205801873961a70UL,
+		  0xc0d987f041a35970UL, 0x07aa1f15a1c0d549UL,
+	/* 155 */ 0xdfd46ce08cd27224UL, 0x6d0a024f934e4239UL,
+		  0x808a7a6399897b59UL, 0x0a4556e9e13d95a2UL,
+	/* 156 */ 0xd21a991fe9c13045UL, 0x9b0e8548fe7751b8UL,
+		  0x5da643cb4bf30035UL, 0x77db28d63940f721UL,
+	/* 157 */ 0xfc5eeb614adc9011UL, 0x5229419ae8c411ebUL,
+		  0x9ec3e7787d1dcf74UL, 0x340d053e216e4cb5UL,
+	/* 158 */ 0xcac7af39b48df2b4UL, 0xc0faec2871a10a94UL,
+		  0x140a69245ca575edUL, 0x0cf1c37134273a4cUL,
+	/* 159 */ 0xc8ee306ac224b8a5UL, 0x57eaee7ccb4930b0UL,
+		  0xa1e806bdaacbe74fUL, 0x7d9a62742eeb657dUL,
+	/* 160 */ 0x9eb6b6ef546c4830UL, 0x885cca1fddb36e2eUL,
+		  0xe6b9f383ef0d7105UL, 0x58654fef9d2e0412UL,
+	/* 161 */ 0xa905c4ffbe0e8e26UL, 0x942de5df9b31816eUL,
+		  0x497d723f802e88e1UL, 0x30684dea602f408dUL,
+	/* 162 */ 0x21e5a278a3e6cb34UL, 0xaefb6e6f5b151dc4UL,
+		  0xb30b8e049d77ca15UL, 0x28c3c9cf53b98981UL,
+	/* 163 */ 0x287fb721556cdd2aUL, 0x0d317ca897022274UL,
+		  0x7468c7423a543258UL, 0x4a7f11464eb5642fUL,
+	/* 164 */ 0xa237a4774d193aa6UL, 0xd865986ea92129a1UL,
+		  0x24c515ecf87c1a88UL, 0x604003575f39f5ebUL,
+	/* 165 */ 0x47b9f189570a9b27UL, 0x2b98cede465e4b78UL,
+		  0x026df551dbb85c20UL, 0x74fcd91047e21901UL,
+	/* 166 */ 0x13e2a90a23c1bfa3UL, 0x0cb0074e478519f6UL,
+		  0x5ff1cbbe3af6cf44UL, 0x67fe5438be812dbeUL,
+	/* 167 */ 0xd13cf64fa40f05b0UL, 0x054dfb2f32283787UL,
+		  0x4173915b7f0d2aeaUL, 0x482f144f1f610d4eUL,
+	/* 168 */ 0xf6210201b47f8234UL, 0x5d0ae1929e70b990UL,
+		  0xdcd7f455b049567cUL, 0x7e93d0f1f0916f01UL,
+	/* 169 */ 0xdd79cbf18a7db4faUL, 0xbe8391bf6f74c62fUL,
+		  0x027145d14b8291bdUL, 0x585a73ea2cbf1705UL,
+	/* 170 */ 0x485ca03e928a0db2UL, 0x10fc01a5742857e7UL,
+		  0x2f482edbd6d551a7UL, 0x0f0433b5048fdb8aUL,
+	/* 171 */ 0x60da2e8dd7dc6247UL, 0x88b4c9d38cd4819aUL,
+		  0x13033ac001f66697UL, 0x273b24fe3b367d75UL,
+	/* 172 */ 0xc6e8f66a31b3b9d4UL, 0x281514a494df49d5UL,
+		  0xd1726fdfc8b23da7UL, 0x4b3ae7d103dee548UL,
+	/* 173 */ 0xc6256e19ce4b9d7eUL, 0xff5c5cf186e3c61cUL,
+		  0xacc63ca34b8ec145UL, 0x74621888fee66574UL,
+	/* 174 */ 0x956f409645290a1eUL, 0xef0bf8e3263a962eUL,
+		  0xed6a50eb5ec2647bUL, 0x0694283a9dca7502UL,
+	/* 175 */ 0x769b963643a2dcd1UL, 0x42b7c8ea09fc5353UL,
+		  0x4f002aee13397eabUL, 0x63005e2c19b7d63aUL,
+	/* 176 */ 0xca6736da63023beaUL, 0x966c7f6db12a99b7UL,
+		  0xace09390c537c5e1UL, 0x0b696063a1aa89eeUL,
+	/* 177 */ 0xebb03e97288c56e5UL, 0x432a9f9f938c8be8UL,
+		  0xa6a5a93d5b717f71UL, 0x1a5fb4c3e18f9d97UL,
+	/* 178 */ 0x1c94e7ad1c60cdceUL, 0xee202a43fc02c4a0UL,
+		  0x8dafe4d867c46a20UL, 0x0a10263c8ac27b58UL,
+	/* 179 */ 0xd0dea9dfe4432a4aUL, 0x856af87bbe9277c5UL,
+		  0xce8472acc212c71aUL, 0x6f151b6d9bbb1e91UL,
+	/* 180 */ 0x26776c527ceed56aUL, 0x7d211cb7fbf8faecUL,
+		  0x37ae66a6fd4609ccUL, 0x1f81b702d2770c42UL,
+	/* 181 */ 0x2fb0b057eac58392UL, 0xe1dd89fe29744e9dUL,
+		  0xc964f8eb17beb4f8UL, 0x29571073c9a2d41eUL,
+	/* 182 */ 0xa948a18981c0e254UL, 0x2df6369b65b22830UL,
+		  0xa33eb2d75fcfd3c6UL, 0x078cd6ec4199a01fUL,
+	/* 183 */ 0x4a584a41ad900d2fUL, 0x32142b78e2c74c52UL,
+		  0x68c4e8338431c978UL, 0x7f69ea9008689fc2UL,
+	/* 184 */ 0x52f2c81e46a38265UL, 0xfd78072d04a832fdUL,
+		  0x8cd7d5fa25359e94UL, 0x4de71b7454cc29d2UL,
+	/* 185 */ 0x42eb60ad1eda6ac9UL, 0x0aad37dfdbc09c3aUL,
+		  0x81004b71e33cc191UL, 0x44e6be345122803cUL,
+	/* 186 */ 0x03fe8388ba1920dbUL, 0xf5d57c32150db008UL,
+		  0x49c8c4281af60c29UL, 0x21edb518de701aeeUL,
+	/* 187 */ 0x7fb63e418f06dc99UL, 0xa4460d99c166d7b8UL,
+		  0x24dd5248ce520a83UL, 0x5ec3ad712b928358UL,
+	/* 188 */ 0x15022a5fbd17930fUL, 0xa4f64a77d82570e3UL,
+		  0x12bc8d6915783712UL, 0x498194c0fc620abbUL,
+	/* 189 */ 0x38a2d9d255686c82UL, 0x785c6bd9193e21f0UL,
+		  0xe4d5c81ab24a5484UL, 0x56307860b2e20989UL,
+	/* 190 */ 0x429d55f78b4d74c4UL, 0x22f1834643350131UL,
+		  0x1e60c24598c71fffUL, 0x59f2f014979983efUL,
+	/* 191 */ 0x46a47d56eb494a44UL, 0x3e22a854d636a18eUL,
+		  0xb346e15274491c3bUL, 0x2ceafd4e5390cde7UL,
+	/* 192 */ 0xba8a8538be0d6675UL, 0x4b9074bb50818e23UL,
+		  0xcbdab89085d304c3UL, 0x61a24fe0e56192c4UL,
+	/* 193 */ 0xcb7615e6db525bcbUL, 0xdd7d8c35a567e4caUL,
+		  0xe6b4153acafcdd69UL, 0x2d668e097f3c9766UL,
+	/* 194 */ 0xa57e7e265ce55ef0UL, 0x5d9f4e527cd4b967UL,
+		  0xfbc83606492fd1e5UL, 0x090d52beb7c3f7aeUL,
+	/* 195 */ 0x09b9515a1e7b4d7cUL, 0x1f266a2599da44c0UL,
+		  0xa1c49548e2c55504UL, 0x7ef04287126f15ccUL,
+	/* 196 */ 0xfed1659dbd30ef15UL, 0x8b4ab9eec4e0277bUL,
+		  0x884d6236a5df3291UL, 0x1fd96ea6bf5cf788UL,
+	/* 197 */ 0x42a161981f190d9aUL, 0x61d849507e6052c1UL,
+		  0x9fe113bf285a2cd5UL, 0x7c22d676dbad85d8UL,
+	/* 198 */ 0x82e770ed2bfbd27dUL, 0x4c05b2ece996f5a5UL,
+		  0xcd40a9c2b0900150UL, 0x5895319213d9bf64UL,
+	/* 199 */ 0xe7cc5d703fea2e08UL, 0xb50c491258e2188cUL,
+		  0xcce30baa48205bf0UL, 0x537c659ccfa32d62UL,
+	/* 200 */ 0x37b6623a98cfc088UL, 0xfe9bed1fa4d6aca4UL,
+		  0x04d29b8e56a8d1b0UL, 0x725f71c40b519575UL,
+	/* 201 */ 0x28c7f89cd0339ce6UL, 0x8367b14469ddc18bUL,
+		  0x883ada83a6a1652cUL, 0x585f1974034d6c17UL,
+	/* 202 */ 0x89cfb266f1b19188UL, 0xe63b4863e7c35217UL,
+		  0xd88c9da6b4c0526aUL, 0x3e035c9df0954635UL,
+	/* 203 */ 0xdd9d5412fb45de9dUL, 0xdd684532e4cff40dUL,
+		  0x4b5c999b151d671cUL, 0x2d8c2cc811e7f690UL,
+	/* 204 */ 0x7f54be1d90055d40UL, 0xa464c5df464aaf40UL,
+		  0x33979624f0e917beUL, 0x2c018dc527356b30UL,
+	/* 205 */ 0xa5415024e330b3d4UL, 0x73ff3d96691652d3UL,
+		  0x94ec42c4ef9b59f1UL, 0x0747201618d08e5aUL,
+	/* 206 */ 0x4d6ca48aca411c53UL, 0x66415f2fcfa66119UL,
+		  0x9c4dd40051e227ffUL, 0x59810bc09a02f7ebUL,
+	/* 207 */ 0x2a7eb171b3dc101dUL, 0x441c5ab99ffef68eUL,
+		  0x32025c9b93b359eaUL, 0x5e8ce0a71e9d112fUL,
+	/* 208 */ 0xbfcccb92429503fdUL, 0xd271ba752f095d55UL,
+		  0x345ead5e972d091eUL, 0x18c8df11a83103baUL,
+	/* 209 */ 0x90cd949a9aed0f4cUL, 0xc5d1f4cb6660e37eUL,
+		  0xb8cac52d56c52e0bUL, 0x6e42e400c5808e0dUL,
+	/* 210 */ 0xa3b46966eeaefd23UL, 0x0c4f1f0be39ecdcaUL,
+		  0x189dc8c9d683a51dUL, 0x51f27f054c09351bUL,
+	/* 211 */ 0x4c487ccd2a320682UL, 0x587ea95bb3df1c96UL,
+		  0xc8ccf79e555cb8e8UL, 0x547dc829a206d73dUL,
+	/* 212 */ 0xb822a6cd80c39b06UL, 0xe96d54732000d4c6UL,
+		  0x28535b6f91463b4dUL, 0x228f4660e2486e1dUL,
+	/* 213 */ 0x98799538de8d3abfUL, 0x8cd8330045ebca6eUL,
+		  0x79952a008221e738UL, 0x4322e1a7535cd2bbUL,
+	/* 214 */ 0xb114c11819d1801cUL, 0x2016e4d84f3f5ec7UL,
+		  0xdd0e2df409260f4cUL, 0x5ec362c0ae5f7266UL,
+	/* 215 */ 0xc0462b18b8b2b4eeUL, 0x7cc8d950274d1afbUL,
+		  0xf25f7105436b02d2UL, 0x43bbf8dcbff9ccd3UL,
+	/* 216 */ 0xb6ad1767a039e9dfUL, 0xb0714da8f69d3583UL,
+		  0x5e55fa18b42931f5UL, 0x4ed5558f33c60961UL,
+	/* 217 */ 0x1fe37901c647a5ddUL, 0x593ddf1f8081d357UL,
+		  0x0249a4fd813fd7a6UL, 0x69acca274e9caf61UL,
+	/* 218 */ 0x047ba3ea330721c9UL, 0x83423fc20e7e1ea0UL,
+		  0x1df4c0af01314a60UL, 0x09a62dab89289527UL,
+	/* 219 */ 0xa5b325a49cc6cb00UL, 0xe94b5dc654b56cb6UL,
+		  0x3be28779adc994a0UL, 0x4296e8f8ba3a4aadUL,
+	/* 220 */ 0x328689761e451eabUL, 0x2e4d598bff59594aUL,
+		  0x49b96853d7a7084aUL, 0x4980a319601420a8UL,
+	/* 221 */ 0x9565b9e12f552c42UL, 0x8a5318db7100fe96UL,
+		  0x05c90b4d43add0d7UL, 0x538b4cd66a5d4edaUL,
+	/* 222 */ 0xf4e94fc3e89f039fUL, 0x592c9af26f618045UL,
+		  0x08a36eb5fd4b9550UL, 0x25fffaf6c2ed1419UL,
+	/* 223 */ 0x34434459cc79d354UL, 0xeeecbfb4b1d5476bUL,
+		  0xddeb34a061615d99UL, 0x5129cecceb64b773UL,
+	/* 224 */ 0xee43215894993520UL, 0x772f9c7cf14c0b3bUL,
+		  0xd2e2fce306bedad5UL, 0x715f42b546f06a97UL,
+	/* 225 */ 0x434ecdceda5b5f1aUL, 0x0da17115a49741a9UL,
+		  0x680bd77c73edad2eUL, 0x487c02354edd9041UL,
+	/* 226 */ 0xb8efeff3a70ed9c4UL, 0x56a32aa3e857e302UL,
+		  0xdf3a68bd48a2a5a0UL, 0x07f650b73176c444UL,
+	/* 227 */ 0xe38b9b1626e0ccb1UL, 0x79e053c18b09fb36UL,
+		  0x56d90319c9f94964UL, 0x1ca941e7ac9ff5c4UL,
+	/* 228 */ 0x49c4df29162fa0bbUL, 0x8488cf3282b33305UL,
+		  0x95dfda14cabb437dUL, 0x3391f78264d5ad86UL,
+	/* 229 */ 0x729ae06ae2b5095dUL, 0xd58a58d73259a946UL,
+		  0xe9834262d13921edUL, 0x27fedafaa54bb592UL,
+	/* 230 */ 0xa99dc5b829ad48bbUL, 0x5f025742499ee260UL,
+		  0x802c8ecd5d7513fdUL, 0x78ceb3ef3f6dd938UL,
+	/* 231 */ 0xc342f44f8a135d94UL, 0x7b9edb44828cdda3UL,
+		  0x9436d11a0537cfe7UL, 0x5064b164ec1ab4c8UL,
+	/* 232 */ 0x7020eccfd37eb2fcUL, 0x1f31ea3ed90d25fcUL,
+		  0x1b930d7bdfa1bb34UL, 0x5344467a48113044UL,
+	/* 233 */ 0x70073170f25e6dfbUL, 0xe385dc1a50114cc8UL,
+		  0x2348698ac8fc4f00UL, 0x2a77a55284dd40d8UL,
+	/* 234 */ 0xfe06afe0c98c6ce4UL, 0xc235df96dddfd6e4UL,
+		  0x1428d01e33bf1ed3UL, 0x785768ec9300bdafUL,
+	/* 235 */ 0x9702e57a91deb63bUL, 0x61bdb8bfe5ce8b80UL,
+		  0x645b426f3d1d58acUL, 0x4804a82227a557bcUL,
+	/* 236 */ 0x8e57048ab44d2601UL, 0x68d6501a4b3a6935UL,
+		  0xc39c9ec3f9e1c293UL, 0x4172f257d4de63e2UL,
+	/* 237 */ 0xd368b450330c6401UL, 0x040d3017418f2391UL,
+		  0x2c34bb6090b7d90dUL, 0x16f649228fdfd51fUL,
+	/* 238 */ 0xbea6818e2b928ef5UL, 0xe28ccf91cdc11e72UL,
+		  0x594aaa68e77a36cdUL, 0x313034806c7ffd0fUL,
+	/* 239 */ 0x8a9d27ac2249bd65UL, 0x19a3b464018e9512UL,
+		  0xc26ccff352b37ec7UL, 0x056f68341d797b21UL,
+	/* 240 */ 0x5e79d6757efd2327UL, 0xfabdbcb6553afe15UL,
+		  0xd3e7222c6eaf5a60UL, 0x7046c76d4dae743bUL,
+	/* 241 */ 0x660be872b18d4a55UL, 0x19992518574e1496UL,
+		  0xc103053a302bdcbbUL, 0x3ed8e9800b218e8eUL,
+	/* 242 */ 0x7b0b9239fa75e03eUL, 0xefe9fb684633c083UL,
+		  0x98a35fbe391a7793UL, 0x6065510fe2d0fe34UL,
+	/* 243 */ 0x55cb668548abad0cUL, 0xb4584548da87e527UL,
+		  0x2c43ecea0107c1ddUL, 0x526028809372de35UL,
+	/* 244 */ 0x3415c56af9213b1fUL, 0x5bee1a4d017e98dbUL,
+		  0x13f6b105b5cf709bUL, 0x5ff20e3482b29ab6UL,
+	/* 245 */ 0x0aa29c75cc2e6c90UL, 0xfc7d73ca3a70e206UL,
+		  0x899fc38fc4b5c515UL, 0x250386b124ffc207UL,
+	/* 246 */ 0x54ea28d5ae3d2b56UL, 0x9913149dd6de60ceUL,
+		  0x16694fc58f06d6c1UL, 0x46b23975eb018fc7UL,
+	/* 247 */ 0x470a6a0fb4b7b4e2UL, 0x5d92475a8f7253deUL,
+		  0xabeee5b52fbd3adbUL, 0x7fa20801a0806968UL,
+	/* 248 */ 0x76f3faf19f7714d2UL, 0xb3e840c12f4660c3UL,
+		  0x0fb4cd8df212744eUL, 0x4b065a251d3a2dd2UL,
+	/* 249 */ 0x5cebde383d77cd4aUL, 0x6adf39df882c9cb1UL,
+		  0xa2dd242eb09af759UL, 0x3147c0e50e5f6422UL,
+	/* 250 */ 0x164ca5101d1350dbUL, 0xf8d13479c33fc962UL,
+		  0xe640ce4d13e5da08UL, 0x4bdee0c45061f8baUL,
+	/* 251 */ 0xd7c46dc1a4edb1c9UL, 0x5514d7b6437fd98aUL,
+		  0x58942f6bb2a1c00bUL, 0x2dffb2ab1d70710eUL,
+	/* 252 */ 0xccdfcf2fc18b6d68UL, 0xa8ebcba8b7806167UL,
+		  0x980697f95e2937e3UL, 0x02fbba1cd0126e8cUL
+};
+
+/* c is two 512-bit products: c0[0:7]=a0[0:3]*b0[0:3] and c1[8:15]=a1[4:7]*b1[4:7]
+ * a is two 256-bit integers: a0[0:3] and a1[4:7]
+ * b is two 256-bit integers: b0[0:3] and b1[4:7]
+ */
+static void mul2_256x256_integer_adx(u64 *const c, const u64 *const a,
+				     const u64 *const b)
+{
+	asm volatile(
+		"xorl %%r14d, %%r14d ;"
+		"movq   (%1), %%rdx; "	/* A[0] */
+		"mulx   (%2),  %%r8, %%r15; " /* A[0]*B[0] */
+		"xorl %%r10d, %%r10d ;"
+		"movq %%r8, (%0) ;"
+		"mulx  8(%2), %%r10, %%rax; " /* A[0]*B[1] */
+		"adox %%r10, %%r15 ;"
+		"mulx 16(%2),  %%r8, %%rbx; " /* A[0]*B[2] */
+		"adox  %%r8, %%rax ;"
+		"mulx 24(%2), %%r10, %%rcx; " /* A[0]*B[3] */
+		"adox %%r10, %%rbx ;"
+		/******************************************/
+		"adox %%r14, %%rcx ;"
+
+		"movq  8(%1), %%rdx; "	/* A[1] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[1]*B[0] */
+		"adox %%r15,  %%r8 ;"
+		"movq  %%r8, 8(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[1]*B[1] */
+		"adox %%r10,  %%r9 ;"
+		"adcx  %%r9, %%rax ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[1]*B[2] */
+		"adox  %%r8, %%r11 ;"
+		"adcx %%r11, %%rbx ;"
+		"mulx 24(%2), %%r10, %%r15; " /* A[1]*B[3] */
+		"adox %%r10, %%r13 ;"
+		"adcx %%r13, %%rcx ;"
+		/******************************************/
+		"adox %%r14, %%r15 ;"
+		"adcx %%r14, %%r15 ;"
+
+		"movq 16(%1), %%rdx; " /* A[2] */
+		"xorl %%r10d, %%r10d ;"
+		"mulx   (%2),  %%r8,  %%r9; " /* A[2]*B[0] */
+		"adox %%rax,  %%r8 ;"
+		"movq %%r8, 16(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[2]*B[1] */
+		"adox %%r10,  %%r9 ;"
+		"adcx  %%r9, %%rbx ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[2]*B[2] */
+		"adox  %%r8, %%r11 ;"
+		"adcx %%r11, %%rcx ;"
+		"mulx 24(%2), %%r10, %%rax; " /* A[2]*B[3] */
+		"adox %%r10, %%r13 ;"
+		"adcx %%r13, %%r15 ;"
+		/******************************************/
+		"adox %%r14, %%rax ;"
+		"adcx %%r14, %%rax ;"
+
+		"movq 24(%1), %%rdx; " /* A[3] */
+		"xorl %%r10d, %%r10d ;"
+		"mulx   (%2),  %%r8,  %%r9; " /* A[3]*B[0] */
+		"adox %%rbx,  %%r8 ;"
+		"movq %%r8, 24(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[3]*B[1] */
+		"adox %%r10,  %%r9 ;"
+		"adcx  %%r9, %%rcx ;"
+		"movq %%rcx, 32(%0) ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[3]*B[2] */
+		"adox  %%r8, %%r11 ;"
+		"adcx %%r11, %%r15 ;"
+		"movq %%r15, 40(%0) ;"
+		"mulx 24(%2), %%r10, %%rbx; " /* A[3]*B[3] */
+		"adox %%r10, %%r13 ;"
+		"adcx %%r13, %%rax ;"
+		"movq %%rax, 48(%0) ;"
+		/******************************************/
+		"adox %%r14, %%rbx ;"
+		"adcx %%r14, %%rbx ;"
+		"movq %%rbx, 56(%0) ;"
+
+		"movq 32(%1), %%rdx; "	/* C[0] */
+		"mulx 32(%2),  %%r8, %%r15; " /* C[0]*D[0] */
+		"xorl %%r10d, %%r10d ;"
+		"movq %%r8, 64(%0);"
+		"mulx 40(%2), %%r10, %%rax; " /* C[0]*D[1] */
+		"adox %%r10, %%r15 ;"
+		"mulx 48(%2),  %%r8, %%rbx; " /* C[0]*D[2] */
+		"adox  %%r8, %%rax ;"
+		"mulx 56(%2), %%r10, %%rcx; " /* C[0]*D[3] */
+		"adox %%r10, %%rbx ;"
+		/******************************************/
+		"adox %%r14, %%rcx ;"
+
+		"movq 40(%1), %%rdx; " /* C[1] */
+		"xorl %%r10d, %%r10d ;"
+		"mulx 32(%2),  %%r8,  %%r9; " /* C[1]*D[0] */
+		"adox %%r15,  %%r8 ;"
+		"movq  %%r8, 72(%0);"
+		"mulx 40(%2), %%r10, %%r11; " /* C[1]*D[1] */
+		"adox %%r10,  %%r9 ;"
+		"adcx  %%r9, %%rax ;"
+		"mulx 48(%2),  %%r8, %%r13; " /* C[1]*D[2] */
+		"adox  %%r8, %%r11 ;"
+		"adcx %%r11, %%rbx ;"
+		"mulx 56(%2), %%r10, %%r15; " /* C[1]*D[3] */
+		"adox %%r10, %%r13 ;"
+		"adcx %%r13, %%rcx ;"
+		/******************************************/
+		"adox %%r14, %%r15 ;"
+		"adcx %%r14, %%r15 ;"
+
+		"movq 48(%1), %%rdx; " /* C[2] */
+		"xorl %%r10d, %%r10d ;"
+		"mulx 32(%2),  %%r8,  %%r9; " /* C[2]*D[0] */
+		"adox %%rax,  %%r8 ;"
+		"movq  %%r8, 80(%0);"
+		"mulx 40(%2), %%r10, %%r11; " /* C[2]*D[1] */
+		"adox %%r10,  %%r9 ;"
+		"adcx  %%r9, %%rbx ;"
+		"mulx 48(%2),  %%r8, %%r13; " /* C[2]*D[2] */
+		"adox  %%r8, %%r11 ;"
+		"adcx %%r11, %%rcx ;"
+		"mulx 56(%2), %%r10, %%rax; " /* C[2]*D[3] */
+		"adox %%r10, %%r13 ;"
+		"adcx %%r13, %%r15 ;"
+		/******************************************/
+		"adox %%r14, %%rax ;"
+		"adcx %%r14, %%rax ;"
+
+		"movq 56(%1), %%rdx; " /* C[3] */
+		"xorl %%r10d, %%r10d ;"
+		"mulx 32(%2),  %%r8,  %%r9; " /* C[3]*D[0] */
+		"adox %%rbx,  %%r8 ;"
+		"movq  %%r8, 88(%0);"
+		"mulx 40(%2), %%r10, %%r11; " /* C[3]*D[1] */
+		"adox %%r10,  %%r9 ;"
+		"adcx  %%r9, %%rcx ;"
+		"movq %%rcx,  96(%0) ;"
+		"mulx 48(%2),  %%r8, %%r13; " /* C[3]*D[2] */
+		"adox  %%r8, %%r11 ;"
+		"adcx %%r11, %%r15 ;"
+		"movq %%r15, 104(%0) ;"
+		"mulx 56(%2), %%r10, %%rbx; " /* C[3]*D[3] */
+		"adox %%r10, %%r13 ;"
+		"adcx %%r13, %%rax ;"
+		"movq %%rax, 112(%0) ;"
+		/******************************************/
+		"adox %%r14, %%rbx ;"
+		"adcx %%r14, %%rbx ;"
+		"movq %%rbx, 120(%0) ;"
+		:
+		: "r"(c), "r"(a), "r"(b)
+		: "memory", "cc", "%rax", "%rbx", "%rcx", "%rdx", "%r8", "%r9",
+		  "%r10", "%r11", "%r13", "%r14", "%r15");
+}
+
+static void mul2_256x256_integer_bmi2(u64 *const c, const u64 *const a,
+				      const u64 *const b)
+{
+	asm volatile(
+		"movq   (%1), %%rdx; "	/* A[0] */
+		"mulx   (%2),  %%r8, %%r15; " /* A[0]*B[0] */
+		"movq %%r8,  (%0) ;"
+		"mulx  8(%2), %%r10, %%rax; " /* A[0]*B[1] */
+		"addq %%r10, %%r15 ;"
+		"mulx 16(%2),  %%r8, %%rbx; " /* A[0]*B[2] */
+		"adcq  %%r8, %%rax ;"
+		"mulx 24(%2), %%r10, %%rcx; " /* A[0]*B[3] */
+		"adcq %%r10, %%rbx ;"
+		/******************************************/
+		"adcq    $0, %%rcx ;"
+
+		"movq  8(%1), %%rdx; "	/* A[1] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[1]*B[0] */
+		"addq %%r15,  %%r8 ;"
+		"movq %%r8, 8(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[1]*B[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[1]*B[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 24(%2), %%r10, %%r15; " /* A[1]*B[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%r15 ;"
+
+		"addq  %%r9, %%rax ;"
+		"adcq %%r11, %%rbx ;"
+		"adcq %%r13, %%rcx ;"
+		"adcq    $0, %%r15 ;"
+
+		"movq 16(%1), %%rdx; "	/* A[2] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[2]*B[0] */
+		"addq %%rax,  %%r8 ;"
+		"movq %%r8, 16(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[2]*B[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[2]*B[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 24(%2), %%r10, %%rax; " /* A[2]*B[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%rax ;"
+
+		"addq  %%r9, %%rbx ;"
+		"adcq %%r11, %%rcx ;"
+		"adcq %%r13, %%r15 ;"
+		"adcq    $0, %%rax ;"
+
+		"movq 24(%1), %%rdx; "	/* A[3] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[3]*B[0] */
+		"addq %%rbx,  %%r8 ;"
+		"movq %%r8, 24(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[3]*B[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[3]*B[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 24(%2), %%r10, %%rbx; " /* A[3]*B[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%rbx ;"
+
+		"addq  %%r9, %%rcx ;"
+		"movq %%rcx, 32(%0) ;"
+		"adcq %%r11, %%r15 ;"
+		"movq %%r15, 40(%0) ;"
+		"adcq %%r13, %%rax ;"
+		"movq %%rax, 48(%0) ;"
+		"adcq    $0, %%rbx ;"
+		"movq %%rbx, 56(%0) ;"
+
+		"movq 32(%1), %%rdx; "	/* C[0] */
+		"mulx 32(%2),  %%r8, %%r15; " /* C[0]*D[0] */
+		"movq %%r8, 64(%0) ;"
+		"mulx 40(%2), %%r10, %%rax; " /* C[0]*D[1] */
+		"addq %%r10, %%r15 ;"
+		"mulx 48(%2),  %%r8, %%rbx; " /* C[0]*D[2] */
+		"adcq  %%r8, %%rax ;"
+		"mulx 56(%2), %%r10, %%rcx; " /* C[0]*D[3] */
+		"adcq %%r10, %%rbx ;"
+		/******************************************/
+		"adcq    $0, %%rcx ;"
+
+		"movq 40(%1), %%rdx; "	/* C[1] */
+		"mulx 32(%2),  %%r8,  %%r9; " /* C[1]*D[0] */
+		"addq %%r15,  %%r8 ;"
+		"movq %%r8, 72(%0) ;"
+		"mulx 40(%2), %%r10, %%r11; " /* C[1]*D[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 48(%2),  %%r8, %%r13; " /* C[1]*D[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 56(%2), %%r10, %%r15; " /* C[1]*D[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%r15 ;"
+
+		"addq  %%r9, %%rax ;"
+		"adcq %%r11, %%rbx ;"
+		"adcq %%r13, %%rcx ;"
+		"adcq    $0, %%r15 ;"
+
+		"movq 48(%1), %%rdx; "	/* C[2] */
+		"mulx 32(%2),  %%r8,  %%r9; " /* C[2]*D[0] */
+		"addq %%rax,  %%r8 ;"
+		"movq %%r8, 80(%0) ;"
+		"mulx 40(%2), %%r10, %%r11; " /* C[2]*D[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 48(%2),  %%r8, %%r13; " /* C[2]*D[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 56(%2), %%r10, %%rax; " /* C[2]*D[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%rax ;"
+
+		"addq  %%r9, %%rbx ;"
+		"adcq %%r11, %%rcx ;"
+		"adcq %%r13, %%r15 ;"
+		"adcq    $0, %%rax ;"
+
+		"movq 56(%1), %%rdx; "	/* C[3] */
+		"mulx 32(%2),  %%r8,  %%r9; " /* C[3]*D[0] */
+		"addq %%rbx,  %%r8 ;"
+		"movq %%r8, 88(%0) ;"
+		"mulx 40(%2), %%r10, %%r11; " /* C[3]*D[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 48(%2),  %%r8, %%r13; " /* C[3]*D[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 56(%2), %%r10, %%rbx; " /* C[3]*D[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%rbx ;"
+
+		"addq  %%r9, %%rcx ;"
+		"movq %%rcx,  96(%0) ;"
+		"adcq %%r11, %%r15 ;"
+		"movq %%r15, 104(%0) ;"
+		"adcq %%r13, %%rax ;"
+		"movq %%rax, 112(%0) ;"
+		"adcq    $0, %%rbx ;"
+		"movq %%rbx, 120(%0) ;"
+		:
+		: "r"(c), "r"(a), "r"(b)
+		: "memory", "cc", "%rax", "%rbx", "%rcx", "%rdx", "%r8", "%r9",
+		  "%r10", "%r11", "%r13", "%r15");
+}
+
+static void sqr2_256x256_integer_adx(u64 *const c, const u64 *const a)
+{
+	asm volatile(
+		"movq   (%1), %%rdx        ;" /* A[0]      */
+		"mulx  8(%1),  %%r8, %%r14 ;" /* A[1]*A[0] */
+		"xorl %%r15d, %%r15d;"
+		"mulx 16(%1),  %%r9, %%r10 ;" /* A[2]*A[0] */
+		"adcx %%r14,  %%r9 ;"
+		"mulx 24(%1), %%rax, %%rcx ;" /* A[3]*A[0] */
+		"adcx %%rax, %%r10 ;"
+		"movq 24(%1), %%rdx        ;" /* A[3]      */
+		"mulx  8(%1), %%r11, %%rbx ;" /* A[1]*A[3] */
+		"adcx %%rcx, %%r11 ;"
+		"mulx 16(%1), %%rax, %%r13 ;" /* A[2]*A[3] */
+		"adcx %%rax, %%rbx ;"
+		"movq  8(%1), %%rdx        ;" /* A[1]      */
+		"adcx %%r15, %%r13 ;"
+		"mulx 16(%1), %%rax, %%rcx ;" /* A[2]*A[1] */
+		"movq    $0, %%r14 ;"
+		/******************************************/
+		"adcx %%r15, %%r14 ;"
+
+		"xorl %%r15d, %%r15d;"
+		"adox %%rax, %%r10 ;"
+		"adcx  %%r8,  %%r8 ;"
+		"adox %%rcx, %%r11 ;"
+		"adcx  %%r9,  %%r9 ;"
+		"adox %%r15, %%rbx ;"
+		"adcx %%r10, %%r10 ;"
+		"adox %%r15, %%r13 ;"
+		"adcx %%r11, %%r11 ;"
+		"adox %%r15, %%r14 ;"
+		"adcx %%rbx, %%rbx ;"
+		"adcx %%r13, %%r13 ;"
+		"adcx %%r14, %%r14 ;"
+
+		"movq   (%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[0]^2 */
+		/*******************/
+		"movq %%rax,  0(%0) ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,  8(%0) ;"
+		"movq  8(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[1]^2 */
+		"adcq %%rax,  %%r9 ;"
+		"movq  %%r9, 16(%0) ;"
+		"adcq %%rcx, %%r10 ;"
+		"movq %%r10, 24(%0) ;"
+		"movq 16(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[2]^2 */
+		"adcq %%rax, %%r11 ;"
+		"movq %%r11, 32(%0) ;"
+		"adcq %%rcx, %%rbx ;"
+		"movq %%rbx, 40(%0) ;"
+		"movq 24(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[3]^2 */
+		"adcq %%rax, %%r13 ;"
+		"movq %%r13, 48(%0) ;"
+		"adcq %%rcx, %%r14 ;"
+		"movq %%r14, 56(%0) ;"
+
+
+		"movq 32(%1), %%rdx        ;" /* B[0]      */
+		"mulx 40(%1),  %%r8, %%r14 ;" /* B[1]*B[0] */
+		"xorl %%r15d, %%r15d;"
+		"mulx 48(%1),  %%r9, %%r10 ;" /* B[2]*B[0] */
+		"adcx %%r14,  %%r9 ;"
+		"mulx 56(%1), %%rax, %%rcx ;" /* B[3]*B[0] */
+		"adcx %%rax, %%r10 ;"
+		"movq 56(%1), %%rdx        ;" /* B[3]      */
+		"mulx 40(%1), %%r11, %%rbx ;" /* B[1]*B[3] */
+		"adcx %%rcx, %%r11 ;"
+		"mulx 48(%1), %%rax, %%r13 ;" /* B[2]*B[3] */
+		"adcx %%rax, %%rbx ;"
+		"movq 40(%1), %%rdx        ;" /* B[1]      */
+		"adcx %%r15, %%r13 ;"
+		"mulx 48(%1), %%rax, %%rcx ;" /* B[2]*B[1] */
+		"movq    $0, %%r14 ;"
+		/******************************************/
+		"adcx %%r15, %%r14 ;"
+
+		"xorl %%r15d, %%r15d;"
+		"adox %%rax, %%r10 ;"
+		"adcx  %%r8,  %%r8 ;"
+		"adox %%rcx, %%r11 ;"
+		"adcx  %%r9,  %%r9 ;"
+		"adox %%r15, %%rbx ;"
+		"adcx %%r10, %%r10 ;"
+		"adox %%r15, %%r13 ;"
+		"adcx %%r11, %%r11 ;"
+		"adox %%r15, %%r14 ;"
+		"adcx %%rbx, %%rbx ;"
+		"adcx %%r13, %%r13 ;"
+		"adcx %%r14, %%r14 ;"
+
+		"movq 32(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* B[0]^2 */
+		/*******************/
+		"movq %%rax,  64(%0) ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,  72(%0) ;"
+		"movq 40(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* B[1]^2 */
+		"adcq %%rax,  %%r9 ;"
+		"movq  %%r9,  80(%0) ;"
+		"adcq %%rcx, %%r10 ;"
+		"movq %%r10,  88(%0) ;"
+		"movq 48(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* B[2]^2 */
+		"adcq %%rax, %%r11 ;"
+		"movq %%r11,  96(%0) ;"
+		"adcq %%rcx, %%rbx ;"
+		"movq %%rbx, 104(%0) ;"
+		"movq 56(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* B[3]^2 */
+		"adcq %%rax, %%r13 ;"
+		"movq %%r13, 112(%0) ;"
+		"adcq %%rcx, %%r14 ;"
+		"movq %%r14, 120(%0) ;"
+		:
+		: "r"(c), "r"(a)
+		: "memory", "cc", "%rax", "%rbx", "%rcx", "%rdx", "%r8", "%r9",
+		  "%r10", "%r11", "%r13", "%r14", "%r15");
+}
+
+static void sqr2_256x256_integer_bmi2(u64 *const c, const u64 *const a)
+{
+	asm volatile(
+		"movq  8(%1), %%rdx        ;" /* A[1]      */
+		"mulx   (%1),  %%r8,  %%r9 ;" /* A[0]*A[1] */
+		"mulx 16(%1), %%r10, %%r11 ;" /* A[2]*A[1] */
+		"mulx 24(%1), %%rcx, %%r14 ;" /* A[3]*A[1] */
+
+		"movq 16(%1), %%rdx        ;" /* A[2]      */
+		"mulx 24(%1), %%r15, %%r13 ;" /* A[3]*A[2] */
+		"mulx   (%1), %%rax, %%rdx ;" /* A[0]*A[2] */
+
+		"addq %%rax,  %%r9 ;"
+		"adcq %%rdx, %%r10 ;"
+		"adcq %%rcx, %%r11 ;"
+		"adcq %%r14, %%r15 ;"
+		"adcq    $0, %%r13 ;"
+		"movq    $0, %%r14 ;"
+		"adcq    $0, %%r14 ;"
+
+		"movq   (%1), %%rdx        ;" /* A[0]      */
+		"mulx 24(%1), %%rax, %%rcx ;" /* A[0]*A[3] */
+
+		"addq %%rax, %%r10 ;"
+		"adcq %%rcx, %%r11 ;"
+		"adcq    $0, %%r15 ;"
+		"adcq    $0, %%r13 ;"
+		"adcq    $0, %%r14 ;"
+
+		"shldq $1, %%r13, %%r14 ;"
+		"shldq $1, %%r15, %%r13 ;"
+		"shldq $1, %%r11, %%r15 ;"
+		"shldq $1, %%r10, %%r11 ;"
+		"shldq $1,  %%r9, %%r10 ;"
+		"shldq $1,  %%r8,  %%r9 ;"
+		"shlq  $1,  %%r8        ;"
+
+		/*******************/
+		"mulx %%rdx, %%rax, %%rcx ; " /* A[0]^2 */
+		/*******************/
+		"movq %%rax,  0(%0) ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,  8(%0) ;"
+		"movq  8(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ; " /* A[1]^2 */
+		"adcq %%rax,  %%r9 ;"
+		"movq  %%r9, 16(%0) ;"
+		"adcq %%rcx, %%r10 ;"
+		"movq %%r10, 24(%0) ;"
+		"movq 16(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ; " /* A[2]^2 */
+		"adcq %%rax, %%r11 ;"
+		"movq %%r11, 32(%0) ;"
+		"adcq %%rcx, %%r15 ;"
+		"movq %%r15, 40(%0) ;"
+		"movq 24(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ; " /* A[3]^2 */
+		"adcq %%rax, %%r13 ;"
+		"movq %%r13, 48(%0) ;"
+		"adcq %%rcx, %%r14 ;"
+		"movq %%r14, 56(%0) ;"
+
+		"movq 40(%1), %%rdx        ;" /* B[1]      */
+		"mulx 32(%1),  %%r8,  %%r9 ;" /* B[0]*B[1] */
+		"mulx 48(%1), %%r10, %%r11 ;" /* B[2]*B[1] */
+		"mulx 56(%1), %%rcx, %%r14 ;" /* B[3]*B[1] */
+
+		"movq 48(%1), %%rdx        ;" /* B[2]      */
+		"mulx 56(%1), %%r15, %%r13 ;" /* B[3]*B[2] */
+		"mulx 32(%1), %%rax, %%rdx ;" /* B[0]*B[2] */
+
+		"addq %%rax,  %%r9 ;"
+		"adcq %%rdx, %%r10 ;"
+		"adcq %%rcx, %%r11 ;"
+		"adcq %%r14, %%r15 ;"
+		"adcq    $0, %%r13 ;"
+		"movq    $0, %%r14 ;"
+		"adcq    $0, %%r14 ;"
+
+		"movq 32(%1), %%rdx        ;" /* B[0]      */
+		"mulx 56(%1), %%rax, %%rcx ;" /* B[0]*B[3] */
+
+		"addq %%rax, %%r10 ;"
+		"adcq %%rcx, %%r11 ;"
+		"adcq    $0, %%r15 ;"
+		"adcq    $0, %%r13 ;"
+		"adcq    $0, %%r14 ;"
+
+		"shldq $1, %%r13, %%r14 ;"
+		"shldq $1, %%r15, %%r13 ;"
+		"shldq $1, %%r11, %%r15 ;"
+		"shldq $1, %%r10, %%r11 ;"
+		"shldq $1,  %%r9, %%r10 ;"
+		"shldq $1,  %%r8,  %%r9 ;"
+		"shlq  $1,  %%r8        ;"
+
+		/*******************/
+		"mulx %%rdx, %%rax, %%rcx ; " /* B[0]^2 */
+		/*******************/
+		"movq %%rax,  64(%0) ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,  72(%0) ;"
+		"movq 40(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ; " /* B[1]^2 */
+		"adcq %%rax,  %%r9 ;"
+		"movq  %%r9,  80(%0) ;"
+		"adcq %%rcx, %%r10 ;"
+		"movq %%r10,  88(%0) ;"
+		"movq 48(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ; " /* B[2]^2 */
+		"adcq %%rax, %%r11 ;"
+		"movq %%r11,  96(%0) ;"
+		"adcq %%rcx, %%r15 ;"
+		"movq %%r15, 104(%0) ;"
+		"movq 56(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ; " /* B[3]^2 */
+		"adcq %%rax, %%r13 ;"
+		"movq %%r13, 112(%0) ;"
+		"adcq %%rcx, %%r14 ;"
+		"movq %%r14, 120(%0) ;"
+		:
+		: "r"(c), "r"(a)
+		: "memory", "cc", "%rax", "%rcx", "%rdx", "%r8", "%r9", "%r10",
+		  "%r11", "%r13", "%r14", "%r15");
+}
+
+static void red_eltfp25519_2w_adx(u64 *const c, const u64 *const a)
+{
+	asm volatile(
+		"movl    $38, %%edx; "	/* 2*c = 38 = 2^256 */
+		"mulx 32(%1),  %%r8, %%r10; " /* c*C[4] */
+		"xorl %%ebx, %%ebx ;"
+		"adox   (%1),  %%r8 ;"
+		"mulx 40(%1),  %%r9, %%r11; " /* c*C[5] */
+		"adcx %%r10,  %%r9 ;"
+		"adox  8(%1),  %%r9 ;"
+		"mulx 48(%1), %%r10, %%rax; " /* c*C[6] */
+		"adcx %%r11, %%r10 ;"
+		"adox 16(%1), %%r10 ;"
+		"mulx 56(%1), %%r11, %%rcx; " /* c*C[7] */
+		"adcx %%rax, %%r11 ;"
+		"adox 24(%1), %%r11 ;"
+		/***************************************/
+		"adcx %%rbx, %%rcx ;"
+		"adox  %%rbx, %%rcx ;"
+		"imul %%rdx, %%rcx ;" /* c*C[4], cf=0, of=0 */
+		"adcx %%rcx,  %%r8 ;"
+		"adcx %%rbx,  %%r9 ;"
+		"movq  %%r9,  8(%0) ;"
+		"adcx %%rbx, %%r10 ;"
+		"movq %%r10, 16(%0) ;"
+		"adcx %%rbx, %%r11 ;"
+		"movq %%r11, 24(%0) ;"
+		"mov     $0, %%ecx ;"
+		"cmovc %%edx, %%ecx ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,   (%0) ;"
+
+		"mulx  96(%1),  %%r8, %%r10; " /* c*C[4] */
+		"xorl %%ebx, %%ebx ;"
+		"adox 64(%1),  %%r8 ;"
+		"mulx 104(%1),  %%r9, %%r11; " /* c*C[5] */
+		"adcx %%r10,  %%r9 ;"
+		"adox 72(%1),  %%r9 ;"
+		"mulx 112(%1), %%r10, %%rax; " /* c*C[6] */
+		"adcx %%r11, %%r10 ;"
+		"adox 80(%1), %%r10 ;"
+		"mulx 120(%1), %%r11, %%rcx; " /* c*C[7] */
+		"adcx %%rax, %%r11 ;"
+		"adox 88(%1), %%r11 ;"
+		/****************************************/
+		"adcx %%rbx, %%rcx ;"
+		"adox  %%rbx, %%rcx ;"
+		"imul %%rdx, %%rcx ;" /* c*C[4], cf=0, of=0 */
+		"adcx %%rcx,  %%r8 ;"
+		"adcx %%rbx,  %%r9 ;"
+		"movq  %%r9, 40(%0) ;"
+		"adcx %%rbx, %%r10 ;"
+		"movq %%r10, 48(%0) ;"
+		"adcx %%rbx, %%r11 ;"
+		"movq %%r11, 56(%0) ;"
+		"mov     $0, %%ecx ;"
+		"cmovc %%edx, %%ecx ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8, 32(%0) ;"
+		:
+		: "r"(c), "r"(a)
+		: "memory", "cc", "%rax", "%rbx", "%rcx", "%rdx", "%r8", "%r9",
+		  "%r10", "%r11");
+}
+
+static void red_eltfp25519_2w_bmi2(u64 *const c, const u64 *const a)
+{
+	asm volatile(
+		"movl    $38, %%edx ; "       /* 2*c = 38 = 2^256 */
+		"mulx 32(%1),  %%r8, %%r10 ;" /* c*C[4] */
+		"mulx 40(%1),  %%r9, %%r11 ;" /* c*C[5] */
+		"addq %%r10,  %%r9 ;"
+		"mulx 48(%1), %%r10, %%rax ;" /* c*C[6] */
+		"adcq %%r11, %%r10 ;"
+		"mulx 56(%1), %%r11, %%rcx ;" /* c*C[7] */
+		"adcq %%rax, %%r11 ;"
+		/***************************************/
+		"adcq    $0, %%rcx ;"
+		"addq   (%1),  %%r8 ;"
+		"adcq  8(%1),  %%r9 ;"
+		"adcq 16(%1), %%r10 ;"
+		"adcq 24(%1), %%r11 ;"
+		"adcq     $0, %%rcx ;"
+		"imul %%rdx, %%rcx ;" /* c*C[4], cf=0 */
+		"addq %%rcx,  %%r8 ;"
+		"adcq    $0,  %%r9 ;"
+		"movq  %%r9,  8(%0) ;"
+		"adcq    $0, %%r10 ;"
+		"movq %%r10, 16(%0) ;"
+		"adcq    $0, %%r11 ;"
+		"movq %%r11, 24(%0) ;"
+		"mov     $0, %%ecx ;"
+		"cmovc %%edx, %%ecx ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,   (%0) ;"
+
+		"mulx  96(%1),  %%r8, %%r10 ;" /* c*C[4] */
+		"mulx 104(%1),  %%r9, %%r11 ;" /* c*C[5] */
+		"addq %%r10,  %%r9 ;"
+		"mulx 112(%1), %%r10, %%rax ;" /* c*C[6] */
+		"adcq %%r11, %%r10 ;"
+		"mulx 120(%1), %%r11, %%rcx ;" /* c*C[7] */
+		"adcq %%rax, %%r11 ;"
+		/****************************************/
+		"adcq    $0, %%rcx ;"
+		"addq 64(%1),  %%r8 ;"
+		"adcq 72(%1),  %%r9 ;"
+		"adcq 80(%1), %%r10 ;"
+		"adcq 88(%1), %%r11 ;"
+		"adcq     $0, %%rcx ;"
+		"imul %%rdx, %%rcx ;" /* c*C[4], cf=0 */
+		"addq %%rcx,  %%r8 ;"
+		"adcq    $0,  %%r9 ;"
+		"movq  %%r9, 40(%0) ;"
+		"adcq    $0, %%r10 ;"
+		"movq %%r10, 48(%0) ;"
+		"adcq    $0, %%r11 ;"
+		"movq %%r11, 56(%0) ;"
+		"mov     $0, %%ecx ;"
+		"cmovc %%edx, %%ecx ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8, 32(%0) ;"
+		:
+		: "r"(c), "r"(a)
+		: "memory", "cc", "%rax", "%rcx", "%rdx", "%r8", "%r9", "%r10",
+		  "%r11");
+}
+
+static void mul_256x256_integer_adx(u64 *const c, const u64 *const a,
+				    const u64 *const b)
+{
+	asm volatile(
+		"movq   (%1), %%rdx; "	/* A[0] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[0]*B[0] */
+		"xorl %%r10d, %%r10d ;"
+		"movq  %%r8,  (%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[0]*B[1] */
+		"adox  %%r9, %%r10 ;"
+		"movq %%r10, 8(%0) ;"
+		"mulx 16(%2), %%r15, %%r13; " /* A[0]*B[2] */
+		"adox %%r11, %%r15 ;"
+		"mulx 24(%2), %%r14, %%rdx; " /* A[0]*B[3] */
+		"adox %%r13, %%r14 ;"
+		"movq $0, %%rax ;"
+		/******************************************/
+		"adox %%rdx, %%rax ;"
+
+		"movq  8(%1), %%rdx; "	/* A[1] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[1]*B[0] */
+		"xorl %%r10d, %%r10d ;"
+		"adcx 8(%0),  %%r8 ;"
+		"movq  %%r8,  8(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[1]*B[1] */
+		"adox  %%r9, %%r10 ;"
+		"adcx %%r15, %%r10 ;"
+		"movq %%r10, 16(%0) ;"
+		"mulx 16(%2), %%r15, %%r13; " /* A[1]*B[2] */
+		"adox %%r11, %%r15 ;"
+		"adcx %%r14, %%r15 ;"
+		"movq $0, %%r8  ;"
+		"mulx 24(%2), %%r14, %%rdx; " /* A[1]*B[3] */
+		"adox %%r13, %%r14 ;"
+		"adcx %%rax, %%r14 ;"
+		"movq $0, %%rax ;"
+		/******************************************/
+		"adox %%rdx, %%rax ;"
+		"adcx  %%r8, %%rax ;"
+
+		"movq 16(%1), %%rdx; "	/* A[2] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[2]*B[0] */
+		"xorl %%r10d, %%r10d ;"
+		"adcx 16(%0), %%r8 ;"
+		"movq  %%r8, 16(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[2]*B[1] */
+		"adox  %%r9, %%r10 ;"
+		"adcx %%r15, %%r10 ;"
+		"movq %%r10, 24(%0) ;"
+		"mulx 16(%2), %%r15, %%r13; " /* A[2]*B[2] */
+		"adox %%r11, %%r15 ;"
+		"adcx %%r14, %%r15 ;"
+		"movq $0, %%r8  ;"
+		"mulx 24(%2), %%r14, %%rdx; " /* A[2]*B[3] */
+		"adox %%r13, %%r14 ;"
+		"adcx %%rax, %%r14 ;"
+		"movq $0, %%rax ;"
+		/******************************************/
+		"adox %%rdx, %%rax ;"
+		"adcx  %%r8, %%rax ;"
+
+		"movq 24(%1), %%rdx; "	/* A[3] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[3]*B[0] */
+		"xorl %%r10d, %%r10d ;"
+		"adcx 24(%0), %%r8 ;"
+		"movq  %%r8, 24(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[3]*B[1] */
+		"adox  %%r9, %%r10 ;"
+		"adcx %%r15, %%r10 ;"
+		"movq %%r10, 32(%0) ;"
+		"mulx 16(%2), %%r15, %%r13; " /* A[3]*B[2] */
+		"adox %%r11, %%r15 ;"
+		"adcx %%r14, %%r15 ;"
+		"movq %%r15, 40(%0) ;"
+		"movq $0, %%r8  ;"
+		"mulx 24(%2), %%r14, %%rdx; " /* A[3]*B[3] */
+		"adox %%r13, %%r14 ;"
+		"adcx %%rax, %%r14 ;"
+		"movq %%r14, 48(%0) ;"
+		"movq $0, %%rax ;"
+		/******************************************/
+		"adox %%rdx, %%rax ;"
+		"adcx  %%r8, %%rax ;"
+		"movq %%rax, 56(%0) ;"
+		:
+		: "r"(c), "r"(a), "r"(b)
+		: "memory", "cc", "%rax", "%rdx", "%r8", "%r9", "%r10", "%r11",
+		  "%r13", "%r14", "%r15");
+}
+
+static void mul_256x256_integer_bmi2(u64 *const c, const u64 *const a,
+				     const u64 *const b)
+{
+	asm volatile(
+		"movq   (%1), %%rdx; "	/* A[0] */
+		"mulx   (%2),  %%r8, %%r15; " /* A[0]*B[0] */
+		"movq %%r8,  (%0) ;"
+		"mulx  8(%2), %%r10, %%rax; " /* A[0]*B[1] */
+		"addq %%r10, %%r15 ;"
+		"mulx 16(%2),  %%r8, %%rbx; " /* A[0]*B[2] */
+		"adcq  %%r8, %%rax ;"
+		"mulx 24(%2), %%r10, %%rcx; " /* A[0]*B[3] */
+		"adcq %%r10, %%rbx ;"
+		/******************************************/
+		"adcq    $0, %%rcx ;"
+
+		"movq  8(%1), %%rdx; "	/* A[1] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[1]*B[0] */
+		"addq %%r15,  %%r8 ;"
+		"movq %%r8, 8(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[1]*B[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[1]*B[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 24(%2), %%r10, %%r15; " /* A[1]*B[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%r15 ;"
+
+		"addq  %%r9, %%rax ;"
+		"adcq %%r11, %%rbx ;"
+		"adcq %%r13, %%rcx ;"
+		"adcq    $0, %%r15 ;"
+
+		"movq 16(%1), %%rdx; "	/* A[2] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[2]*B[0] */
+		"addq %%rax,  %%r8 ;"
+		"movq %%r8, 16(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[2]*B[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[2]*B[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 24(%2), %%r10, %%rax; " /* A[2]*B[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%rax ;"
+
+		"addq  %%r9, %%rbx ;"
+		"adcq %%r11, %%rcx ;"
+		"adcq %%r13, %%r15 ;"
+		"adcq    $0, %%rax ;"
+
+		"movq 24(%1), %%rdx; "	/* A[3] */
+		"mulx   (%2),  %%r8,  %%r9; " /* A[3]*B[0] */
+		"addq %%rbx,  %%r8 ;"
+		"movq %%r8, 24(%0) ;"
+		"mulx  8(%2), %%r10, %%r11; " /* A[3]*B[1] */
+		"adcq %%r10,  %%r9 ;"
+		"mulx 16(%2),  %%r8, %%r13; " /* A[3]*B[2] */
+		"adcq  %%r8, %%r11 ;"
+		"mulx 24(%2), %%r10, %%rbx; " /* A[3]*B[3] */
+		"adcq %%r10, %%r13 ;"
+		/******************************************/
+		"adcq    $0, %%rbx ;"
+
+		"addq  %%r9, %%rcx ;"
+		"movq %%rcx, 32(%0) ;"
+		"adcq %%r11, %%r15 ;"
+		"movq %%r15, 40(%0) ;"
+		"adcq %%r13, %%rax ;"
+		"movq %%rax, 48(%0) ;"
+		"adcq    $0, %%rbx ;"
+		"movq %%rbx, 56(%0) ;"
+		:
+		: "r"(c), "r"(a), "r"(b)
+		: "memory", "cc", "%rax", "%rbx", "%rcx", "%rdx", "%r8", "%r9",
+		  "%r10", "%r11", "%r13", "%r15");
+}
+
+static void sqr_256x256_integer_adx(u64 *const c, const u64 *const a)
+{
+	asm volatile(
+		"movq   (%1), %%rdx        ;" /* A[0]      */
+		"mulx  8(%1),  %%r8, %%r14 ;" /* A[1]*A[0] */
+		"xorl %%r15d, %%r15d;"
+		"mulx 16(%1),  %%r9, %%r10 ;" /* A[2]*A[0] */
+		"adcx %%r14,  %%r9 ;"
+		"mulx 24(%1), %%rax, %%rcx ;" /* A[3]*A[0] */
+		"adcx %%rax, %%r10 ;"
+		"movq 24(%1), %%rdx        ;" /* A[3]      */
+		"mulx  8(%1), %%r11, %%rbx ;" /* A[1]*A[3] */
+		"adcx %%rcx, %%r11 ;"
+		"mulx 16(%1), %%rax, %%r13 ;" /* A[2]*A[3] */
+		"adcx %%rax, %%rbx ;"
+		"movq  8(%1), %%rdx        ;" /* A[1]      */
+		"adcx %%r15, %%r13 ;"
+		"mulx 16(%1), %%rax, %%rcx ;" /* A[2]*A[1] */
+		"movq    $0, %%r14 ;"
+		/******************************************/
+		"adcx %%r15, %%r14 ;"
+
+		"xorl %%r15d, %%r15d;"
+		"adox %%rax, %%r10 ;"
+		"adcx  %%r8,  %%r8 ;"
+		"adox %%rcx, %%r11 ;"
+		"adcx  %%r9,  %%r9 ;"
+		"adox %%r15, %%rbx ;"
+		"adcx %%r10, %%r10 ;"
+		"adox %%r15, %%r13 ;"
+		"adcx %%r11, %%r11 ;"
+		"adox %%r15, %%r14 ;"
+		"adcx %%rbx, %%rbx ;"
+		"adcx %%r13, %%r13 ;"
+		"adcx %%r14, %%r14 ;"
+
+		"movq   (%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[0]^2 */
+		/*******************/
+		"movq %%rax,  0(%0) ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,  8(%0) ;"
+		"movq  8(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[1]^2 */
+		"adcq %%rax,  %%r9 ;"
+		"movq  %%r9, 16(%0) ;"
+		"adcq %%rcx, %%r10 ;"
+		"movq %%r10, 24(%0) ;"
+		"movq 16(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[2]^2 */
+		"adcq %%rax, %%r11 ;"
+		"movq %%r11, 32(%0) ;"
+		"adcq %%rcx, %%rbx ;"
+		"movq %%rbx, 40(%0) ;"
+		"movq 24(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[3]^2 */
+		"adcq %%rax, %%r13 ;"
+		"movq %%r13, 48(%0) ;"
+		"adcq %%rcx, %%r14 ;"
+		"movq %%r14, 56(%0) ;"
+		:
+		: "r"(c), "r"(a)
+		: "memory", "cc", "%rax", "%rbx", "%rcx", "%rdx", "%r8", "%r9",
+		  "%r10", "%r11", "%r13", "%r14", "%r15");
+}
+
+static void sqr_256x256_integer_bmi2(u64 *const c, const u64 *const a)
+{
+	asm volatile(
+		"movq  8(%1), %%rdx        ;" /* A[1]      */
+		"mulx   (%1),  %%r8,  %%r9 ;" /* A[0]*A[1] */
+		"mulx 16(%1), %%r10, %%r11 ;" /* A[2]*A[1] */
+		"mulx 24(%1), %%rcx, %%r14 ;" /* A[3]*A[1] */
+
+		"movq 16(%1), %%rdx        ;" /* A[2]      */
+		"mulx 24(%1), %%r15, %%r13 ;" /* A[3]*A[2] */
+		"mulx   (%1), %%rax, %%rdx ;" /* A[0]*A[2] */
+
+		"addq %%rax,  %%r9 ;"
+		"adcq %%rdx, %%r10 ;"
+		"adcq %%rcx, %%r11 ;"
+		"adcq %%r14, %%r15 ;"
+		"adcq    $0, %%r13 ;"
+		"movq    $0, %%r14 ;"
+		"adcq    $0, %%r14 ;"
+
+		"movq   (%1), %%rdx        ;" /* A[0]      */
+		"mulx 24(%1), %%rax, %%rcx ;" /* A[0]*A[3] */
+
+		"addq %%rax, %%r10 ;"
+		"adcq %%rcx, %%r11 ;"
+		"adcq    $0, %%r15 ;"
+		"adcq    $0, %%r13 ;"
+		"adcq    $0, %%r14 ;"
+
+		"shldq $1, %%r13, %%r14 ;"
+		"shldq $1, %%r15, %%r13 ;"
+		"shldq $1, %%r11, %%r15 ;"
+		"shldq $1, %%r10, %%r11 ;"
+		"shldq $1,  %%r9, %%r10 ;"
+		"shldq $1,  %%r8,  %%r9 ;"
+		"shlq  $1,  %%r8        ;"
+
+		/*******************/
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[0]^2 */
+		/*******************/
+		"movq %%rax,  0(%0) ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,  8(%0) ;"
+		"movq  8(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[1]^2 */
+		"adcq %%rax,  %%r9 ;"
+		"movq  %%r9, 16(%0) ;"
+		"adcq %%rcx, %%r10 ;"
+		"movq %%r10, 24(%0) ;"
+		"movq 16(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[2]^2 */
+		"adcq %%rax, %%r11 ;"
+		"movq %%r11, 32(%0) ;"
+		"adcq %%rcx, %%r15 ;"
+		"movq %%r15, 40(%0) ;"
+		"movq 24(%1), %%rdx ;"
+		"mulx %%rdx, %%rax, %%rcx ;" /* A[3]^2 */
+		"adcq %%rax, %%r13 ;"
+		"movq %%r13, 48(%0) ;"
+		"adcq %%rcx, %%r14 ;"
+		"movq %%r14, 56(%0) ;"
+		:
+		: "r"(c), "r"(a)
+		: "memory", "cc", "%rax", "%rcx", "%rdx", "%r8", "%r9", "%r10",
+		  "%r11", "%r13", "%r14", "%r15");
+}
+
+static void red_eltfp25519_1w_adx(u64 *const c, const u64 *const a)
+{
+	asm volatile(
+		"movl    $38, %%edx ;"	/* 2*c = 38 = 2^256 */
+		"mulx 32(%1),  %%r8, %%r10 ;" /* c*C[4] */
+		"xorl %%ebx, %%ebx ;"
+		"adox   (%1),  %%r8 ;"
+		"mulx 40(%1),  %%r9, %%r11 ;" /* c*C[5] */
+		"adcx %%r10,  %%r9 ;"
+		"adox  8(%1),  %%r9 ;"
+		"mulx 48(%1), %%r10, %%rax ;" /* c*C[6] */
+		"adcx %%r11, %%r10 ;"
+		"adox 16(%1), %%r10 ;"
+		"mulx 56(%1), %%r11, %%rcx ;" /* c*C[7] */
+		"adcx %%rax, %%r11 ;"
+		"adox 24(%1), %%r11 ;"
+		/***************************************/
+		"adcx %%rbx, %%rcx ;"
+		"adox  %%rbx, %%rcx ;"
+		"imul %%rdx, %%rcx ;" /* c*C[4], cf=0, of=0 */
+		"adcx %%rcx,  %%r8 ;"
+		"adcx %%rbx,  %%r9 ;"
+		"movq  %%r9,  8(%0) ;"
+		"adcx %%rbx, %%r10 ;"
+		"movq %%r10, 16(%0) ;"
+		"adcx %%rbx, %%r11 ;"
+		"movq %%r11, 24(%0) ;"
+		"mov     $0, %%ecx ;"
+		"cmovc %%edx, %%ecx ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,   (%0) ;"
+		:
+		: "r"(c), "r"(a)
+		: "memory", "cc", "%rax", "%rbx", "%rcx", "%rdx", "%r8", "%r9",
+		  "%r10", "%r11");
+}
+
+static void red_eltfp25519_1w_bmi2(u64 *const c, const u64 *const a)
+{
+	asm volatile(
+		"movl    $38, %%edx ;"	/* 2*c = 38 = 2^256 */
+		"mulx 32(%1),  %%r8, %%r10 ;" /* c*C[4] */
+		"mulx 40(%1),  %%r9, %%r11 ;" /* c*C[5] */
+		"addq %%r10,  %%r9 ;"
+		"mulx 48(%1), %%r10, %%rax ;" /* c*C[6] */
+		"adcq %%r11, %%r10 ;"
+		"mulx 56(%1), %%r11, %%rcx ;" /* c*C[7] */
+		"adcq %%rax, %%r11 ;"
+		/***************************************/
+		"adcq    $0, %%rcx ;"
+		"addq   (%1),  %%r8 ;"
+		"adcq  8(%1),  %%r9 ;"
+		"adcq 16(%1), %%r10 ;"
+		"adcq 24(%1), %%r11 ;"
+		"adcq     $0, %%rcx ;"
+		"imul %%rdx, %%rcx ;" /* c*C[4], cf=0 */
+		"addq %%rcx,  %%r8 ;"
+		"adcq    $0,  %%r9 ;"
+		"movq  %%r9,  8(%0) ;"
+		"adcq    $0, %%r10 ;"
+		"movq %%r10, 16(%0) ;"
+		"adcq    $0, %%r11 ;"
+		"movq %%r11, 24(%0) ;"
+		"mov     $0, %%ecx ;"
+		"cmovc %%edx, %%ecx ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,   (%0) ;"
+		:
+		: "r"(c), "r"(a)
+		: "memory", "cc", "%rax", "%rcx", "%rdx", "%r8", "%r9", "%r10",
+		  "%r11");
+}
+
+static __always_inline void
+add_eltfp25519_1w_adx(u64 *const c, const u64 *const a, const u64 *const b)
+{
+	asm volatile(
+		"mov     $38, %%eax ;"
+		"xorl  %%ecx, %%ecx ;"
+		"movq   (%2),  %%r8 ;"
+		"adcx   (%1),  %%r8 ;"
+		"movq  8(%2),  %%r9 ;"
+		"adcx  8(%1),  %%r9 ;"
+		"movq 16(%2), %%r10 ;"
+		"adcx 16(%1), %%r10 ;"
+		"movq 24(%2), %%r11 ;"
+		"adcx 24(%1), %%r11 ;"
+		"cmovc %%eax, %%ecx ;"
+		"xorl %%eax, %%eax  ;"
+		"adcx %%rcx,  %%r8  ;"
+		"adcx %%rax,  %%r9  ;"
+		"movq  %%r9,  8(%0) ;"
+		"adcx %%rax, %%r10  ;"
+		"movq %%r10, 16(%0) ;"
+		"adcx %%rax, %%r11  ;"
+		"movq %%r11, 24(%0) ;"
+		"mov     $38, %%ecx ;"
+		"cmovc %%ecx, %%eax ;"
+		"addq %%rax,  %%r8  ;"
+		"movq  %%r8,   (%0) ;"
+		:
+		: "r"(c), "r"(a), "r"(b)
+		: "memory", "cc", "%rax", "%rcx", "%r8", "%r9", "%r10", "%r11");
+}
+
+static __always_inline void
+add_eltfp25519_1w_bmi2(u64 *const c, const u64 *const a, const u64 *const b)
+{
+	asm volatile(
+		"mov     $38, %%eax ;"
+		"movq   (%2),  %%r8 ;"
+		"addq   (%1),  %%r8 ;"
+		"movq  8(%2),  %%r9 ;"
+		"adcq  8(%1),  %%r9 ;"
+		"movq 16(%2), %%r10 ;"
+		"adcq 16(%1), %%r10 ;"
+		"movq 24(%2), %%r11 ;"
+		"adcq 24(%1), %%r11 ;"
+		"mov      $0, %%ecx ;"
+		"cmovc %%eax, %%ecx ;"
+		"addq %%rcx,  %%r8  ;"
+		"adcq    $0,  %%r9  ;"
+		"movq  %%r9,  8(%0) ;"
+		"adcq    $0, %%r10  ;"
+		"movq %%r10, 16(%0) ;"
+		"adcq    $0, %%r11  ;"
+		"movq %%r11, 24(%0) ;"
+		"mov     $0, %%ecx  ;"
+		"cmovc %%eax, %%ecx ;"
+		"addq %%rcx,  %%r8  ;"
+		"movq  %%r8,   (%0) ;"
+		:
+		: "r"(c), "r"(a), "r"(b)
+		: "memory", "cc", "%rax", "%rcx", "%r8", "%r9", "%r10", "%r11");
+}
+
+static __always_inline void
+sub_eltfp25519_1w(u64 *const c, const u64 *const a, const u64 *const b)
+{
+	asm volatile(
+		"mov     $38, %%eax ;"
+		"movq   (%1),  %%r8 ;"
+		"subq   (%2),  %%r8 ;"
+		"movq  8(%1),  %%r9 ;"
+		"sbbq  8(%2),  %%r9 ;"
+		"movq 16(%1), %%r10 ;"
+		"sbbq 16(%2), %%r10 ;"
+		"movq 24(%1), %%r11 ;"
+		"sbbq 24(%2), %%r11 ;"
+		"mov      $0, %%ecx ;"
+		"cmovc %%eax, %%ecx ;"
+		"subq %%rcx,  %%r8  ;"
+		"sbbq    $0,  %%r9  ;"
+		"movq  %%r9,  8(%0) ;"
+		"sbbq    $0, %%r10  ;"
+		"movq %%r10, 16(%0) ;"
+		"sbbq    $0, %%r11  ;"
+		"movq %%r11, 24(%0) ;"
+		"mov     $0, %%ecx  ;"
+		"cmovc %%eax, %%ecx ;"
+		"subq %%rcx,  %%r8  ;"
+		"movq  %%r8,   (%0) ;"
+		:
+		: "r"(c), "r"(a), "r"(b)
+		: "memory", "cc", "%rax", "%rcx", "%r8", "%r9", "%r10", "%r11");
+}
+
+/* Multiplication by a24 = (A+2)/4 = (486662+2)/4 = 121666 */
+static __always_inline void
+mul_a24_eltfp25519_1w(u64 *const c, const u64 *const a)
+{
+	const u64 a24 = 121666;
+	asm volatile(
+		"movq     %2, %%rdx ;"
+		"mulx   (%1),  %%r8, %%r10 ;"
+		"mulx  8(%1),  %%r9, %%r11 ;"
+		"addq %%r10,  %%r9 ;"
+		"mulx 16(%1), %%r10, %%rax ;"
+		"adcq %%r11, %%r10 ;"
+		"mulx 24(%1), %%r11, %%rcx ;"
+		"adcq %%rax, %%r11 ;"
+		/**************************/
+		"adcq    $0, %%rcx ;"
+		"movl   $38, %%edx ;" /* 2*c = 38 = 2^256 mod 2^255-19*/
+		"imul %%rdx, %%rcx ;"
+		"addq %%rcx,  %%r8 ;"
+		"adcq    $0,  %%r9 ;"
+		"movq  %%r9,  8(%0) ;"
+		"adcq    $0, %%r10 ;"
+		"movq %%r10, 16(%0) ;"
+		"adcq    $0, %%r11 ;"
+		"movq %%r11, 24(%0) ;"
+		"mov     $0, %%ecx ;"
+		"cmovc %%edx, %%ecx ;"
+		"addq %%rcx,  %%r8 ;"
+		"movq  %%r8,   (%0) ;"
+		:
+		: "r"(c), "r"(a), "r"(a24)
+		: "memory", "cc", "%rax", "%rcx", "%rdx", "%r8", "%r9", "%r10",
+		  "%r11");
+}
+
+static void inv_eltfp25519_1w_adx(u64 *const c, const u64 *const a)
+{
+	struct {
+		eltfp25519_1w_buffer buffer;
+		eltfp25519_1w x0, x1, x2;
+	} __aligned(32) m;
+	u64 *T[4];
+
+	T[0] = m.x0;
+	T[1] = c; /* x^(-1) */
+	T[2] = m.x1;
+	T[3] = m.x2;
+
+	copy_eltfp25519_1w(T[1], a);
+	sqrn_eltfp25519_1w_adx(T[1], 1);
+	copy_eltfp25519_1w(T[2], T[1]);
+	sqrn_eltfp25519_1w_adx(T[2], 2);
+	mul_eltfp25519_1w_adx(T[0], a, T[2]);
+	mul_eltfp25519_1w_adx(T[1], T[1], T[0]);
+	copy_eltfp25519_1w(T[2], T[1]);
+	sqrn_eltfp25519_1w_adx(T[2], 1);
+	mul_eltfp25519_1w_adx(T[0], T[0], T[2]);
+	copy_eltfp25519_1w(T[2], T[0]);
+	sqrn_eltfp25519_1w_adx(T[2], 5);
+	mul_eltfp25519_1w_adx(T[0], T[0], T[2]);
+	copy_eltfp25519_1w(T[2], T[0]);
+	sqrn_eltfp25519_1w_adx(T[2], 10);
+	mul_eltfp25519_1w_adx(T[2], T[2], T[0]);
+	copy_eltfp25519_1w(T[3], T[2]);
+	sqrn_eltfp25519_1w_adx(T[3], 20);
+	mul_eltfp25519_1w_adx(T[3], T[3], T[2]);
+	sqrn_eltfp25519_1w_adx(T[3], 10);
+	mul_eltfp25519_1w_adx(T[3], T[3], T[0]);
+	copy_eltfp25519_1w(T[0], T[3]);
+	sqrn_eltfp25519_1w_adx(T[0], 50);
+	mul_eltfp25519_1w_adx(T[0], T[0], T[3]);
+	copy_eltfp25519_1w(T[2], T[0]);
+	sqrn_eltfp25519_1w_adx(T[2], 100);
+	mul_eltfp25519_1w_adx(T[2], T[2], T[0]);
+	sqrn_eltfp25519_1w_adx(T[2], 50);
+	mul_eltfp25519_1w_adx(T[2], T[2], T[3]);
+	sqrn_eltfp25519_1w_adx(T[2], 5);
+	mul_eltfp25519_1w_adx(T[1], T[1], T[2]);
+
+	memzero_explicit(&m, sizeof(m));
+}
+
+static void inv_eltfp25519_1w_bmi2(u64 *const c, const u64 *const a)
+{
+	struct {
+		eltfp25519_1w_buffer buffer;
+		eltfp25519_1w x0, x1, x2;
+	} __aligned(32) m;
+	u64 *T[5];
+
+	T[0] = m.x0;
+	T[1] = c; /* x^(-1) */
+	T[2] = m.x1;
+	T[3] = m.x2;
+
+	copy_eltfp25519_1w(T[1], a);
+	sqrn_eltfp25519_1w_bmi2(T[1], 1);
+	copy_eltfp25519_1w(T[2], T[1]);
+	sqrn_eltfp25519_1w_bmi2(T[2], 2);
+	mul_eltfp25519_1w_bmi2(T[0], a, T[2]);
+	mul_eltfp25519_1w_bmi2(T[1], T[1], T[0]);
+	copy_eltfp25519_1w(T[2], T[1]);
+	sqrn_eltfp25519_1w_bmi2(T[2], 1);
+	mul_eltfp25519_1w_bmi2(T[0], T[0], T[2]);
+	copy_eltfp25519_1w(T[2], T[0]);
+	sqrn_eltfp25519_1w_bmi2(T[2], 5);
+	mul_eltfp25519_1w_bmi2(T[0], T[0], T[2]);
+	copy_eltfp25519_1w(T[2], T[0]);
+	sqrn_eltfp25519_1w_bmi2(T[2], 10);
+	mul_eltfp25519_1w_bmi2(T[2], T[2], T[0]);
+	copy_eltfp25519_1w(T[3], T[2]);
+	sqrn_eltfp25519_1w_bmi2(T[3], 20);
+	mul_eltfp25519_1w_bmi2(T[3], T[3], T[2]);
+	sqrn_eltfp25519_1w_bmi2(T[3], 10);
+	mul_eltfp25519_1w_bmi2(T[3], T[3], T[0]);
+	copy_eltfp25519_1w(T[0], T[3]);
+	sqrn_eltfp25519_1w_bmi2(T[0], 50);
+	mul_eltfp25519_1w_bmi2(T[0], T[0], T[3]);
+	copy_eltfp25519_1w(T[2], T[0]);
+	sqrn_eltfp25519_1w_bmi2(T[2], 100);
+	mul_eltfp25519_1w_bmi2(T[2], T[2], T[0]);
+	sqrn_eltfp25519_1w_bmi2(T[2], 50);
+	mul_eltfp25519_1w_bmi2(T[2], T[2], T[3]);
+	sqrn_eltfp25519_1w_bmi2(T[2], 5);
+	mul_eltfp25519_1w_bmi2(T[1], T[1], T[2]);
+
+	memzero_explicit(&m, sizeof(m));
+}
+
+/* Given c, a 256-bit number, fred_eltfp25519_1w updates c
+ * with a number such that 0 <= C < 2**255-19.
+ */
+static __always_inline void fred_eltfp25519_1w(u64 *const c)
+{
+	u64 tmp0 = 38, tmp1 = 19;
+	asm volatile(
+		"btrq   $63,    %3 ;" /* Put bit 255 in carry flag and clear */
+		"cmovncl %k5,   %k4 ;" /* c[255] ? 38 : 19 */
+
+		/* Add either 19 or 38 to c */
+		"addq    %4,   %0 ;"
+		"adcq    $0,   %1 ;"
+		"adcq    $0,   %2 ;"
+		"adcq    $0,   %3 ;"
+
+		/* Test for bit 255 again; only triggered on overflow modulo 2^255-19 */
+		"movl    $0,  %k4 ;"
+		"cmovnsl %k5,  %k4 ;" /* c[255] ? 0 : 19 */
+		"btrq   $63,   %3 ;" /* Clear bit 255 */
+
+		/* Subtract 19 if necessary */
+		"subq    %4,   %0 ;"
+		"sbbq    $0,   %1 ;"
+		"sbbq    $0,   %2 ;"
+		"sbbq    $0,   %3 ;"
+
+		: "+r"(c[0]), "+r"(c[1]), "+r"(c[2]), "+r"(c[3]), "+r"(tmp0),
+		  "+r"(tmp1)
+		:
+		: "memory", "cc");
+}
+
+static __always_inline void cswap(u8 bit, u64 *const px, u64 *const py)
+{
+	u64 temp;
+	asm volatile(
+		"test %9, %9 ;"
+		"movq %0, %8 ;"
+		"cmovnzq %4, %0 ;"
+		"cmovnzq %8, %4 ;"
+		"movq %1, %8 ;"
+		"cmovnzq %5, %1 ;"
+		"cmovnzq %8, %5 ;"
+		"movq %2, %8 ;"
+		"cmovnzq %6, %2 ;"
+		"cmovnzq %8, %6 ;"
+		"movq %3, %8 ;"
+		"cmovnzq %7, %3 ;"
+		"cmovnzq %8, %7 ;"
+		: "+r"(px[0]), "+r"(px[1]), "+r"(px[2]), "+r"(px[3]),
+		  "+r"(py[0]), "+r"(py[1]), "+r"(py[2]), "+r"(py[3]),
+		  "=r"(temp)
+		: "r"(bit)
+		: "cc"
+	);
+}
+
+static __always_inline void cselect(u8 bit, u64 *const px, const u64 *const py)
+{
+	asm volatile(
+		"test %4, %4 ;"
+		"cmovnzq %5, %0 ;"
+		"cmovnzq %6, %1 ;"
+		"cmovnzq %7, %2 ;"
+		"cmovnzq %8, %3 ;"
+		: "+r"(px[0]), "+r"(px[1]), "+r"(px[2]), "+r"(px[3])
+		: "r"(bit), "rm"(py[0]), "rm"(py[1]), "rm"(py[2]), "rm"(py[3])
+		: "cc"
+	);
+}
+
+static __always_inline void clamp_secret(u8 secret[CURVE25519_POINT_SIZE])
+{
+	secret[0] &= 248;
+	secret[31] &= 127;
+	secret[31] |= 64;
+}
+
+static void curve25519_adx(u8 shared[CURVE25519_POINT_SIZE],
+			   const u8 private_key[CURVE25519_POINT_SIZE],
+			   const u8 session_key[CURVE25519_POINT_SIZE])
+{
+	struct {
+		u64 buffer[4 * NUM_WORDS_ELTFP25519];
+		u64 coordinates[4 * NUM_WORDS_ELTFP25519];
+		u64 workspace[6 * NUM_WORDS_ELTFP25519];
+		u8 session[CURVE25519_POINT_SIZE];
+		u8 private[CURVE25519_POINT_SIZE];
+	} __aligned(32) m;
+
+	int i = 0, j = 0;
+	u64 prev = 0;
+	u64 *const X1 = (u64 *)m.session;
+	u64 *const key = (u64 *)m.private;
+	u64 *const Px = m.coordinates + 0;
+	u64 *const Pz = m.coordinates + 4;
+	u64 *const Qx = m.coordinates + 8;
+	u64 *const Qz = m.coordinates + 12;
+	u64 *const X2 = Qx;
+	u64 *const Z2 = Qz;
+	u64 *const X3 = Px;
+	u64 *const Z3 = Pz;
+	u64 *const X2Z2 = Qx;
+	u64 *const X3Z3 = Px;
+
+	u64 *const A = m.workspace + 0;
+	u64 *const B = m.workspace + 4;
+	u64 *const D = m.workspace + 8;
+	u64 *const C = m.workspace + 12;
+	u64 *const DA = m.workspace + 16;
+	u64 *const CB = m.workspace + 20;
+	u64 *const AB = A;
+	u64 *const DC = D;
+	u64 *const DACB = DA;
+
+	memcpy(m.private, private_key, sizeof(m.private));
+	memcpy(m.session, session_key, sizeof(m.session));
+
+	clamp_secret(m.private);
+
+	/* As in the draft:
+	 * When receiving such an array, implementations of curve25519
+	 * MUST mask the most-significant bit in the final byte. This
+	 * is done to preserve compatibility with point formats which
+	 * reserve the sign bit for use in other protocols and to
+	 * increase resistance to implementation fingerprinting
+	 */
+	m.session[CURVE25519_POINT_SIZE - 1] &= (1 << (255 % 8)) - 1;
+
+	copy_eltfp25519_1w(Px, X1);
+	setzero_eltfp25519_1w(Pz);
+	setzero_eltfp25519_1w(Qx);
+	setzero_eltfp25519_1w(Qz);
+
+	Pz[0] = 1;
+	Qx[0] = 1;
+
+	/* main-loop */
+	prev = 0;
+	j = 62;
+	for (i = 3; i >= 0; --i) {
+		while (j >= 0) {
+			u64 bit = (key[i] >> j) & 0x1;
+			u64 swap = bit ^ prev;
+			prev = bit;
+
+			add_eltfp25519_1w_adx(A, X2, Z2);	/* A = (X2+Z2) */
+			sub_eltfp25519_1w(B, X2, Z2);		/* B = (X2-Z2) */
+			add_eltfp25519_1w_adx(C, X3, Z3);	/* C = (X3+Z3) */
+			sub_eltfp25519_1w(D, X3, Z3);		/* D = (X3-Z3) */
+			mul_eltfp25519_2w_adx(DACB, AB, DC);	/* [DA|CB] = [A|B]*[D|C] */
+
+			cselect(swap, A, C);
+			cselect(swap, B, D);
+
+			sqr_eltfp25519_2w_adx(AB);		/* [AA|BB] = [A^2|B^2] */
+			add_eltfp25519_1w_adx(X3, DA, CB);	/* X3 = (DA+CB) */
+			sub_eltfp25519_1w(Z3, DA, CB);		/* Z3 = (DA-CB) */
+			sqr_eltfp25519_2w_adx(X3Z3);		/* [X3|Z3] = [(DA+CB)|(DA+CB)]^2 */
+
+			copy_eltfp25519_1w(X2, B);		/* X2 = B^2 */
+			sub_eltfp25519_1w(Z2, A, B);		/* Z2 = E = AA-BB */
+
+			mul_a24_eltfp25519_1w(B, Z2);		/* B = a24*E */
+			add_eltfp25519_1w_adx(B, B, X2);	/* B = a24*E+B */
+			mul_eltfp25519_2w_adx(X2Z2, X2Z2, AB);	/* [X2|Z2] = [B|E]*[A|a24*E+B] */
+			mul_eltfp25519_1w_adx(Z3, Z3, X1);	/* Z3 = Z3*X1 */
+			--j;
+		}
+		j = 63;
+	}
+
+	inv_eltfp25519_1w_adx(A, Qz);
+	mul_eltfp25519_1w_adx((u64 *)shared, Qx, A);
+	fred_eltfp25519_1w((u64 *)shared);
+
+	memzero_explicit(&m, sizeof(m));
+}
+
+static void curve25519_adx_base(u8 session_key[CURVE25519_POINT_SIZE],
+				const u8 private_key[CURVE25519_POINT_SIZE])
+{
+	struct {
+		u64 buffer[4 * NUM_WORDS_ELTFP25519];
+		u64 coordinates[4 * NUM_WORDS_ELTFP25519];
+		u64 workspace[4 * NUM_WORDS_ELTFP25519];
+		u8 private[CURVE25519_POINT_SIZE];
+	} __aligned(32) m;
+
+	const int ite[4] = { 64, 64, 64, 63 };
+	const int q = 3;
+	u64 swap = 1;
+
+	int i = 0, j = 0, k = 0;
+	u64 *const key = (u64 *)m.private;
+	u64 *const Ur1 = m.coordinates + 0;
+	u64 *const Zr1 = m.coordinates + 4;
+	u64 *const Ur2 = m.coordinates + 8;
+	u64 *const Zr2 = m.coordinates + 12;
+
+	u64 *const UZr1 = m.coordinates + 0;
+	u64 *const ZUr2 = m.coordinates + 8;
+
+	u64 *const A = m.workspace + 0;
+	u64 *const B = m.workspace + 4;
+	u64 *const C = m.workspace + 8;
+	u64 *const D = m.workspace + 12;
+
+	u64 *const AB = m.workspace + 0;
+	u64 *const CD = m.workspace + 8;
+
+	const u64 *const P = table_ladder_8k;
+
+	memcpy(m.private, private_key, sizeof(m.private));
+
+	clamp_secret(m.private);
+
+	setzero_eltfp25519_1w(Ur1);
+	setzero_eltfp25519_1w(Zr1);
+	setzero_eltfp25519_1w(Zr2);
+	Ur1[0] = 1;
+	Zr1[0] = 1;
+	Zr2[0] = 1;
+
+	/* G-S */
+	Ur2[3] = 0x1eaecdeee27cab34UL;
+	Ur2[2] = 0xadc7a0b9235d48e2UL;
+	Ur2[1] = 0xbbf095ae14b2edf8UL;
+	Ur2[0] = 0x7e94e1fec82faabdUL;
+
+	/* main-loop */
+	j = q;
+	for (i = 0; i < NUM_WORDS_ELTFP25519; ++i) {
+		while (j < ite[i]) {
+			u64 bit = (key[i] >> j) & 0x1;
+			k = (64 * i + j - q);
+			swap = swap ^ bit;
+			cswap(swap, Ur1, Ur2);
+			cswap(swap, Zr1, Zr2);
+			swap = bit;
+			/* Addition */
+			sub_eltfp25519_1w(B, Ur1, Zr1);		/* B = Ur1-Zr1 */
+			add_eltfp25519_1w_adx(A, Ur1, Zr1);	/* A = Ur1+Zr1 */
+			mul_eltfp25519_1w_adx(C, &P[4 * k], B);	/* C = M0-B */
+			sub_eltfp25519_1w(B, A, C);		/* B = (Ur1+Zr1) - M*(Ur1-Zr1) */
+			add_eltfp25519_1w_adx(A, A, C);		/* A = (Ur1+Zr1) + M*(Ur1-Zr1) */
+			sqr_eltfp25519_2w_adx(AB);		/* A = A^2      |  B = B^2 */
+			mul_eltfp25519_2w_adx(UZr1, ZUr2, AB);	/* Ur1 = Zr2*A  |  Zr1 = Ur2*B */
+			++j;
+		}
+		j = 0;
+	}
+
+	/* Doubling */
+	for (i = 0; i < q; ++i) {
+		add_eltfp25519_1w_adx(A, Ur1, Zr1);	/*  A = Ur1+Zr1 */
+		sub_eltfp25519_1w(B, Ur1, Zr1);		/*  B = Ur1-Zr1 */
+		sqr_eltfp25519_2w_adx(AB);		/*  A = A**2     B = B**2 */
+		copy_eltfp25519_1w(C, B);		/*  C = B */
+		sub_eltfp25519_1w(B, A, B);		/*  B = A-B */
+		mul_a24_eltfp25519_1w(D, B);		/*  D = my_a24*B */
+		add_eltfp25519_1w_adx(D, D, C);		/*  D = D+C */
+		mul_eltfp25519_2w_adx(UZr1, AB, CD);	/*  Ur1 = A*B   Zr1 = Zr1*A */
+	}
+
+	/* Convert to affine coordinates */
+	inv_eltfp25519_1w_adx(A, Zr1);
+	mul_eltfp25519_1w_adx((u64 *)session_key, Ur1, A);
+	fred_eltfp25519_1w((u64 *)session_key);
+
+	memzero_explicit(&m, sizeof(m));
+}
+
+static void curve25519_bmi2(u8 shared[CURVE25519_POINT_SIZE],
+			    const u8 private_key[CURVE25519_POINT_SIZE],
+			    const u8 session_key[CURVE25519_POINT_SIZE])
+{
+	struct {
+		u64 buffer[4 * NUM_WORDS_ELTFP25519];
+		u64 coordinates[4 * NUM_WORDS_ELTFP25519];
+		u64 workspace[6 * NUM_WORDS_ELTFP25519];
+		u8 session[CURVE25519_POINT_SIZE];
+		u8 private[CURVE25519_POINT_SIZE];
+	} __aligned(32) m;
+
+	int i = 0, j = 0;
+	u64 prev = 0;
+	u64 *const X1 = (u64 *)m.session;
+	u64 *const key = (u64 *)m.private;
+	u64 *const Px = m.coordinates + 0;
+	u64 *const Pz = m.coordinates + 4;
+	u64 *const Qx = m.coordinates + 8;
+	u64 *const Qz = m.coordinates + 12;
+	u64 *const X2 = Qx;
+	u64 *const Z2 = Qz;
+	u64 *const X3 = Px;
+	u64 *const Z3 = Pz;
+	u64 *const X2Z2 = Qx;
+	u64 *const X3Z3 = Px;
+
+	u64 *const A = m.workspace + 0;
+	u64 *const B = m.workspace + 4;
+	u64 *const D = m.workspace + 8;
+	u64 *const C = m.workspace + 12;
+	u64 *const DA = m.workspace + 16;
+	u64 *const CB = m.workspace + 20;
+	u64 *const AB = A;
+	u64 *const DC = D;
+	u64 *const DACB = DA;
+
+	memcpy(m.private, private_key, sizeof(m.private));
+	memcpy(m.session, session_key, sizeof(m.session));
+
+	clamp_secret(m.private);
+
+	/* As in the draft:
+	 * When receiving such an array, implementations of curve25519
+	 * MUST mask the most-significant bit in the final byte. This
+	 * is done to preserve compatibility with point formats which
+	 * reserve the sign bit for use in other protocols and to
+	 * increase resistance to implementation fingerprinting
+	 */
+	m.session[CURVE25519_POINT_SIZE - 1] &= (1 << (255 % 8)) - 1;
+
+	copy_eltfp25519_1w(Px, X1);
+	setzero_eltfp25519_1w(Pz);
+	setzero_eltfp25519_1w(Qx);
+	setzero_eltfp25519_1w(Qz);
+
+	Pz[0] = 1;
+	Qx[0] = 1;
+
+	/* main-loop */
+	prev = 0;
+	j = 62;
+	for (i = 3; i >= 0; --i) {
+		while (j >= 0) {
+			u64 bit = (key[i] >> j) & 0x1;
+			u64 swap = bit ^ prev;
+			prev = bit;
+
+			add_eltfp25519_1w_bmi2(A, X2, Z2);	/* A = (X2+Z2) */
+			sub_eltfp25519_1w(B, X2, Z2);		/* B = (X2-Z2) */
+			add_eltfp25519_1w_bmi2(C, X3, Z3);	/* C = (X3+Z3) */
+			sub_eltfp25519_1w(D, X3, Z3);		/* D = (X3-Z3) */
+			mul_eltfp25519_2w_bmi2(DACB, AB, DC);	/* [DA|CB] = [A|B]*[D|C] */
+
+			cselect(swap, A, C);
+			cselect(swap, B, D);
+
+			sqr_eltfp25519_2w_bmi2(AB);		/* [AA|BB] = [A^2|B^2] */
+			add_eltfp25519_1w_bmi2(X3, DA, CB);	/* X3 = (DA+CB) */
+			sub_eltfp25519_1w(Z3, DA, CB);		/* Z3 = (DA-CB) */
+			sqr_eltfp25519_2w_bmi2(X3Z3);		/* [X3|Z3] = [(DA+CB)|(DA+CB)]^2 */
+
+			copy_eltfp25519_1w(X2, B);		/* X2 = B^2 */
+			sub_eltfp25519_1w(Z2, A, B);		/* Z2 = E = AA-BB */
+
+			mul_a24_eltfp25519_1w(B, Z2);		/* B = a24*E */
+			add_eltfp25519_1w_bmi2(B, B, X2);	/* B = a24*E+B */
+			mul_eltfp25519_2w_bmi2(X2Z2, X2Z2, AB);	/* [X2|Z2] = [B|E]*[A|a24*E+B] */
+			mul_eltfp25519_1w_bmi2(Z3, Z3, X1);	/* Z3 = Z3*X1 */
+			--j;
+		}
+		j = 63;
+	}
+
+	inv_eltfp25519_1w_bmi2(A, Qz);
+	mul_eltfp25519_1w_bmi2((u64 *)shared, Qx, A);
+	fred_eltfp25519_1w((u64 *)shared);
+
+	memzero_explicit(&m, sizeof(m));
+}
+
+static void curve25519_bmi2_base(u8 session_key[CURVE25519_POINT_SIZE],
+				 const u8 private_key[CURVE25519_POINT_SIZE])
+{
+	struct {
+		u64 buffer[4 * NUM_WORDS_ELTFP25519];
+		u64 coordinates[4 * NUM_WORDS_ELTFP25519];
+		u64 workspace[4 * NUM_WORDS_ELTFP25519];
+		u8 private[CURVE25519_POINT_SIZE];
+	} __aligned(32) m;
+
+	const int ite[4] = { 64, 64, 64, 63 };
+	const int q = 3;
+	u64 swap = 1;
+
+	int i = 0, j = 0, k = 0;
+	u64 *const key = (u64 *)m.private;
+	u64 *const Ur1 = m.coordinates + 0;
+	u64 *const Zr1 = m.coordinates + 4;
+	u64 *const Ur2 = m.coordinates + 8;
+	u64 *const Zr2 = m.coordinates + 12;
+
+	u64 *const UZr1 = m.coordinates + 0;
+	u64 *const ZUr2 = m.coordinates + 8;
+
+	u64 *const A = m.workspace + 0;
+	u64 *const B = m.workspace + 4;
+	u64 *const C = m.workspace + 8;
+	u64 *const D = m.workspace + 12;
+
+	u64 *const AB = m.workspace + 0;
+	u64 *const CD = m.workspace + 8;
+
+	const u64 *const P = table_ladder_8k;
+
+	memcpy(m.private, private_key, sizeof(m.private));
+
+	clamp_secret(m.private);
+
+	setzero_eltfp25519_1w(Ur1);
+	setzero_eltfp25519_1w(Zr1);
+	setzero_eltfp25519_1w(Zr2);
+	Ur1[0] = 1;
+	Zr1[0] = 1;
+	Zr2[0] = 1;
+
+	/* G-S */
+	Ur2[3] = 0x1eaecdeee27cab34UL;
+	Ur2[2] = 0xadc7a0b9235d48e2UL;
+	Ur2[1] = 0xbbf095ae14b2edf8UL;
+	Ur2[0] = 0x7e94e1fec82faabdUL;
+
+	/* main-loop */
+	j = q;
+	for (i = 0; i < NUM_WORDS_ELTFP25519; ++i) {
+		while (j < ite[i]) {
+			u64 bit = (key[i] >> j) & 0x1;
+			k = (64 * i + j - q);
+			swap = swap ^ bit;
+			cswap(swap, Ur1, Ur2);
+			cswap(swap, Zr1, Zr2);
+			swap = bit;
+			/* Addition */
+			sub_eltfp25519_1w(B, Ur1, Zr1);		/* B = Ur1-Zr1 */
+			add_eltfp25519_1w_bmi2(A, Ur1, Zr1);	/* A = Ur1+Zr1 */
+			mul_eltfp25519_1w_bmi2(C, &P[4 * k], B);/* C = M0-B */
+			sub_eltfp25519_1w(B, A, C);		/* B = (Ur1+Zr1) - M*(Ur1-Zr1) */
+			add_eltfp25519_1w_bmi2(A, A, C);	/* A = (Ur1+Zr1) + M*(Ur1-Zr1) */
+			sqr_eltfp25519_2w_bmi2(AB);		/* A = A^2      |  B = B^2 */
+			mul_eltfp25519_2w_bmi2(UZr1, ZUr2, AB);	/* Ur1 = Zr2*A  |  Zr1 = Ur2*B */
+			++j;
+		}
+		j = 0;
+	}
+
+	/* Doubling */
+	for (i = 0; i < q; ++i) {
+		add_eltfp25519_1w_bmi2(A, Ur1, Zr1);	/*  A = Ur1+Zr1 */
+		sub_eltfp25519_1w(B, Ur1, Zr1);		/*  B = Ur1-Zr1 */
+		sqr_eltfp25519_2w_bmi2(AB);		/*  A = A**2     B = B**2 */
+		copy_eltfp25519_1w(C, B);		/*  C = B */
+		sub_eltfp25519_1w(B, A, B);		/*  B = A-B */
+		mul_a24_eltfp25519_1w(D, B);		/*  D = my_a24*B */
+		add_eltfp25519_1w_bmi2(D, D, C);	/*  D = D+C */
+		mul_eltfp25519_2w_bmi2(UZr1, AB, CD);	/*  Ur1 = A*B   Zr1 = Zr1*A */
+	}
+
+	/* Convert to affine coordinates */
+	inv_eltfp25519_1w_bmi2(A, Zr1);
+	mul_eltfp25519_1w_bmi2((u64 *)session_key, Ur1, A);
+	fred_eltfp25519_1w((u64 *)session_key);
+
+	memzero_explicit(&m, sizeof(m));
+}
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (15 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 16/20] zinc: Curve25519 x86_64 implementation Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-15 23:02   ` kbuild test robot
  2018-09-15 23:39   ` kbuild test robot
  2018-09-14 16:22 ` [PATCH net-next v4 18/20] crypto: port ChaCha20 " Jason A. Donenfeld
                   ` (2 subsequent siblings)
  19 siblings, 2 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers

Now that Poly1305 is in Zinc, we can have the crypto API code simply
call into it. We have to do a little bit of book keeping here, because
the crypto API receives the key in the first few calls to update.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/Makefile               |   3 -
 arch/x86/crypto/poly1305-avx2-x86_64.S | 388 ----------------
 arch/x86/crypto/poly1305-sse2-x86_64.S | 584 -------------------------
 arch/x86/crypto/poly1305_glue.c        | 205 ---------
 crypto/Kconfig                         |  15 +-
 crypto/Makefile                        |   2 +-
 crypto/chacha20poly1305.c              |  12 +-
 crypto/poly1305_generic.c              | 304 -------------
 crypto/poly1305_zinc.c                 |  92 ++++
 include/crypto/poly1305.h              |  40 --
 10 files changed, 101 insertions(+), 1544 deletions(-)
 delete mode 100644 arch/x86/crypto/poly1305-avx2-x86_64.S
 delete mode 100644 arch/x86/crypto/poly1305-sse2-x86_64.S
 delete mode 100644 arch/x86/crypto/poly1305_glue.c
 delete mode 100644 crypto/poly1305_generic.c
 create mode 100644 crypto/poly1305_zinc.c
 delete mode 100644 include/crypto/poly1305.h

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index a450ad573dcb..cf830219846b 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -34,7 +34,6 @@ obj-$(CONFIG_CRYPTO_CRC32_PCLMUL) += crc32-pclmul.o
 obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o
 obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
-obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
 
 obj-$(CONFIG_CRYPTO_AEGIS128_AESNI_SSE2) += aegis128-aesni.o
 obj-$(CONFIG_CRYPTO_AEGIS128L_AESNI_SSE2) += aegis128l-aesni.o
@@ -110,10 +109,8 @@ aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
 sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
-poly1305-x86_64-y := poly1305-sse2-x86_64.o poly1305_glue.o
 ifeq ($(avx2_supported),yes)
 sha1-ssse3-y += sha1_avx2_x86_64_asm.o
-poly1305-x86_64-y += poly1305-avx2-x86_64.o
 endif
 ifeq ($(sha1_ni_supported),yes)
 sha1-ssse3-y += sha1_ni_asm.o
diff --git a/arch/x86/crypto/poly1305-avx2-x86_64.S b/arch/x86/crypto/poly1305-avx2-x86_64.S
deleted file mode 100644
index 3b6e70d085da..000000000000
--- a/arch/x86/crypto/poly1305-avx2-x86_64.S
+++ /dev/null
@@ -1,388 +0,0 @@
-/*
- * Poly1305 authenticator algorithm, RFC7539, x64 AVX2 functions
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <linux/linkage.h>
-
-.section	.rodata.cst32.ANMASK, "aM", @progbits, 32
-.align 32
-ANMASK:	.octa 0x0000000003ffffff0000000003ffffff
-	.octa 0x0000000003ffffff0000000003ffffff
-
-.section	.rodata.cst32.ORMASK, "aM", @progbits, 32
-.align 32
-ORMASK:	.octa 0x00000000010000000000000001000000
-	.octa 0x00000000010000000000000001000000
-
-.text
-
-#define h0 0x00(%rdi)
-#define h1 0x04(%rdi)
-#define h2 0x08(%rdi)
-#define h3 0x0c(%rdi)
-#define h4 0x10(%rdi)
-#define r0 0x00(%rdx)
-#define r1 0x04(%rdx)
-#define r2 0x08(%rdx)
-#define r3 0x0c(%rdx)
-#define r4 0x10(%rdx)
-#define u0 0x00(%r8)
-#define u1 0x04(%r8)
-#define u2 0x08(%r8)
-#define u3 0x0c(%r8)
-#define u4 0x10(%r8)
-#define w0 0x14(%r8)
-#define w1 0x18(%r8)
-#define w2 0x1c(%r8)
-#define w3 0x20(%r8)
-#define w4 0x24(%r8)
-#define y0 0x28(%r8)
-#define y1 0x2c(%r8)
-#define y2 0x30(%r8)
-#define y3 0x34(%r8)
-#define y4 0x38(%r8)
-#define m %rsi
-#define hc0 %ymm0
-#define hc1 %ymm1
-#define hc2 %ymm2
-#define hc3 %ymm3
-#define hc4 %ymm4
-#define hc0x %xmm0
-#define hc1x %xmm1
-#define hc2x %xmm2
-#define hc3x %xmm3
-#define hc4x %xmm4
-#define t1 %ymm5
-#define t2 %ymm6
-#define t1x %xmm5
-#define t2x %xmm6
-#define ruwy0 %ymm7
-#define ruwy1 %ymm8
-#define ruwy2 %ymm9
-#define ruwy3 %ymm10
-#define ruwy4 %ymm11
-#define ruwy0x %xmm7
-#define ruwy1x %xmm8
-#define ruwy2x %xmm9
-#define ruwy3x %xmm10
-#define ruwy4x %xmm11
-#define svxz1 %ymm12
-#define svxz2 %ymm13
-#define svxz3 %ymm14
-#define svxz4 %ymm15
-#define d0 %r9
-#define d1 %r10
-#define d2 %r11
-#define d3 %r12
-#define d4 %r13
-
-ENTRY(poly1305_4block_avx2)
-	# %rdi: Accumulator h[5]
-	# %rsi: 64 byte input block m
-	# %rdx: Poly1305 key r[5]
-	# %rcx: Quadblock count
-	# %r8:  Poly1305 derived key r^2 u[5], r^3 w[5], r^4 y[5],
-
-	# This four-block variant uses loop unrolled block processing. It
-	# requires 4 Poly1305 keys: r, r^2, r^3 and r^4:
-	# h = (h + m) * r  =>  h = (h + m1) * r^4 + m2 * r^3 + m3 * r^2 + m4 * r
-
-	vzeroupper
-	push		%rbx
-	push		%r12
-	push		%r13
-
-	# combine r0,u0,w0,y0
-	vmovd		y0,ruwy0x
-	vmovd		w0,t1x
-	vpunpcklqdq	t1,ruwy0,ruwy0
-	vmovd		u0,t1x
-	vmovd		r0,t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,ruwy0,ruwy0
-
-	# combine r1,u1,w1,y1 and s1=r1*5,v1=u1*5,x1=w1*5,z1=y1*5
-	vmovd		y1,ruwy1x
-	vmovd		w1,t1x
-	vpunpcklqdq	t1,ruwy1,ruwy1
-	vmovd		u1,t1x
-	vmovd		r1,t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,ruwy1,ruwy1
-	vpslld		$2,ruwy1,svxz1
-	vpaddd		ruwy1,svxz1,svxz1
-
-	# combine r2,u2,w2,y2 and s2=r2*5,v2=u2*5,x2=w2*5,z2=y2*5
-	vmovd		y2,ruwy2x
-	vmovd		w2,t1x
-	vpunpcklqdq	t1,ruwy2,ruwy2
-	vmovd		u2,t1x
-	vmovd		r2,t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,ruwy2,ruwy2
-	vpslld		$2,ruwy2,svxz2
-	vpaddd		ruwy2,svxz2,svxz2
-
-	# combine r3,u3,w3,y3 and s3=r3*5,v3=u3*5,x3=w3*5,z3=y3*5
-	vmovd		y3,ruwy3x
-	vmovd		w3,t1x
-	vpunpcklqdq	t1,ruwy3,ruwy3
-	vmovd		u3,t1x
-	vmovd		r3,t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,ruwy3,ruwy3
-	vpslld		$2,ruwy3,svxz3
-	vpaddd		ruwy3,svxz3,svxz3
-
-	# combine r4,u4,w4,y4 and s4=r4*5,v4=u4*5,x4=w4*5,z4=y4*5
-	vmovd		y4,ruwy4x
-	vmovd		w4,t1x
-	vpunpcklqdq	t1,ruwy4,ruwy4
-	vmovd		u4,t1x
-	vmovd		r4,t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,ruwy4,ruwy4
-	vpslld		$2,ruwy4,svxz4
-	vpaddd		ruwy4,svxz4,svxz4
-
-.Ldoblock4:
-	# hc0 = [m[48-51] & 0x3ffffff, m[32-35] & 0x3ffffff,
-	#	 m[16-19] & 0x3ffffff, m[ 0- 3] & 0x3ffffff + h0]
-	vmovd		0x00(m),hc0x
-	vmovd		0x10(m),t1x
-	vpunpcklqdq	t1,hc0,hc0
-	vmovd		0x20(m),t1x
-	vmovd		0x30(m),t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,hc0,hc0
-	vpand		ANMASK(%rip),hc0,hc0
-	vmovd		h0,t1x
-	vpaddd		t1,hc0,hc0
-	# hc1 = [(m[51-54] >> 2) & 0x3ffffff, (m[35-38] >> 2) & 0x3ffffff,
-	#	 (m[19-22] >> 2) & 0x3ffffff, (m[ 3- 6] >> 2) & 0x3ffffff + h1]
-	vmovd		0x03(m),hc1x
-	vmovd		0x13(m),t1x
-	vpunpcklqdq	t1,hc1,hc1
-	vmovd		0x23(m),t1x
-	vmovd		0x33(m),t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,hc1,hc1
-	vpsrld		$2,hc1,hc1
-	vpand		ANMASK(%rip),hc1,hc1
-	vmovd		h1,t1x
-	vpaddd		t1,hc1,hc1
-	# hc2 = [(m[54-57] >> 4) & 0x3ffffff, (m[38-41] >> 4) & 0x3ffffff,
-	#	 (m[22-25] >> 4) & 0x3ffffff, (m[ 6- 9] >> 4) & 0x3ffffff + h2]
-	vmovd		0x06(m),hc2x
-	vmovd		0x16(m),t1x
-	vpunpcklqdq	t1,hc2,hc2
-	vmovd		0x26(m),t1x
-	vmovd		0x36(m),t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,hc2,hc2
-	vpsrld		$4,hc2,hc2
-	vpand		ANMASK(%rip),hc2,hc2
-	vmovd		h2,t1x
-	vpaddd		t1,hc2,hc2
-	# hc3 = [(m[57-60] >> 6) & 0x3ffffff, (m[41-44] >> 6) & 0x3ffffff,
-	#	 (m[25-28] >> 6) & 0x3ffffff, (m[ 9-12] >> 6) & 0x3ffffff + h3]
-	vmovd		0x09(m),hc3x
-	vmovd		0x19(m),t1x
-	vpunpcklqdq	t1,hc3,hc3
-	vmovd		0x29(m),t1x
-	vmovd		0x39(m),t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,hc3,hc3
-	vpsrld		$6,hc3,hc3
-	vpand		ANMASK(%rip),hc3,hc3
-	vmovd		h3,t1x
-	vpaddd		t1,hc3,hc3
-	# hc4 = [(m[60-63] >> 8) | (1<<24), (m[44-47] >> 8) | (1<<24),
-	#	 (m[28-31] >> 8) | (1<<24), (m[12-15] >> 8) | (1<<24) + h4]
-	vmovd		0x0c(m),hc4x
-	vmovd		0x1c(m),t1x
-	vpunpcklqdq	t1,hc4,hc4
-	vmovd		0x2c(m),t1x
-	vmovd		0x3c(m),t2x
-	vpunpcklqdq	t2,t1,t1
-	vperm2i128	$0x20,t1,hc4,hc4
-	vpsrld		$8,hc4,hc4
-	vpor		ORMASK(%rip),hc4,hc4
-	vmovd		h4,t1x
-	vpaddd		t1,hc4,hc4
-
-	# t1 = [ hc0[3] * r0, hc0[2] * u0, hc0[1] * w0, hc0[0] * y0 ]
-	vpmuludq	hc0,ruwy0,t1
-	# t1 += [ hc1[3] * s4, hc1[2] * v4, hc1[1] * x4, hc1[0] * z4 ]
-	vpmuludq	hc1,svxz4,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc2[3] * s3, hc2[2] * v3, hc2[1] * x3, hc2[0] * z3 ]
-	vpmuludq	hc2,svxz3,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc3[3] * s2, hc3[2] * v2, hc3[1] * x2, hc3[0] * z2 ]
-	vpmuludq	hc3,svxz2,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc4[3] * s1, hc4[2] * v1, hc4[1] * x1, hc4[0] * z1 ]
-	vpmuludq	hc4,svxz1,t2
-	vpaddq		t2,t1,t1
-	# d0 = t1[0] + t1[1] + t[2] + t[3]
-	vpermq		$0xee,t1,t2
-	vpaddq		t2,t1,t1
-	vpsrldq		$8,t1,t2
-	vpaddq		t2,t1,t1
-	vmovq		t1x,d0
-
-	# t1 = [ hc0[3] * r1, hc0[2] * u1,hc0[1] * w1, hc0[0] * y1 ]
-	vpmuludq	hc0,ruwy1,t1
-	# t1 += [ hc1[3] * r0, hc1[2] * u0, hc1[1] * w0, hc1[0] * y0 ]
-	vpmuludq	hc1,ruwy0,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc2[3] * s4, hc2[2] * v4, hc2[1] * x4, hc2[0] * z4 ]
-	vpmuludq	hc2,svxz4,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc3[3] * s3, hc3[2] * v3, hc3[1] * x3, hc3[0] * z3 ]
-	vpmuludq	hc3,svxz3,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc4[3] * s2, hc4[2] * v2, hc4[1] * x2, hc4[0] * z2 ]
-	vpmuludq	hc4,svxz2,t2
-	vpaddq		t2,t1,t1
-	# d1 = t1[0] + t1[1] + t1[3] + t1[4]
-	vpermq		$0xee,t1,t2
-	vpaddq		t2,t1,t1
-	vpsrldq		$8,t1,t2
-	vpaddq		t2,t1,t1
-	vmovq		t1x,d1
-
-	# t1 = [ hc0[3] * r2, hc0[2] * u2, hc0[1] * w2, hc0[0] * y2 ]
-	vpmuludq	hc0,ruwy2,t1
-	# t1 += [ hc1[3] * r1, hc1[2] * u1, hc1[1] * w1, hc1[0] * y1 ]
-	vpmuludq	hc1,ruwy1,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc2[3] * r0, hc2[2] * u0, hc2[1] * w0, hc2[0] * y0 ]
-	vpmuludq	hc2,ruwy0,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc3[3] * s4, hc3[2] * v4, hc3[1] * x4, hc3[0] * z4 ]
-	vpmuludq	hc3,svxz4,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc4[3] * s3, hc4[2] * v3, hc4[1] * x3, hc4[0] * z3 ]
-	vpmuludq	hc4,svxz3,t2
-	vpaddq		t2,t1,t1
-	# d2 = t1[0] + t1[1] + t1[2] + t1[3]
-	vpermq		$0xee,t1,t2
-	vpaddq		t2,t1,t1
-	vpsrldq		$8,t1,t2
-	vpaddq		t2,t1,t1
-	vmovq		t1x,d2
-
-	# t1 = [ hc0[3] * r3, hc0[2] * u3, hc0[1] * w3, hc0[0] * y3 ]
-	vpmuludq	hc0,ruwy3,t1
-	# t1 += [ hc1[3] * r2, hc1[2] * u2, hc1[1] * w2, hc1[0] * y2 ]
-	vpmuludq	hc1,ruwy2,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc2[3] * r1, hc2[2] * u1, hc2[1] * w1, hc2[0] * y1 ]
-	vpmuludq	hc2,ruwy1,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc3[3] * r0, hc3[2] * u0, hc3[1] * w0, hc3[0] * y0 ]
-	vpmuludq	hc3,ruwy0,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc4[3] * s4, hc4[2] * v4, hc4[1] * x4, hc4[0] * z4 ]
-	vpmuludq	hc4,svxz4,t2
-	vpaddq		t2,t1,t1
-	# d3 = t1[0] + t1[1] + t1[2] + t1[3]
-	vpermq		$0xee,t1,t2
-	vpaddq		t2,t1,t1
-	vpsrldq		$8,t1,t2
-	vpaddq		t2,t1,t1
-	vmovq		t1x,d3
-
-	# t1 = [ hc0[3] * r4, hc0[2] * u4, hc0[1] * w4, hc0[0] * y4 ]
-	vpmuludq	hc0,ruwy4,t1
-	# t1 += [ hc1[3] * r3, hc1[2] * u3, hc1[1] * w3, hc1[0] * y3 ]
-	vpmuludq	hc1,ruwy3,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc2[3] * r2, hc2[2] * u2, hc2[1] * w2, hc2[0] * y2 ]
-	vpmuludq	hc2,ruwy2,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc3[3] * r1, hc3[2] * u1, hc3[1] * w1, hc3[0] * y1 ]
-	vpmuludq	hc3,ruwy1,t2
-	vpaddq		t2,t1,t1
-	# t1 += [ hc4[3] * r0, hc4[2] * u0, hc4[1] * w0, hc4[0] * y0 ]
-	vpmuludq	hc4,ruwy0,t2
-	vpaddq		t2,t1,t1
-	# d4 = t1[0] + t1[1] + t1[2] + t1[3]
-	vpermq		$0xee,t1,t2
-	vpaddq		t2,t1,t1
-	vpsrldq		$8,t1,t2
-	vpaddq		t2,t1,t1
-	vmovq		t1x,d4
-
-	# d1 += d0 >> 26
-	mov		d0,%rax
-	shr		$26,%rax
-	add		%rax,d1
-	# h0 = d0 & 0x3ffffff
-	mov		d0,%rbx
-	and		$0x3ffffff,%ebx
-
-	# d2 += d1 >> 26
-	mov		d1,%rax
-	shr		$26,%rax
-	add		%rax,d2
-	# h1 = d1 & 0x3ffffff
-	mov		d1,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h1
-
-	# d3 += d2 >> 26
-	mov		d2,%rax
-	shr		$26,%rax
-	add		%rax,d3
-	# h2 = d2 & 0x3ffffff
-	mov		d2,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h2
-
-	# d4 += d3 >> 26
-	mov		d3,%rax
-	shr		$26,%rax
-	add		%rax,d4
-	# h3 = d3 & 0x3ffffff
-	mov		d3,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h3
-
-	# h0 += (d4 >> 26) * 5
-	mov		d4,%rax
-	shr		$26,%rax
-	lea		(%eax,%eax,4),%eax
-	add		%eax,%ebx
-	# h4 = d4 & 0x3ffffff
-	mov		d4,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h4
-
-	# h1 += h0 >> 26
-	mov		%ebx,%eax
-	shr		$26,%eax
-	add		%eax,h1
-	# h0 = h0 & 0x3ffffff
-	andl		$0x3ffffff,%ebx
-	mov		%ebx,h0
-
-	add		$0x40,m
-	dec		%rcx
-	jnz		.Ldoblock4
-
-	vzeroupper
-	pop		%r13
-	pop		%r12
-	pop		%rbx
-	ret
-ENDPROC(poly1305_4block_avx2)
diff --git a/arch/x86/crypto/poly1305-sse2-x86_64.S b/arch/x86/crypto/poly1305-sse2-x86_64.S
deleted file mode 100644
index c88c670cb5fc..000000000000
--- a/arch/x86/crypto/poly1305-sse2-x86_64.S
+++ /dev/null
@@ -1,584 +0,0 @@
-/*
- * Poly1305 authenticator algorithm, RFC7539, x64 SSE2 functions
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <linux/linkage.h>
-
-.section	.rodata.cst16.ANMASK, "aM", @progbits, 16
-.align 16
-ANMASK:	.octa 0x0000000003ffffff0000000003ffffff
-
-.section	.rodata.cst16.ORMASK, "aM", @progbits, 16
-.align 16
-ORMASK:	.octa 0x00000000010000000000000001000000
-
-.text
-
-#define h0 0x00(%rdi)
-#define h1 0x04(%rdi)
-#define h2 0x08(%rdi)
-#define h3 0x0c(%rdi)
-#define h4 0x10(%rdi)
-#define r0 0x00(%rdx)
-#define r1 0x04(%rdx)
-#define r2 0x08(%rdx)
-#define r3 0x0c(%rdx)
-#define r4 0x10(%rdx)
-#define s1 0x00(%rsp)
-#define s2 0x04(%rsp)
-#define s3 0x08(%rsp)
-#define s4 0x0c(%rsp)
-#define m %rsi
-#define h01 %xmm0
-#define h23 %xmm1
-#define h44 %xmm2
-#define t1 %xmm3
-#define t2 %xmm4
-#define t3 %xmm5
-#define t4 %xmm6
-#define mask %xmm7
-#define d0 %r8
-#define d1 %r9
-#define d2 %r10
-#define d3 %r11
-#define d4 %r12
-
-ENTRY(poly1305_block_sse2)
-	# %rdi: Accumulator h[5]
-	# %rsi: 16 byte input block m
-	# %rdx: Poly1305 key r[5]
-	# %rcx: Block count
-
-	# This single block variant tries to improve performance by doing two
-	# multiplications in parallel using SSE instructions. There is quite
-	# some quardword packing involved, hence the speedup is marginal.
-
-	push		%rbx
-	push		%r12
-	sub		$0x10,%rsp
-
-	# s1..s4 = r1..r4 * 5
-	mov		r1,%eax
-	lea		(%eax,%eax,4),%eax
-	mov		%eax,s1
-	mov		r2,%eax
-	lea		(%eax,%eax,4),%eax
-	mov		%eax,s2
-	mov		r3,%eax
-	lea		(%eax,%eax,4),%eax
-	mov		%eax,s3
-	mov		r4,%eax
-	lea		(%eax,%eax,4),%eax
-	mov		%eax,s4
-
-	movdqa		ANMASK(%rip),mask
-
-.Ldoblock:
-	# h01 = [0, h1, 0, h0]
-	# h23 = [0, h3, 0, h2]
-	# h44 = [0, h4, 0, h4]
-	movd		h0,h01
-	movd		h1,t1
-	movd		h2,h23
-	movd		h3,t2
-	movd		h4,h44
-	punpcklqdq	t1,h01
-	punpcklqdq	t2,h23
-	punpcklqdq	h44,h44
-
-	# h01 += [ (m[3-6] >> 2) & 0x3ffffff, m[0-3] & 0x3ffffff ]
-	movd		0x00(m),t1
-	movd		0x03(m),t2
-	psrld		$2,t2
-	punpcklqdq	t2,t1
-	pand		mask,t1
-	paddd		t1,h01
-	# h23 += [ (m[9-12] >> 6) & 0x3ffffff, (m[6-9] >> 4) & 0x3ffffff ]
-	movd		0x06(m),t1
-	movd		0x09(m),t2
-	psrld		$4,t1
-	psrld		$6,t2
-	punpcklqdq	t2,t1
-	pand		mask,t1
-	paddd		t1,h23
-	# h44 += [ (m[12-15] >> 8) | (1 << 24), (m[12-15] >> 8) | (1 << 24) ]
-	mov		0x0c(m),%eax
-	shr		$8,%eax
-	or		$0x01000000,%eax
-	movd		%eax,t1
-	pshufd		$0xc4,t1,t1
-	paddd		t1,h44
-
-	# t1[0] = h0 * r0 + h2 * s3
-	# t1[1] = h1 * s4 + h3 * s2
-	movd		r0,t1
-	movd		s4,t2
-	punpcklqdq	t2,t1
-	pmuludq		h01,t1
-	movd		s3,t2
-	movd		s2,t3
-	punpcklqdq	t3,t2
-	pmuludq		h23,t2
-	paddq		t2,t1
-	# t2[0] = h0 * r1 + h2 * s4
-	# t2[1] = h1 * r0 + h3 * s3
-	movd		r1,t2
-	movd		r0,t3
-	punpcklqdq	t3,t2
-	pmuludq		h01,t2
-	movd		s4,t3
-	movd		s3,t4
-	punpcklqdq	t4,t3
-	pmuludq		h23,t3
-	paddq		t3,t2
-	# t3[0] = h4 * s1
-	# t3[1] = h4 * s2
-	movd		s1,t3
-	movd		s2,t4
-	punpcklqdq	t4,t3
-	pmuludq		h44,t3
-	# d0 = t1[0] + t1[1] + t3[0]
-	# d1 = t2[0] + t2[1] + t3[1]
-	movdqa		t1,t4
-	punpcklqdq	t2,t4
-	punpckhqdq	t2,t1
-	paddq		t4,t1
-	paddq		t3,t1
-	movq		t1,d0
-	psrldq		$8,t1
-	movq		t1,d1
-
-	# t1[0] = h0 * r2 + h2 * r0
-	# t1[1] = h1 * r1 + h3 * s4
-	movd		r2,t1
-	movd		r1,t2
-	punpcklqdq 	t2,t1
-	pmuludq		h01,t1
-	movd		r0,t2
-	movd		s4,t3
-	punpcklqdq	t3,t2
-	pmuludq		h23,t2
-	paddq		t2,t1
-	# t2[0] = h0 * r3 + h2 * r1
-	# t2[1] = h1 * r2 + h3 * r0
-	movd		r3,t2
-	movd		r2,t3
-	punpcklqdq	t3,t2
-	pmuludq		h01,t2
-	movd		r1,t3
-	movd		r0,t4
-	punpcklqdq	t4,t3
-	pmuludq		h23,t3
-	paddq		t3,t2
-	# t3[0] = h4 * s3
-	# t3[1] = h4 * s4
-	movd		s3,t3
-	movd		s4,t4
-	punpcklqdq	t4,t3
-	pmuludq		h44,t3
-	# d2 = t1[0] + t1[1] + t3[0]
-	# d3 = t2[0] + t2[1] + t3[1]
-	movdqa		t1,t4
-	punpcklqdq	t2,t4
-	punpckhqdq	t2,t1
-	paddq		t4,t1
-	paddq		t3,t1
-	movq		t1,d2
-	psrldq		$8,t1
-	movq		t1,d3
-
-	# t1[0] = h0 * r4 + h2 * r2
-	# t1[1] = h1 * r3 + h3 * r1
-	movd		r4,t1
-	movd		r3,t2
-	punpcklqdq	t2,t1
-	pmuludq		h01,t1
-	movd		r2,t2
-	movd		r1,t3
-	punpcklqdq	t3,t2
-	pmuludq		h23,t2
-	paddq		t2,t1
-	# t3[0] = h4 * r0
-	movd		r0,t3
-	pmuludq		h44,t3
-	# d4 = t1[0] + t1[1] + t3[0]
-	movdqa		t1,t4
-	psrldq		$8,t4
-	paddq		t4,t1
-	paddq		t3,t1
-	movq		t1,d4
-
-	# d1 += d0 >> 26
-	mov		d0,%rax
-	shr		$26,%rax
-	add		%rax,d1
-	# h0 = d0 & 0x3ffffff
-	mov		d0,%rbx
-	and		$0x3ffffff,%ebx
-
-	# d2 += d1 >> 26
-	mov		d1,%rax
-	shr		$26,%rax
-	add		%rax,d2
-	# h1 = d1 & 0x3ffffff
-	mov		d1,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h1
-
-	# d3 += d2 >> 26
-	mov		d2,%rax
-	shr		$26,%rax
-	add		%rax,d3
-	# h2 = d2 & 0x3ffffff
-	mov		d2,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h2
-
-	# d4 += d3 >> 26
-	mov		d3,%rax
-	shr		$26,%rax
-	add		%rax,d4
-	# h3 = d3 & 0x3ffffff
-	mov		d3,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h3
-
-	# h0 += (d4 >> 26) * 5
-	mov		d4,%rax
-	shr		$26,%rax
-	lea		(%eax,%eax,4),%eax
-	add		%eax,%ebx
-	# h4 = d4 & 0x3ffffff
-	mov		d4,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h4
-
-	# h1 += h0 >> 26
-	mov		%ebx,%eax
-	shr		$26,%eax
-	add		%eax,h1
-	# h0 = h0 & 0x3ffffff
-	andl		$0x3ffffff,%ebx
-	mov		%ebx,h0
-
-	add		$0x10,m
-	dec		%rcx
-	jnz		.Ldoblock
-
-	add		$0x10,%rsp
-	pop		%r12
-	pop		%rbx
-	ret
-ENDPROC(poly1305_block_sse2)
-
-
-#define u0 0x00(%r8)
-#define u1 0x04(%r8)
-#define u2 0x08(%r8)
-#define u3 0x0c(%r8)
-#define u4 0x10(%r8)
-#define hc0 %xmm0
-#define hc1 %xmm1
-#define hc2 %xmm2
-#define hc3 %xmm5
-#define hc4 %xmm6
-#define ru0 %xmm7
-#define ru1 %xmm8
-#define ru2 %xmm9
-#define ru3 %xmm10
-#define ru4 %xmm11
-#define sv1 %xmm12
-#define sv2 %xmm13
-#define sv3 %xmm14
-#define sv4 %xmm15
-#undef d0
-#define d0 %r13
-
-ENTRY(poly1305_2block_sse2)
-	# %rdi: Accumulator h[5]
-	# %rsi: 16 byte input block m
-	# %rdx: Poly1305 key r[5]
-	# %rcx: Doubleblock count
-	# %r8:  Poly1305 derived key r^2 u[5]
-
-	# This two-block variant further improves performance by using loop
-	# unrolled block processing. This is more straight forward and does
-	# less byte shuffling, but requires a second Poly1305 key r^2:
-	# h = (h + m) * r    =>    h = (h + m1) * r^2 + m2 * r
-
-	push		%rbx
-	push		%r12
-	push		%r13
-
-	# combine r0,u0
-	movd		u0,ru0
-	movd		r0,t1
-	punpcklqdq	t1,ru0
-
-	# combine r1,u1 and s1=r1*5,v1=u1*5
-	movd		u1,ru1
-	movd		r1,t1
-	punpcklqdq	t1,ru1
-	movdqa		ru1,sv1
-	pslld		$2,sv1
-	paddd		ru1,sv1
-
-	# combine r2,u2 and s2=r2*5,v2=u2*5
-	movd		u2,ru2
-	movd		r2,t1
-	punpcklqdq	t1,ru2
-	movdqa		ru2,sv2
-	pslld		$2,sv2
-	paddd		ru2,sv2
-
-	# combine r3,u3 and s3=r3*5,v3=u3*5
-	movd		u3,ru3
-	movd		r3,t1
-	punpcklqdq	t1,ru3
-	movdqa		ru3,sv3
-	pslld		$2,sv3
-	paddd		ru3,sv3
-
-	# combine r4,u4 and s4=r4*5,v4=u4*5
-	movd		u4,ru4
-	movd		r4,t1
-	punpcklqdq	t1,ru4
-	movdqa		ru4,sv4
-	pslld		$2,sv4
-	paddd		ru4,sv4
-
-.Ldoblock2:
-	# hc0 = [ m[16-19] & 0x3ffffff, h0 + m[0-3] & 0x3ffffff ]
-	movd		0x00(m),hc0
-	movd		0x10(m),t1
-	punpcklqdq	t1,hc0
-	pand		ANMASK(%rip),hc0
-	movd		h0,t1
-	paddd		t1,hc0
-	# hc1 = [ (m[19-22] >> 2) & 0x3ffffff, h1 + (m[3-6] >> 2) & 0x3ffffff ]
-	movd		0x03(m),hc1
-	movd		0x13(m),t1
-	punpcklqdq	t1,hc1
-	psrld		$2,hc1
-	pand		ANMASK(%rip),hc1
-	movd		h1,t1
-	paddd		t1,hc1
-	# hc2 = [ (m[22-25] >> 4) & 0x3ffffff, h2 + (m[6-9] >> 4) & 0x3ffffff ]
-	movd		0x06(m),hc2
-	movd		0x16(m),t1
-	punpcklqdq	t1,hc2
-	psrld		$4,hc2
-	pand		ANMASK(%rip),hc2
-	movd		h2,t1
-	paddd		t1,hc2
-	# hc3 = [ (m[25-28] >> 6) & 0x3ffffff, h3 + (m[9-12] >> 6) & 0x3ffffff ]
-	movd		0x09(m),hc3
-	movd		0x19(m),t1
-	punpcklqdq	t1,hc3
-	psrld		$6,hc3
-	pand		ANMASK(%rip),hc3
-	movd		h3,t1
-	paddd		t1,hc3
-	# hc4 = [ (m[28-31] >> 8) | (1<<24), h4 + (m[12-15] >> 8) | (1<<24) ]
-	movd		0x0c(m),hc4
-	movd		0x1c(m),t1
-	punpcklqdq	t1,hc4
-	psrld		$8,hc4
-	por		ORMASK(%rip),hc4
-	movd		h4,t1
-	paddd		t1,hc4
-
-	# t1 = [ hc0[1] * r0, hc0[0] * u0 ]
-	movdqa		ru0,t1
-	pmuludq		hc0,t1
-	# t1 += [ hc1[1] * s4, hc1[0] * v4 ]
-	movdqa		sv4,t2
-	pmuludq		hc1,t2
-	paddq		t2,t1
-	# t1 += [ hc2[1] * s3, hc2[0] * v3 ]
-	movdqa		sv3,t2
-	pmuludq		hc2,t2
-	paddq		t2,t1
-	# t1 += [ hc3[1] * s2, hc3[0] * v2 ]
-	movdqa		sv2,t2
-	pmuludq		hc3,t2
-	paddq		t2,t1
-	# t1 += [ hc4[1] * s1, hc4[0] * v1 ]
-	movdqa		sv1,t2
-	pmuludq		hc4,t2
-	paddq		t2,t1
-	# d0 = t1[0] + t1[1]
-	movdqa		t1,t2
-	psrldq		$8,t2
-	paddq		t2,t1
-	movq		t1,d0
-
-	# t1 = [ hc0[1] * r1, hc0[0] * u1 ]
-	movdqa		ru1,t1
-	pmuludq		hc0,t1
-	# t1 += [ hc1[1] * r0, hc1[0] * u0 ]
-	movdqa		ru0,t2
-	pmuludq		hc1,t2
-	paddq		t2,t1
-	# t1 += [ hc2[1] * s4, hc2[0] * v4 ]
-	movdqa		sv4,t2
-	pmuludq		hc2,t2
-	paddq		t2,t1
-	# t1 += [ hc3[1] * s3, hc3[0] * v3 ]
-	movdqa		sv3,t2
-	pmuludq		hc3,t2
-	paddq		t2,t1
-	# t1 += [ hc4[1] * s2, hc4[0] * v2 ]
-	movdqa		sv2,t2
-	pmuludq		hc4,t2
-	paddq		t2,t1
-	# d1 = t1[0] + t1[1]
-	movdqa		t1,t2
-	psrldq		$8,t2
-	paddq		t2,t1
-	movq		t1,d1
-
-	# t1 = [ hc0[1] * r2, hc0[0] * u2 ]
-	movdqa		ru2,t1
-	pmuludq		hc0,t1
-	# t1 += [ hc1[1] * r1, hc1[0] * u1 ]
-	movdqa		ru1,t2
-	pmuludq		hc1,t2
-	paddq		t2,t1
-	# t1 += [ hc2[1] * r0, hc2[0] * u0 ]
-	movdqa		ru0,t2
-	pmuludq		hc2,t2
-	paddq		t2,t1
-	# t1 += [ hc3[1] * s4, hc3[0] * v4 ]
-	movdqa		sv4,t2
-	pmuludq		hc3,t2
-	paddq		t2,t1
-	# t1 += [ hc4[1] * s3, hc4[0] * v3 ]
-	movdqa		sv3,t2
-	pmuludq		hc4,t2
-	paddq		t2,t1
-	# d2 = t1[0] + t1[1]
-	movdqa		t1,t2
-	psrldq		$8,t2
-	paddq		t2,t1
-	movq		t1,d2
-
-	# t1 = [ hc0[1] * r3, hc0[0] * u3 ]
-	movdqa		ru3,t1
-	pmuludq		hc0,t1
-	# t1 += [ hc1[1] * r2, hc1[0] * u2 ]
-	movdqa		ru2,t2
-	pmuludq		hc1,t2
-	paddq		t2,t1
-	# t1 += [ hc2[1] * r1, hc2[0] * u1 ]
-	movdqa		ru1,t2
-	pmuludq		hc2,t2
-	paddq		t2,t1
-	# t1 += [ hc3[1] * r0, hc3[0] * u0 ]
-	movdqa		ru0,t2
-	pmuludq		hc3,t2
-	paddq		t2,t1
-	# t1 += [ hc4[1] * s4, hc4[0] * v4 ]
-	movdqa		sv4,t2
-	pmuludq		hc4,t2
-	paddq		t2,t1
-	# d3 = t1[0] + t1[1]
-	movdqa		t1,t2
-	psrldq		$8,t2
-	paddq		t2,t1
-	movq		t1,d3
-
-	# t1 = [ hc0[1] * r4, hc0[0] * u4 ]
-	movdqa		ru4,t1
-	pmuludq		hc0,t1
-	# t1 += [ hc1[1] * r3, hc1[0] * u3 ]
-	movdqa		ru3,t2
-	pmuludq		hc1,t2
-	paddq		t2,t1
-	# t1 += [ hc2[1] * r2, hc2[0] * u2 ]
-	movdqa		ru2,t2
-	pmuludq		hc2,t2
-	paddq		t2,t1
-	# t1 += [ hc3[1] * r1, hc3[0] * u1 ]
-	movdqa		ru1,t2
-	pmuludq		hc3,t2
-	paddq		t2,t1
-	# t1 += [ hc4[1] * r0, hc4[0] * u0 ]
-	movdqa		ru0,t2
-	pmuludq		hc4,t2
-	paddq		t2,t1
-	# d4 = t1[0] + t1[1]
-	movdqa		t1,t2
-	psrldq		$8,t2
-	paddq		t2,t1
-	movq		t1,d4
-
-	# d1 += d0 >> 26
-	mov		d0,%rax
-	shr		$26,%rax
-	add		%rax,d1
-	# h0 = d0 & 0x3ffffff
-	mov		d0,%rbx
-	and		$0x3ffffff,%ebx
-
-	# d2 += d1 >> 26
-	mov		d1,%rax
-	shr		$26,%rax
-	add		%rax,d2
-	# h1 = d1 & 0x3ffffff
-	mov		d1,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h1
-
-	# d3 += d2 >> 26
-	mov		d2,%rax
-	shr		$26,%rax
-	add		%rax,d3
-	# h2 = d2 & 0x3ffffff
-	mov		d2,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h2
-
-	# d4 += d3 >> 26
-	mov		d3,%rax
-	shr		$26,%rax
-	add		%rax,d4
-	# h3 = d3 & 0x3ffffff
-	mov		d3,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h3
-
-	# h0 += (d4 >> 26) * 5
-	mov		d4,%rax
-	shr		$26,%rax
-	lea		(%eax,%eax,4),%eax
-	add		%eax,%ebx
-	# h4 = d4 & 0x3ffffff
-	mov		d4,%rax
-	and		$0x3ffffff,%eax
-	mov		%eax,h4
-
-	# h1 += h0 >> 26
-	mov		%ebx,%eax
-	shr		$26,%eax
-	add		%eax,h1
-	# h0 = h0 & 0x3ffffff
-	andl		$0x3ffffff,%ebx
-	mov		%ebx,h0
-
-	add		$0x20,m
-	dec		%rcx
-	jnz		.Ldoblock2
-
-	pop		%r13
-	pop		%r12
-	pop		%rbx
-	ret
-ENDPROC(poly1305_2block_sse2)
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
deleted file mode 100644
index f012b7e28ad1..000000000000
--- a/arch/x86/crypto/poly1305_glue.c
+++ /dev/null
@@ -1,205 +0,0 @@
-/*
- * Poly1305 authenticator algorithm, RFC7539, SIMD glue code
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <crypto/algapi.h>
-#include <crypto/internal/hash.h>
-#include <crypto/poly1305.h>
-#include <linux/crypto.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <asm/fpu/api.h>
-#include <asm/simd.h>
-
-struct poly1305_simd_desc_ctx {
-	struct poly1305_desc_ctx base;
-	/* derived key u set? */
-	bool uset;
-#ifdef CONFIG_AS_AVX2
-	/* derived keys r^3, r^4 set? */
-	bool wset;
-#endif
-	/* derived Poly1305 key r^2 */
-	u32 u[5];
-	/* ... silently appended r^3 and r^4 when using AVX2 */
-};
-
-asmlinkage void poly1305_block_sse2(u32 *h, const u8 *src,
-				    const u32 *r, unsigned int blocks);
-asmlinkage void poly1305_2block_sse2(u32 *h, const u8 *src, const u32 *r,
-				     unsigned int blocks, const u32 *u);
-#ifdef CONFIG_AS_AVX2
-asmlinkage void poly1305_4block_avx2(u32 *h, const u8 *src, const u32 *r,
-				     unsigned int blocks, const u32 *u);
-static bool poly1305_use_avx2;
-#endif
-
-static int poly1305_simd_init(struct shash_desc *desc)
-{
-	struct poly1305_simd_desc_ctx *sctx = shash_desc_ctx(desc);
-
-	sctx->uset = false;
-#ifdef CONFIG_AS_AVX2
-	sctx->wset = false;
-#endif
-
-	return crypto_poly1305_init(desc);
-}
-
-static void poly1305_simd_mult(u32 *a, const u32 *b)
-{
-	u8 m[POLY1305_BLOCK_SIZE];
-
-	memset(m, 0, sizeof(m));
-	/* The poly1305 block function adds a hi-bit to the accumulator which
-	 * we don't need for key multiplication; compensate for it. */
-	a[4] -= 1 << 24;
-	poly1305_block_sse2(a, m, b, 1);
-}
-
-static unsigned int poly1305_simd_blocks(struct poly1305_desc_ctx *dctx,
-					 const u8 *src, unsigned int srclen)
-{
-	struct poly1305_simd_desc_ctx *sctx;
-	unsigned int blocks, datalen;
-
-	BUILD_BUG_ON(offsetof(struct poly1305_simd_desc_ctx, base));
-	sctx = container_of(dctx, struct poly1305_simd_desc_ctx, base);
-
-	if (unlikely(!dctx->sset)) {
-		datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
-		src += srclen - datalen;
-		srclen = datalen;
-	}
-
-#ifdef CONFIG_AS_AVX2
-	if (poly1305_use_avx2 && srclen >= POLY1305_BLOCK_SIZE * 4) {
-		if (unlikely(!sctx->wset)) {
-			if (!sctx->uset) {
-				memcpy(sctx->u, dctx->r, sizeof(sctx->u));
-				poly1305_simd_mult(sctx->u, dctx->r);
-				sctx->uset = true;
-			}
-			memcpy(sctx->u + 5, sctx->u, sizeof(sctx->u));
-			poly1305_simd_mult(sctx->u + 5, dctx->r);
-			memcpy(sctx->u + 10, sctx->u + 5, sizeof(sctx->u));
-			poly1305_simd_mult(sctx->u + 10, dctx->r);
-			sctx->wset = true;
-		}
-		blocks = srclen / (POLY1305_BLOCK_SIZE * 4);
-		poly1305_4block_avx2(dctx->h, src, dctx->r, blocks, sctx->u);
-		src += POLY1305_BLOCK_SIZE * 4 * blocks;
-		srclen -= POLY1305_BLOCK_SIZE * 4 * blocks;
-	}
-#endif
-	if (likely(srclen >= POLY1305_BLOCK_SIZE * 2)) {
-		if (unlikely(!sctx->uset)) {
-			memcpy(sctx->u, dctx->r, sizeof(sctx->u));
-			poly1305_simd_mult(sctx->u, dctx->r);
-			sctx->uset = true;
-		}
-		blocks = srclen / (POLY1305_BLOCK_SIZE * 2);
-		poly1305_2block_sse2(dctx->h, src, dctx->r, blocks, sctx->u);
-		src += POLY1305_BLOCK_SIZE * 2 * blocks;
-		srclen -= POLY1305_BLOCK_SIZE * 2 * blocks;
-	}
-	if (srclen >= POLY1305_BLOCK_SIZE) {
-		poly1305_block_sse2(dctx->h, src, dctx->r, 1);
-		srclen -= POLY1305_BLOCK_SIZE;
-	}
-	return srclen;
-}
-
-static int poly1305_simd_update(struct shash_desc *desc,
-				const u8 *src, unsigned int srclen)
-{
-	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
-	unsigned int bytes;
-
-	/* kernel_fpu_begin/end is costly, use fallback for small updates */
-	if (srclen <= 288 || !may_use_simd())
-		return crypto_poly1305_update(desc, src, srclen);
-
-	kernel_fpu_begin();
-
-	if (unlikely(dctx->buflen)) {
-		bytes = min(srclen, POLY1305_BLOCK_SIZE - dctx->buflen);
-		memcpy(dctx->buf + dctx->buflen, src, bytes);
-		src += bytes;
-		srclen -= bytes;
-		dctx->buflen += bytes;
-
-		if (dctx->buflen == POLY1305_BLOCK_SIZE) {
-			poly1305_simd_blocks(dctx, dctx->buf,
-					     POLY1305_BLOCK_SIZE);
-			dctx->buflen = 0;
-		}
-	}
-
-	if (likely(srclen >= POLY1305_BLOCK_SIZE)) {
-		bytes = poly1305_simd_blocks(dctx, src, srclen);
-		src += srclen - bytes;
-		srclen = bytes;
-	}
-
-	kernel_fpu_end();
-
-	if (unlikely(srclen)) {
-		dctx->buflen = srclen;
-		memcpy(dctx->buf, src, srclen);
-	}
-
-	return 0;
-}
-
-static struct shash_alg alg = {
-	.digestsize	= POLY1305_DIGEST_SIZE,
-	.init		= poly1305_simd_init,
-	.update		= poly1305_simd_update,
-	.final		= crypto_poly1305_final,
-	.descsize	= sizeof(struct poly1305_simd_desc_ctx),
-	.base		= {
-		.cra_name		= "poly1305",
-		.cra_driver_name	= "poly1305-simd",
-		.cra_priority		= 300,
-		.cra_blocksize		= POLY1305_BLOCK_SIZE,
-		.cra_module		= THIS_MODULE,
-	},
-};
-
-static int __init poly1305_simd_mod_init(void)
-{
-	if (!boot_cpu_has(X86_FEATURE_XMM2))
-		return -ENODEV;
-
-#ifdef CONFIG_AS_AVX2
-	poly1305_use_avx2 = boot_cpu_has(X86_FEATURE_AVX) &&
-			    boot_cpu_has(X86_FEATURE_AVX2) &&
-			    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
-	alg.descsize = sizeof(struct poly1305_simd_desc_ctx);
-	if (poly1305_use_avx2)
-		alg.descsize += 10 * sizeof(u32);
-#endif
-	return crypto_register_shash(&alg);
-}
-
-static void __exit poly1305_simd_mod_exit(void)
-{
-	crypto_unregister_shash(&alg);
-}
-
-module_init(poly1305_simd_mod_init);
-module_exit(poly1305_simd_mod_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
-MODULE_DESCRIPTION("Poly1305 authenticator");
-MODULE_ALIAS_CRYPTO("poly1305");
-MODULE_ALIAS_CRYPTO("poly1305-simd");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index f3e40ac56d93..47859a0f8052 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -656,24 +656,13 @@ config CRYPTO_GHASH
 config CRYPTO_POLY1305
 	tristate "Poly1305 authenticator algorithm"
 	select CRYPTO_HASH
+	select ZINC_POLY1305
 	help
 	  Poly1305 authenticator algorithm, RFC7539.
 
 	  Poly1305 is an authenticator algorithm designed by Daniel J. Bernstein.
 	  It is used for the ChaCha20-Poly1305 AEAD, specified in RFC7539 for use
-	  in IETF protocols. This is the portable C implementation of Poly1305.
-
-config CRYPTO_POLY1305_X86_64
-	tristate "Poly1305 authenticator algorithm (x86_64/SSE2/AVX2)"
-	depends on X86 && 64BIT
-	select CRYPTO_POLY1305
-	help
-	  Poly1305 authenticator algorithm, RFC7539.
-
-	  Poly1305 is an authenticator algorithm designed by Daniel J. Bernstein.
-	  It is used for the ChaCha20-Poly1305 AEAD, specified in RFC7539 for use
-	  in IETF protocols. This is the x86_64 assembler implementation using SIMD
-	  instructions.
+	  in IETF protocols.
 
 config CRYPTO_MD4
 	tristate "MD4 digest algorithm"
diff --git a/crypto/Makefile b/crypto/Makefile
index 6d1d40eeb964..5e60348d02e2 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -118,7 +118,7 @@ obj-$(CONFIG_CRYPTO_SEED) += seed.o
 obj-$(CONFIG_CRYPTO_SPECK) += speck.o
 obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
 obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
-obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
+obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_zinc.o
 obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
 obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
 obj-$(CONFIG_CRYPTO_CRC32C) += crc32c_generic.o
diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c
index 600afa99941f..bf523797bef3 100644
--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -14,7 +14,7 @@
 #include <crypto/internal/skcipher.h>
 #include <crypto/scatterwalk.h>
 #include <crypto/chacha20.h>
-#include <crypto/poly1305.h>
+#include <zinc/poly1305.h>
 #include <linux/err.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -62,7 +62,7 @@ struct chachapoly_req_ctx {
 	/* the key we generate for Poly1305 using Chacha20 */
 	u8 key[POLY1305_KEY_SIZE];
 	/* calculated Poly1305 tag */
-	u8 tag[POLY1305_DIGEST_SIZE];
+	u8 tag[POLY1305_MAC_SIZE];
 	/* length of data to en/decrypt, without ICV */
 	unsigned int cryptlen;
 	/* Actual AD, excluding IV */
@@ -471,7 +471,7 @@ static int chachapoly_decrypt(struct aead_request *req)
 {
 	struct chachapoly_req_ctx *rctx = aead_request_ctx(req);
 
-	rctx->cryptlen = req->cryptlen - POLY1305_DIGEST_SIZE;
+	rctx->cryptlen = req->cryptlen - POLY1305_MAC_SIZE;
 
 	/* decrypt call chain:
 	 * - poly_genkey/done()
@@ -513,7 +513,7 @@ static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key,
 static int chachapoly_setauthsize(struct crypto_aead *tfm,
 				  unsigned int authsize)
 {
-	if (authsize != POLY1305_DIGEST_SIZE)
+	if (authsize != POLY1305_MAC_SIZE)
 		return -EINVAL;
 
 	return 0;
@@ -613,7 +613,7 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb,
 	poly_hash = __crypto_hash_alg_common(poly);
 
 	err = -EINVAL;
-	if (poly_hash->digestsize != POLY1305_DIGEST_SIZE)
+	if (poly_hash->digestsize != POLY1305_MAC_SIZE)
 		goto out_put_poly;
 
 	err = -ENOMEM;
@@ -666,7 +666,7 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb,
 				     ctx->saltlen;
 	inst->alg.ivsize = ivsize;
 	inst->alg.chunksize = crypto_skcipher_alg_chunksize(chacha);
-	inst->alg.maxauthsize = POLY1305_DIGEST_SIZE;
+	inst->alg.maxauthsize = POLY1305_MAC_SIZE;
 	inst->alg.init = chachapoly_init;
 	inst->alg.exit = chachapoly_exit;
 	inst->alg.encrypt = chachapoly_encrypt;
diff --git a/crypto/poly1305_generic.c b/crypto/poly1305_generic.c
deleted file mode 100644
index 47d3a6b83931..000000000000
--- a/crypto/poly1305_generic.c
+++ /dev/null
@@ -1,304 +0,0 @@
-/*
- * Poly1305 authenticator algorithm, RFC7539
- *
- * Copyright (C) 2015 Martin Willi
- *
- * Based on public domain code by Andrew Moon and Daniel J. Bernstein.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <crypto/algapi.h>
-#include <crypto/internal/hash.h>
-#include <crypto/poly1305.h>
-#include <linux/crypto.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <asm/unaligned.h>
-
-static inline u64 mlt(u64 a, u64 b)
-{
-	return a * b;
-}
-
-static inline u32 sr(u64 v, u_char n)
-{
-	return v >> n;
-}
-
-static inline u32 and(u32 v, u32 mask)
-{
-	return v & mask;
-}
-
-int crypto_poly1305_init(struct shash_desc *desc)
-{
-	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
-
-	memset(dctx->h, 0, sizeof(dctx->h));
-	dctx->buflen = 0;
-	dctx->rset = false;
-	dctx->sset = false;
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(crypto_poly1305_init);
-
-static void poly1305_setrkey(struct poly1305_desc_ctx *dctx, const u8 *key)
-{
-	/* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
-	dctx->r[0] = (get_unaligned_le32(key +  0) >> 0) & 0x3ffffff;
-	dctx->r[1] = (get_unaligned_le32(key +  3) >> 2) & 0x3ffff03;
-	dctx->r[2] = (get_unaligned_le32(key +  6) >> 4) & 0x3ffc0ff;
-	dctx->r[3] = (get_unaligned_le32(key +  9) >> 6) & 0x3f03fff;
-	dctx->r[4] = (get_unaligned_le32(key + 12) >> 8) & 0x00fffff;
-}
-
-static void poly1305_setskey(struct poly1305_desc_ctx *dctx, const u8 *key)
-{
-	dctx->s[0] = get_unaligned_le32(key +  0);
-	dctx->s[1] = get_unaligned_le32(key +  4);
-	dctx->s[2] = get_unaligned_le32(key +  8);
-	dctx->s[3] = get_unaligned_le32(key + 12);
-}
-
-/*
- * Poly1305 requires a unique key for each tag, which implies that we can't set
- * it on the tfm that gets accessed by multiple users simultaneously. Instead we
- * expect the key as the first 32 bytes in the update() call.
- */
-unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx,
-					const u8 *src, unsigned int srclen)
-{
-	if (!dctx->sset) {
-		if (!dctx->rset && srclen >= POLY1305_BLOCK_SIZE) {
-			poly1305_setrkey(dctx, src);
-			src += POLY1305_BLOCK_SIZE;
-			srclen -= POLY1305_BLOCK_SIZE;
-			dctx->rset = true;
-		}
-		if (srclen >= POLY1305_BLOCK_SIZE) {
-			poly1305_setskey(dctx, src);
-			src += POLY1305_BLOCK_SIZE;
-			srclen -= POLY1305_BLOCK_SIZE;
-			dctx->sset = true;
-		}
-	}
-	return srclen;
-}
-EXPORT_SYMBOL_GPL(crypto_poly1305_setdesckey);
-
-static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
-				    const u8 *src, unsigned int srclen,
-				    u32 hibit)
-{
-	u32 r0, r1, r2, r3, r4;
-	u32 s1, s2, s3, s4;
-	u32 h0, h1, h2, h3, h4;
-	u64 d0, d1, d2, d3, d4;
-	unsigned int datalen;
-
-	if (unlikely(!dctx->sset)) {
-		datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
-		src += srclen - datalen;
-		srclen = datalen;
-	}
-
-	r0 = dctx->r[0];
-	r1 = dctx->r[1];
-	r2 = dctx->r[2];
-	r3 = dctx->r[3];
-	r4 = dctx->r[4];
-
-	s1 = r1 * 5;
-	s2 = r2 * 5;
-	s3 = r3 * 5;
-	s4 = r4 * 5;
-
-	h0 = dctx->h[0];
-	h1 = dctx->h[1];
-	h2 = dctx->h[2];
-	h3 = dctx->h[3];
-	h4 = dctx->h[4];
-
-	while (likely(srclen >= POLY1305_BLOCK_SIZE)) {
-
-		/* h += m[i] */
-		h0 += (get_unaligned_le32(src +  0) >> 0) & 0x3ffffff;
-		h1 += (get_unaligned_le32(src +  3) >> 2) & 0x3ffffff;
-		h2 += (get_unaligned_le32(src +  6) >> 4) & 0x3ffffff;
-		h3 += (get_unaligned_le32(src +  9) >> 6) & 0x3ffffff;
-		h4 += (get_unaligned_le32(src + 12) >> 8) | hibit;
-
-		/* h *= r */
-		d0 = mlt(h0, r0) + mlt(h1, s4) + mlt(h2, s3) +
-		     mlt(h3, s2) + mlt(h4, s1);
-		d1 = mlt(h0, r1) + mlt(h1, r0) + mlt(h2, s4) +
-		     mlt(h3, s3) + mlt(h4, s2);
-		d2 = mlt(h0, r2) + mlt(h1, r1) + mlt(h2, r0) +
-		     mlt(h3, s4) + mlt(h4, s3);
-		d3 = mlt(h0, r3) + mlt(h1, r2) + mlt(h2, r1) +
-		     mlt(h3, r0) + mlt(h4, s4);
-		d4 = mlt(h0, r4) + mlt(h1, r3) + mlt(h2, r2) +
-		     mlt(h3, r1) + mlt(h4, r0);
-
-		/* (partial) h %= p */
-		d1 += sr(d0, 26);     h0 = and(d0, 0x3ffffff);
-		d2 += sr(d1, 26);     h1 = and(d1, 0x3ffffff);
-		d3 += sr(d2, 26);     h2 = and(d2, 0x3ffffff);
-		d4 += sr(d3, 26);     h3 = and(d3, 0x3ffffff);
-		h0 += sr(d4, 26) * 5; h4 = and(d4, 0x3ffffff);
-		h1 += h0 >> 26;       h0 = h0 & 0x3ffffff;
-
-		src += POLY1305_BLOCK_SIZE;
-		srclen -= POLY1305_BLOCK_SIZE;
-	}
-
-	dctx->h[0] = h0;
-	dctx->h[1] = h1;
-	dctx->h[2] = h2;
-	dctx->h[3] = h3;
-	dctx->h[4] = h4;
-
-	return srclen;
-}
-
-int crypto_poly1305_update(struct shash_desc *desc,
-			   const u8 *src, unsigned int srclen)
-{
-	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
-	unsigned int bytes;
-
-	if (unlikely(dctx->buflen)) {
-		bytes = min(srclen, POLY1305_BLOCK_SIZE - dctx->buflen);
-		memcpy(dctx->buf + dctx->buflen, src, bytes);
-		src += bytes;
-		srclen -= bytes;
-		dctx->buflen += bytes;
-
-		if (dctx->buflen == POLY1305_BLOCK_SIZE) {
-			poly1305_blocks(dctx, dctx->buf,
-					POLY1305_BLOCK_SIZE, 1 << 24);
-			dctx->buflen = 0;
-		}
-	}
-
-	if (likely(srclen >= POLY1305_BLOCK_SIZE)) {
-		bytes = poly1305_blocks(dctx, src, srclen, 1 << 24);
-		src += srclen - bytes;
-		srclen = bytes;
-	}
-
-	if (unlikely(srclen)) {
-		dctx->buflen = srclen;
-		memcpy(dctx->buf, src, srclen);
-	}
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(crypto_poly1305_update);
-
-int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
-{
-	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
-	u32 h0, h1, h2, h3, h4;
-	u32 g0, g1, g2, g3, g4;
-	u32 mask;
-	u64 f = 0;
-
-	if (unlikely(!dctx->sset))
-		return -ENOKEY;
-
-	if (unlikely(dctx->buflen)) {
-		dctx->buf[dctx->buflen++] = 1;
-		memset(dctx->buf + dctx->buflen, 0,
-		       POLY1305_BLOCK_SIZE - dctx->buflen);
-		poly1305_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE, 0);
-	}
-
-	/* fully carry h */
-	h0 = dctx->h[0];
-	h1 = dctx->h[1];
-	h2 = dctx->h[2];
-	h3 = dctx->h[3];
-	h4 = dctx->h[4];
-
-	h2 += (h1 >> 26);     h1 = h1 & 0x3ffffff;
-	h3 += (h2 >> 26);     h2 = h2 & 0x3ffffff;
-	h4 += (h3 >> 26);     h3 = h3 & 0x3ffffff;
-	h0 += (h4 >> 26) * 5; h4 = h4 & 0x3ffffff;
-	h1 += (h0 >> 26);     h0 = h0 & 0x3ffffff;
-
-	/* compute h + -p */
-	g0 = h0 + 5;
-	g1 = h1 + (g0 >> 26);             g0 &= 0x3ffffff;
-	g2 = h2 + (g1 >> 26);             g1 &= 0x3ffffff;
-	g3 = h3 + (g2 >> 26);             g2 &= 0x3ffffff;
-	g4 = h4 + (g3 >> 26) - (1 << 26); g3 &= 0x3ffffff;
-
-	/* select h if h < p, or h + -p if h >= p */
-	mask = (g4 >> ((sizeof(u32) * 8) - 1)) - 1;
-	g0 &= mask;
-	g1 &= mask;
-	g2 &= mask;
-	g3 &= mask;
-	g4 &= mask;
-	mask = ~mask;
-	h0 = (h0 & mask) | g0;
-	h1 = (h1 & mask) | g1;
-	h2 = (h2 & mask) | g2;
-	h3 = (h3 & mask) | g3;
-	h4 = (h4 & mask) | g4;
-
-	/* h = h % (2^128) */
-	h0 = (h0 >>  0) | (h1 << 26);
-	h1 = (h1 >>  6) | (h2 << 20);
-	h2 = (h2 >> 12) | (h3 << 14);
-	h3 = (h3 >> 18) | (h4 <<  8);
-
-	/* mac = (h + s) % (2^128) */
-	f = (f >> 32) + h0 + dctx->s[0]; put_unaligned_le32(f, dst +  0);
-	f = (f >> 32) + h1 + dctx->s[1]; put_unaligned_le32(f, dst +  4);
-	f = (f >> 32) + h2 + dctx->s[2]; put_unaligned_le32(f, dst +  8);
-	f = (f >> 32) + h3 + dctx->s[3]; put_unaligned_le32(f, dst + 12);
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(crypto_poly1305_final);
-
-static struct shash_alg poly1305_alg = {
-	.digestsize	= POLY1305_DIGEST_SIZE,
-	.init		= crypto_poly1305_init,
-	.update		= crypto_poly1305_update,
-	.final		= crypto_poly1305_final,
-	.descsize	= sizeof(struct poly1305_desc_ctx),
-	.base		= {
-		.cra_name		= "poly1305",
-		.cra_driver_name	= "poly1305-generic",
-		.cra_priority		= 100,
-		.cra_blocksize		= POLY1305_BLOCK_SIZE,
-		.cra_module		= THIS_MODULE,
-	},
-};
-
-static int __init poly1305_mod_init(void)
-{
-	return crypto_register_shash(&poly1305_alg);
-}
-
-static void __exit poly1305_mod_exit(void)
-{
-	crypto_unregister_shash(&poly1305_alg);
-}
-
-module_init(poly1305_mod_init);
-module_exit(poly1305_mod_exit);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
-MODULE_DESCRIPTION("Poly1305 authenticator");
-MODULE_ALIAS_CRYPTO("poly1305");
-MODULE_ALIAS_CRYPTO("poly1305-generic");
diff --git a/crypto/poly1305_zinc.c b/crypto/poly1305_zinc.c
new file mode 100644
index 000000000000..630bbb1ba230
--- /dev/null
+++ b/crypto/poly1305_zinc.c
@@ -0,0 +1,92 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/internal/hash.h>
+#include <zinc/poly1305.h>
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/simd.h>
+
+struct poly1305_desc_ctx {
+	struct poly1305_ctx ctx;
+	u8 key[POLY1305_KEY_SIZE];
+	unsigned int rem_key_bytes;
+};
+
+static int crypto_poly1305_init(struct shash_desc *desc)
+{
+	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
+	dctx->rem_key_bytes = POLY1305_KEY_SIZE;
+	return 0;
+}
+
+static int crypto_poly1305_update(struct shash_desc *desc, const u8 *src,
+				  unsigned int srclen)
+{
+	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
+	simd_context_t simd_context = simd_get();
+
+	if (unlikely(dctx->rem_key_bytes)) {
+		unsigned int key_bytes = min(srclen, dctx->rem_key_bytes);
+		memcpy(dctx->key + (POLY1305_KEY_SIZE - dctx->rem_key_bytes),
+		       src, key_bytes);
+		src += key_bytes;
+		srclen -= key_bytes;
+		dctx->rem_key_bytes -= key_bytes;
+		if (!dctx->rem_key_bytes) {
+			poly1305_init(&dctx->ctx, dctx->key, simd_context);
+			memzero_explicit(dctx->key, sizeof(dctx->key));
+		}
+	}
+	poly1305_update(&dctx->ctx, src, srclen, simd_context);
+	simd_put(simd_context);
+
+	return 0;
+}
+
+static int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
+{
+	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
+	simd_context_t simd_context = simd_get();
+	poly1305_final(&dctx->ctx, dst, simd_context);
+	simd_put(simd_context);
+	return 0;
+}
+
+static struct shash_alg poly1305_alg = {
+	.digestsize	= POLY1305_MAC_SIZE,
+	.init		= crypto_poly1305_init,
+	.update		= crypto_poly1305_update,
+	.final		= crypto_poly1305_final,
+	.descsize	= sizeof(struct poly1305_desc_ctx),
+	.base		= {
+		.cra_name		= "poly1305",
+		.cra_driver_name	= "poly1305-software",
+		.cra_priority		= 100,
+		.cra_blocksize		= POLY1305_BLOCK_SIZE,
+		.cra_module		= THIS_MODULE,
+	},
+};
+
+static int __init poly1305_mod_init(void)
+{
+	return crypto_register_shash(&poly1305_alg);
+}
+
+static void __exit poly1305_mod_exit(void)
+{
+	crypto_unregister_shash(&poly1305_alg);
+}
+
+module_init(poly1305_mod_init);
+module_exit(poly1305_mod_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_DESCRIPTION("Poly1305 authenticator");
+MODULE_ALIAS_CRYPTO("poly1305");
+MODULE_ALIAS_CRYPTO("poly1305-software");
diff --git a/include/crypto/poly1305.h b/include/crypto/poly1305.h
deleted file mode 100644
index f718a19da82f..000000000000
--- a/include/crypto/poly1305.h
+++ /dev/null
@@ -1,40 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Common values for the Poly1305 algorithm
- */
-
-#ifndef _CRYPTO_POLY1305_H
-#define _CRYPTO_POLY1305_H
-
-#include <linux/types.h>
-#include <linux/crypto.h>
-
-#define POLY1305_BLOCK_SIZE	16
-#define POLY1305_KEY_SIZE	32
-#define POLY1305_DIGEST_SIZE	16
-
-struct poly1305_desc_ctx {
-	/* key */
-	u32 r[5];
-	/* finalize key */
-	u32 s[4];
-	/* accumulator */
-	u32 h[5];
-	/* partial buffer */
-	u8 buf[POLY1305_BLOCK_SIZE];
-	/* bytes used in partial buffer */
-	unsigned int buflen;
-	/* r key has been set */
-	bool rset;
-	/* s key has been set */
-	bool sset;
-};
-
-int crypto_poly1305_init(struct shash_desc *desc);
-unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx,
-					const u8 *src, unsigned int srclen);
-int crypto_poly1305_update(struct shash_desc *desc,
-			   const u8 *src, unsigned int srclen);
-int crypto_poly1305_final(struct shash_desc *desc, u8 *dst);
-
-#endif
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (16 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-14 17:38   ` Ard Biesheuvel
                     ` (3 more replies)
  2018-09-14 16:22 ` [PATCH net-next v4 19/20] security/keys: rewrite big_key crypto to use Zinc Jason A. Donenfeld
  2018-09-14 16:22 ` [PATCH net-next v4 20/20] net: WireGuard secure network tunnel Jason A. Donenfeld
  19 siblings, 4 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers

Now that ChaCha20 is in Zinc, we can have the crypto API code simply
call into it. The crypto API expects to have a stored key per instance
and independent nonces, so we follow suite and store the key and
initialize the nonce independently.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Eric Biggers <ebiggers@google.com>
---
 arch/arm/configs/exynos_defconfig       |   1 -
 arch/arm/configs/multi_v7_defconfig     |   1 -
 arch/arm/configs/omap2plus_defconfig    |   1 -
 arch/arm/crypto/Kconfig                 |   6 -
 arch/arm/crypto/Makefile                |   2 -
 arch/arm/crypto/chacha20-neon-core.S    | 521 --------------------
 arch/arm/crypto/chacha20-neon-glue.c    | 127 -----
 arch/arm64/configs/defconfig            |   1 -
 arch/arm64/crypto/Kconfig               |   6 -
 arch/arm64/crypto/Makefile              |   3 -
 arch/arm64/crypto/chacha20-neon-core.S  | 450 -----------------
 arch/arm64/crypto/chacha20-neon-glue.c  | 133 -----
 arch/x86/crypto/Makefile                |   3 -
 arch/x86/crypto/chacha20-avx2-x86_64.S  | 448 -----------------
 arch/x86/crypto/chacha20-ssse3-x86_64.S | 630 ------------------------
 arch/x86/crypto/chacha20_glue.c         | 146 ------
 crypto/Kconfig                          |  16 -
 crypto/Makefile                         |   2 +-
 crypto/chacha20_generic.c               | 136 -----
 crypto/chacha20_zinc.c                  | 100 ++++
 crypto/chacha20poly1305.c               |   2 +-
 include/crypto/chacha20.h               |  12 -
 22 files changed, 102 insertions(+), 2645 deletions(-)
 delete mode 100644 arch/arm/crypto/chacha20-neon-core.S
 delete mode 100644 arch/arm/crypto/chacha20-neon-glue.c
 delete mode 100644 arch/arm64/crypto/chacha20-neon-core.S
 delete mode 100644 arch/arm64/crypto/chacha20-neon-glue.c
 delete mode 100644 arch/x86/crypto/chacha20-avx2-x86_64.S
 delete mode 100644 arch/x86/crypto/chacha20-ssse3-x86_64.S
 delete mode 100644 arch/x86/crypto/chacha20_glue.c
 delete mode 100644 crypto/chacha20_generic.c
 create mode 100644 crypto/chacha20_zinc.c

diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig
index 27ea6dfcf2f2..95929b5e7b10 100644
--- a/arch/arm/configs/exynos_defconfig
+++ b/arch/arm/configs/exynos_defconfig
@@ -350,7 +350,6 @@ CONFIG_CRYPTO_SHA1_ARM_NEON=m
 CONFIG_CRYPTO_SHA256_ARM=m
 CONFIG_CRYPTO_SHA512_ARM=m
 CONFIG_CRYPTO_AES_ARM_BS=m
-CONFIG_CRYPTO_CHACHA20_NEON=m
 CONFIG_CRC_CCITT=y
 CONFIG_FONTS=y
 CONFIG_FONT_7x14=y
diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
index fc33444e94f0..63be07724db3 100644
--- a/arch/arm/configs/multi_v7_defconfig
+++ b/arch/arm/configs/multi_v7_defconfig
@@ -1000,4 +1000,3 @@ CONFIG_CRYPTO_AES_ARM_BS=m
 CONFIG_CRYPTO_AES_ARM_CE=m
 CONFIG_CRYPTO_GHASH_ARM_CE=m
 CONFIG_CRYPTO_CRC32_ARM_CE=m
-CONFIG_CRYPTO_CHACHA20_NEON=m
diff --git a/arch/arm/configs/omap2plus_defconfig b/arch/arm/configs/omap2plus_defconfig
index 6491419b1dad..f585a8ecc336 100644
--- a/arch/arm/configs/omap2plus_defconfig
+++ b/arch/arm/configs/omap2plus_defconfig
@@ -547,7 +547,6 @@ CONFIG_CRYPTO_SHA512_ARM=m
 CONFIG_CRYPTO_AES_ARM=m
 CONFIG_CRYPTO_AES_ARM_BS=m
 CONFIG_CRYPTO_GHASH_ARM_CE=m
-CONFIG_CRYPTO_CHACHA20_NEON=m
 CONFIG_CRC_CCITT=y
 CONFIG_CRC_T10DIF=y
 CONFIG_CRC_ITU_T=y
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 925d1364727a..fb80fd89f0e7 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -115,12 +115,6 @@ config CRYPTO_CRC32_ARM_CE
 	depends on KERNEL_MODE_NEON && CRC32
 	select CRYPTO_HASH
 
-config CRYPTO_CHACHA20_NEON
-	tristate "NEON accelerated ChaCha20 symmetric cipher"
-	depends on KERNEL_MODE_NEON
-	select CRYPTO_BLKCIPHER
-	select CRYPTO_CHACHA20
-
 config CRYPTO_SPECK_NEON
 	tristate "NEON accelerated Speck cipher algorithms"
 	depends on KERNEL_MODE_NEON
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 8de542c48ade..bbfa98447063 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -9,7 +9,6 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
-obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
 obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
 
 ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
@@ -53,7 +52,6 @@ aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
-chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
 speck-neon-y := speck-neon-core.o speck-neon-glue.o
 
 ifdef REGENERATE_ARM_CRYPTO
diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
deleted file mode 100644
index 451a849ad518..000000000000
--- a/arch/arm/crypto/chacha20-neon-core.S
+++ /dev/null
@@ -1,521 +0,0 @@
-/*
- * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
- *
- * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Based on:
- * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSE3 functions
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <linux/linkage.h>
-
-	.text
-	.fpu		neon
-	.align		5
-
-ENTRY(chacha20_block_xor_neon)
-	// r0: Input state matrix, s
-	// r1: 1 data block output, o
-	// r2: 1 data block input, i
-
-	//
-	// This function encrypts one ChaCha20 block by loading the state matrix
-	// in four NEON registers. It performs matrix operation on four words in
-	// parallel, but requireds shuffling to rearrange the words after each
-	// round.
-	//
-
-	// x0..3 = s0..3
-	add		ip, r0, #0x20
-	vld1.32		{q0-q1}, [r0]
-	vld1.32		{q2-q3}, [ip]
-
-	vmov		q8, q0
-	vmov		q9, q1
-	vmov		q10, q2
-	vmov		q11, q3
-
-	mov		r3, #10
-
-.Ldoubleround:
-	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
-	vadd.i32	q0, q0, q1
-	veor		q3, q3, q0
-	vrev32.16	q3, q3
-
-	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
-	vadd.i32	q2, q2, q3
-	veor		q4, q1, q2
-	vshl.u32	q1, q4, #12
-	vsri.u32	q1, q4, #20
-
-	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
-	vadd.i32	q0, q0, q1
-	veor		q4, q3, q0
-	vshl.u32	q3, q4, #8
-	vsri.u32	q3, q4, #24
-
-	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
-	vadd.i32	q2, q2, q3
-	veor		q4, q1, q2
-	vshl.u32	q1, q4, #7
-	vsri.u32	q1, q4, #25
-
-	// x1 = shuffle32(x1, MASK(0, 3, 2, 1))
-	vext.8		q1, q1, q1, #4
-	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
-	vext.8		q2, q2, q2, #8
-	// x3 = shuffle32(x3, MASK(2, 1, 0, 3))
-	vext.8		q3, q3, q3, #12
-
-	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
-	vadd.i32	q0, q0, q1
-	veor		q3, q3, q0
-	vrev32.16	q3, q3
-
-	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
-	vadd.i32	q2, q2, q3
-	veor		q4, q1, q2
-	vshl.u32	q1, q4, #12
-	vsri.u32	q1, q4, #20
-
-	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
-	vadd.i32	q0, q0, q1
-	veor		q4, q3, q0
-	vshl.u32	q3, q4, #8
-	vsri.u32	q3, q4, #24
-
-	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
-	vadd.i32	q2, q2, q3
-	veor		q4, q1, q2
-	vshl.u32	q1, q4, #7
-	vsri.u32	q1, q4, #25
-
-	// x1 = shuffle32(x1, MASK(2, 1, 0, 3))
-	vext.8		q1, q1, q1, #12
-	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
-	vext.8		q2, q2, q2, #8
-	// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
-	vext.8		q3, q3, q3, #4
-
-	subs		r3, r3, #1
-	bne		.Ldoubleround
-
-	add		ip, r2, #0x20
-	vld1.8		{q4-q5}, [r2]
-	vld1.8		{q6-q7}, [ip]
-
-	// o0 = i0 ^ (x0 + s0)
-	vadd.i32	q0, q0, q8
-	veor		q0, q0, q4
-
-	// o1 = i1 ^ (x1 + s1)
-	vadd.i32	q1, q1, q9
-	veor		q1, q1, q5
-
-	// o2 = i2 ^ (x2 + s2)
-	vadd.i32	q2, q2, q10
-	veor		q2, q2, q6
-
-	// o3 = i3 ^ (x3 + s3)
-	vadd.i32	q3, q3, q11
-	veor		q3, q3, q7
-
-	add		ip, r1, #0x20
-	vst1.8		{q0-q1}, [r1]
-	vst1.8		{q2-q3}, [ip]
-
-	bx		lr
-ENDPROC(chacha20_block_xor_neon)
-
-	.align		5
-ENTRY(chacha20_4block_xor_neon)
-	push		{r4-r6, lr}
-	mov		ip, sp			// preserve the stack pointer
-	sub		r3, sp, #0x20		// allocate a 32 byte buffer
-	bic		r3, r3, #0x1f		// aligned to 32 bytes
-	mov		sp, r3
-
-	// r0: Input state matrix, s
-	// r1: 4 data blocks output, o
-	// r2: 4 data blocks input, i
-
-	//
-	// This function encrypts four consecutive ChaCha20 blocks by loading
-	// the state matrix in NEON registers four times. The algorithm performs
-	// each operation on the corresponding word of each state matrix, hence
-	// requires no word shuffling. For final XORing step we transpose the
-	// matrix by interleaving 32- and then 64-bit words, which allows us to
-	// do XOR in NEON registers.
-	//
-
-	// x0..15[0-3] = s0..3[0..3]
-	add		r3, r0, #0x20
-	vld1.32		{q0-q1}, [r0]
-	vld1.32		{q2-q3}, [r3]
-
-	adr		r3, CTRINC
-	vdup.32		q15, d7[1]
-	vdup.32		q14, d7[0]
-	vld1.32		{q11}, [r3, :128]
-	vdup.32		q13, d6[1]
-	vdup.32		q12, d6[0]
-	vadd.i32	q12, q12, q11		// x12 += counter values 0-3
-	vdup.32		q11, d5[1]
-	vdup.32		q10, d5[0]
-	vdup.32		q9, d4[1]
-	vdup.32		q8, d4[0]
-	vdup.32		q7, d3[1]
-	vdup.32		q6, d3[0]
-	vdup.32		q5, d2[1]
-	vdup.32		q4, d2[0]
-	vdup.32		q3, d1[1]
-	vdup.32		q2, d1[0]
-	vdup.32		q1, d0[1]
-	vdup.32		q0, d0[0]
-
-	mov		r3, #10
-
-.Ldoubleround4:
-	// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
-	// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
-	// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
-	// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
-	vadd.i32	q0, q0, q4
-	vadd.i32	q1, q1, q5
-	vadd.i32	q2, q2, q6
-	vadd.i32	q3, q3, q7
-
-	veor		q12, q12, q0
-	veor		q13, q13, q1
-	veor		q14, q14, q2
-	veor		q15, q15, q3
-
-	vrev32.16	q12, q12
-	vrev32.16	q13, q13
-	vrev32.16	q14, q14
-	vrev32.16	q15, q15
-
-	// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
-	// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
-	// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
-	// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
-	vadd.i32	q8, q8, q12
-	vadd.i32	q9, q9, q13
-	vadd.i32	q10, q10, q14
-	vadd.i32	q11, q11, q15
-
-	vst1.32		{q8-q9}, [sp, :256]
-
-	veor		q8, q4, q8
-	veor		q9, q5, q9
-	vshl.u32	q4, q8, #12
-	vshl.u32	q5, q9, #12
-	vsri.u32	q4, q8, #20
-	vsri.u32	q5, q9, #20
-
-	veor		q8, q6, q10
-	veor		q9, q7, q11
-	vshl.u32	q6, q8, #12
-	vshl.u32	q7, q9, #12
-	vsri.u32	q6, q8, #20
-	vsri.u32	q7, q9, #20
-
-	// x0 += x4, x12 = rotl32(x12 ^ x0, 8)
-	// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
-	// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
-	// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
-	vadd.i32	q0, q0, q4
-	vadd.i32	q1, q1, q5
-	vadd.i32	q2, q2, q6
-	vadd.i32	q3, q3, q7
-
-	veor		q8, q12, q0
-	veor		q9, q13, q1
-	vshl.u32	q12, q8, #8
-	vshl.u32	q13, q9, #8
-	vsri.u32	q12, q8, #24
-	vsri.u32	q13, q9, #24
-
-	veor		q8, q14, q2
-	veor		q9, q15, q3
-	vshl.u32	q14, q8, #8
-	vshl.u32	q15, q9, #8
-	vsri.u32	q14, q8, #24
-	vsri.u32	q15, q9, #24
-
-	vld1.32		{q8-q9}, [sp, :256]
-
-	// x8 += x12, x4 = rotl32(x4 ^ x8, 7)
-	// x9 += x13, x5 = rotl32(x5 ^ x9, 7)
-	// x10 += x14, x6 = rotl32(x6 ^ x10, 7)
-	// x11 += x15, x7 = rotl32(x7 ^ x11, 7)
-	vadd.i32	q8, q8, q12
-	vadd.i32	q9, q9, q13
-	vadd.i32	q10, q10, q14
-	vadd.i32	q11, q11, q15
-
-	vst1.32		{q8-q9}, [sp, :256]
-
-	veor		q8, q4, q8
-	veor		q9, q5, q9
-	vshl.u32	q4, q8, #7
-	vshl.u32	q5, q9, #7
-	vsri.u32	q4, q8, #25
-	vsri.u32	q5, q9, #25
-
-	veor		q8, q6, q10
-	veor		q9, q7, q11
-	vshl.u32	q6, q8, #7
-	vshl.u32	q7, q9, #7
-	vsri.u32	q6, q8, #25
-	vsri.u32	q7, q9, #25
-
-	vld1.32		{q8-q9}, [sp, :256]
-
-	// x0 += x5, x15 = rotl32(x15 ^ x0, 16)
-	// x1 += x6, x12 = rotl32(x12 ^ x1, 16)
-	// x2 += x7, x13 = rotl32(x13 ^ x2, 16)
-	// x3 += x4, x14 = rotl32(x14 ^ x3, 16)
-	vadd.i32	q0, q0, q5
-	vadd.i32	q1, q1, q6
-	vadd.i32	q2, q2, q7
-	vadd.i32	q3, q3, q4
-
-	veor		q15, q15, q0
-	veor		q12, q12, q1
-	veor		q13, q13, q2
-	veor		q14, q14, q3
-
-	vrev32.16	q15, q15
-	vrev32.16	q12, q12
-	vrev32.16	q13, q13
-	vrev32.16	q14, q14
-
-	// x10 += x15, x5 = rotl32(x5 ^ x10, 12)
-	// x11 += x12, x6 = rotl32(x6 ^ x11, 12)
-	// x8 += x13, x7 = rotl32(x7 ^ x8, 12)
-	// x9 += x14, x4 = rotl32(x4 ^ x9, 12)
-	vadd.i32	q10, q10, q15
-	vadd.i32	q11, q11, q12
-	vadd.i32	q8, q8, q13
-	vadd.i32	q9, q9, q14
-
-	vst1.32		{q8-q9}, [sp, :256]
-
-	veor		q8, q7, q8
-	veor		q9, q4, q9
-	vshl.u32	q7, q8, #12
-	vshl.u32	q4, q9, #12
-	vsri.u32	q7, q8, #20
-	vsri.u32	q4, q9, #20
-
-	veor		q8, q5, q10
-	veor		q9, q6, q11
-	vshl.u32	q5, q8, #12
-	vshl.u32	q6, q9, #12
-	vsri.u32	q5, q8, #20
-	vsri.u32	q6, q9, #20
-
-	// x0 += x5, x15 = rotl32(x15 ^ x0, 8)
-	// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
-	// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
-	// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
-	vadd.i32	q0, q0, q5
-	vadd.i32	q1, q1, q6
-	vadd.i32	q2, q2, q7
-	vadd.i32	q3, q3, q4
-
-	veor		q8, q15, q0
-	veor		q9, q12, q1
-	vshl.u32	q15, q8, #8
-	vshl.u32	q12, q9, #8
-	vsri.u32	q15, q8, #24
-	vsri.u32	q12, q9, #24
-
-	veor		q8, q13, q2
-	veor		q9, q14, q3
-	vshl.u32	q13, q8, #8
-	vshl.u32	q14, q9, #8
-	vsri.u32	q13, q8, #24
-	vsri.u32	q14, q9, #24
-
-	vld1.32		{q8-q9}, [sp, :256]
-
-	// x10 += x15, x5 = rotl32(x5 ^ x10, 7)
-	// x11 += x12, x6 = rotl32(x6 ^ x11, 7)
-	// x8 += x13, x7 = rotl32(x7 ^ x8, 7)
-	// x9 += x14, x4 = rotl32(x4 ^ x9, 7)
-	vadd.i32	q10, q10, q15
-	vadd.i32	q11, q11, q12
-	vadd.i32	q8, q8, q13
-	vadd.i32	q9, q9, q14
-
-	vst1.32		{q8-q9}, [sp, :256]
-
-	veor		q8, q7, q8
-	veor		q9, q4, q9
-	vshl.u32	q7, q8, #7
-	vshl.u32	q4, q9, #7
-	vsri.u32	q7, q8, #25
-	vsri.u32	q4, q9, #25
-
-	veor		q8, q5, q10
-	veor		q9, q6, q11
-	vshl.u32	q5, q8, #7
-	vshl.u32	q6, q9, #7
-	vsri.u32	q5, q8, #25
-	vsri.u32	q6, q9, #25
-
-	subs		r3, r3, #1
-	beq		0f
-
-	vld1.32		{q8-q9}, [sp, :256]
-	b		.Ldoubleround4
-
-	// x0[0-3] += s0[0]
-	// x1[0-3] += s0[1]
-	// x2[0-3] += s0[2]
-	// x3[0-3] += s0[3]
-0:	ldmia		r0!, {r3-r6}
-	vdup.32		q8, r3
-	vdup.32		q9, r4
-	vadd.i32	q0, q0, q8
-	vadd.i32	q1, q1, q9
-	vdup.32		q8, r5
-	vdup.32		q9, r6
-	vadd.i32	q2, q2, q8
-	vadd.i32	q3, q3, q9
-
-	// x4[0-3] += s1[0]
-	// x5[0-3] += s1[1]
-	// x6[0-3] += s1[2]
-	// x7[0-3] += s1[3]
-	ldmia		r0!, {r3-r6}
-	vdup.32		q8, r3
-	vdup.32		q9, r4
-	vadd.i32	q4, q4, q8
-	vadd.i32	q5, q5, q9
-	vdup.32		q8, r5
-	vdup.32		q9, r6
-	vadd.i32	q6, q6, q8
-	vadd.i32	q7, q7, q9
-
-	// interleave 32-bit words in state n, n+1
-	vzip.32		q0, q1
-	vzip.32		q2, q3
-	vzip.32		q4, q5
-	vzip.32		q6, q7
-
-	// interleave 64-bit words in state n, n+2
-	vswp		d1, d4
-	vswp		d3, d6
-	vswp		d9, d12
-	vswp		d11, d14
-
-	// xor with corresponding input, write to output
-	vld1.8		{q8-q9}, [r2]!
-	veor		q8, q8, q0
-	veor		q9, q9, q4
-	vst1.8		{q8-q9}, [r1]!
-
-	vld1.32		{q8-q9}, [sp, :256]
-
-	// x8[0-3] += s2[0]
-	// x9[0-3] += s2[1]
-	// x10[0-3] += s2[2]
-	// x11[0-3] += s2[3]
-	ldmia		r0!, {r3-r6}
-	vdup.32		q0, r3
-	vdup.32		q4, r4
-	vadd.i32	q8, q8, q0
-	vadd.i32	q9, q9, q4
-	vdup.32		q0, r5
-	vdup.32		q4, r6
-	vadd.i32	q10, q10, q0
-	vadd.i32	q11, q11, q4
-
-	// x12[0-3] += s3[0]
-	// x13[0-3] += s3[1]
-	// x14[0-3] += s3[2]
-	// x15[0-3] += s3[3]
-	ldmia		r0!, {r3-r6}
-	vdup.32		q0, r3
-	vdup.32		q4, r4
-	adr		r3, CTRINC
-	vadd.i32	q12, q12, q0
-	vld1.32		{q0}, [r3, :128]
-	vadd.i32	q13, q13, q4
-	vadd.i32	q12, q12, q0		// x12 += counter values 0-3
-
-	vdup.32		q0, r5
-	vdup.32		q4, r6
-	vadd.i32	q14, q14, q0
-	vadd.i32	q15, q15, q4
-
-	// interleave 32-bit words in state n, n+1
-	vzip.32		q8, q9
-	vzip.32		q10, q11
-	vzip.32		q12, q13
-	vzip.32		q14, q15
-
-	// interleave 64-bit words in state n, n+2
-	vswp		d17, d20
-	vswp		d19, d22
-	vswp		d25, d28
-	vswp		d27, d30
-
-	vmov		q4, q1
-
-	vld1.8		{q0-q1}, [r2]!
-	veor		q0, q0, q8
-	veor		q1, q1, q12
-	vst1.8		{q0-q1}, [r1]!
-
-	vld1.8		{q0-q1}, [r2]!
-	veor		q0, q0, q2
-	veor		q1, q1, q6
-	vst1.8		{q0-q1}, [r1]!
-
-	vld1.8		{q0-q1}, [r2]!
-	veor		q0, q0, q10
-	veor		q1, q1, q14
-	vst1.8		{q0-q1}, [r1]!
-
-	vld1.8		{q0-q1}, [r2]!
-	veor		q0, q0, q4
-	veor		q1, q1, q5
-	vst1.8		{q0-q1}, [r1]!
-
-	vld1.8		{q0-q1}, [r2]!
-	veor		q0, q0, q9
-	veor		q1, q1, q13
-	vst1.8		{q0-q1}, [r1]!
-
-	vld1.8		{q0-q1}, [r2]!
-	veor		q0, q0, q3
-	veor		q1, q1, q7
-	vst1.8		{q0-q1}, [r1]!
-
-	vld1.8		{q0-q1}, [r2]
-	veor		q0, q0, q11
-	veor		q1, q1, q15
-	vst1.8		{q0-q1}, [r1]
-
-	mov		sp, ip
-	pop		{r4-r6, pc}
-ENDPROC(chacha20_4block_xor_neon)
-
-	.align		4
-CTRINC:	.word		0, 1, 2, 3
diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
deleted file mode 100644
index 59a7be08e80c..000000000000
--- a/arch/arm/crypto/chacha20-neon-glue.c
+++ /dev/null
@@ -1,127 +0,0 @@
-/*
- * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
- *
- * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Based on:
- * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <crypto/algapi.h>
-#include <crypto/chacha20.h>
-#include <crypto/internal/skcipher.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-
-#include <asm/hwcap.h>
-#include <asm/neon.h>
-#include <asm/simd.h>
-
-asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
-asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
-
-static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
-			    unsigned int bytes)
-{
-	u8 buf[CHACHA20_BLOCK_SIZE];
-
-	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
-		chacha20_4block_xor_neon(state, dst, src);
-		bytes -= CHACHA20_BLOCK_SIZE * 4;
-		src += CHACHA20_BLOCK_SIZE * 4;
-		dst += CHACHA20_BLOCK_SIZE * 4;
-		state[12] += 4;
-	}
-	while (bytes >= CHACHA20_BLOCK_SIZE) {
-		chacha20_block_xor_neon(state, dst, src);
-		bytes -= CHACHA20_BLOCK_SIZE;
-		src += CHACHA20_BLOCK_SIZE;
-		dst += CHACHA20_BLOCK_SIZE;
-		state[12]++;
-	}
-	if (bytes) {
-		memcpy(buf, src, bytes);
-		chacha20_block_xor_neon(state, buf, buf);
-		memcpy(dst, buf, bytes);
-	}
-}
-
-static int chacha20_neon(struct skcipher_request *req)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
-	struct skcipher_walk walk;
-	u32 state[16];
-	int err;
-
-	if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
-		return crypto_chacha20_crypt(req);
-
-	err = skcipher_walk_virt(&walk, req, true);
-
-	crypto_chacha20_init(state, ctx, walk.iv);
-
-	kernel_neon_begin();
-	while (walk.nbytes > 0) {
-		unsigned int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes = round_down(nbytes, walk.stride);
-
-		chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
-				nbytes);
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-	}
-	kernel_neon_end();
-
-	return err;
-}
-
-static struct skcipher_alg alg = {
-	.base.cra_name		= "chacha20",
-	.base.cra_driver_name	= "chacha20-neon",
-	.base.cra_priority	= 300,
-	.base.cra_blocksize	= 1,
-	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
-	.base.cra_module	= THIS_MODULE,
-
-	.min_keysize		= CHACHA20_KEY_SIZE,
-	.max_keysize		= CHACHA20_KEY_SIZE,
-	.ivsize			= CHACHA20_IV_SIZE,
-	.chunksize		= CHACHA20_BLOCK_SIZE,
-	.walksize		= 4 * CHACHA20_BLOCK_SIZE,
-	.setkey			= crypto_chacha20_setkey,
-	.encrypt		= chacha20_neon,
-	.decrypt		= chacha20_neon,
-};
-
-static int __init chacha20_simd_mod_init(void)
-{
-	if (!(elf_hwcap & HWCAP_NEON))
-		return -ENODEV;
-
-	return crypto_register_skcipher(&alg);
-}
-
-static void __exit chacha20_simd_mod_fini(void)
-{
-	crypto_unregister_skcipher(&alg);
-}
-
-module_init(chacha20_simd_mod_init);
-module_exit(chacha20_simd_mod_fini);
-
-MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
-MODULE_LICENSE("GPL v2");
-MODULE_ALIAS_CRYPTO("chacha20");
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index db8d364f8476..6cc3c8a0ad88 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -709,5 +709,4 @@ CONFIG_CRYPTO_CRCT10DIF_ARM64_CE=m
 CONFIG_CRYPTO_CRC32_ARM64_CE=m
 CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
 CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
-CONFIG_CRYPTO_CHACHA20_NEON=m
 CONFIG_CRYPTO_AES_ARM64_BS=m
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index e3fdb0fd6f70..9db6d775a880 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -105,12 +105,6 @@ config CRYPTO_AES_ARM64_NEON_BLK
 	select CRYPTO_AES
 	select CRYPTO_SIMD
 
-config CRYPTO_CHACHA20_NEON
-	tristate "NEON accelerated ChaCha20 symmetric cipher"
-	depends on KERNEL_MODE_NEON
-	select CRYPTO_BLKCIPHER
-	select CRYPTO_CHACHA20
-
 config CRYPTO_AES_ARM64_BS
 	tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
 	depends on KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index bcafd016618e..507c4bfb86e3 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -53,9 +53,6 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
 sha512-arm64-y := sha512-glue.o sha512-core.o
 
-obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
-chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
-
 obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
 speck-neon-y := speck-neon-core.o speck-neon-glue.o
 
diff --git a/arch/arm64/crypto/chacha20-neon-core.S b/arch/arm64/crypto/chacha20-neon-core.S
deleted file mode 100644
index 13c85e272c2a..000000000000
--- a/arch/arm64/crypto/chacha20-neon-core.S
+++ /dev/null
@@ -1,450 +0,0 @@
-/*
- * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
- *
- * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Based on:
- * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <linux/linkage.h>
-
-	.text
-	.align		6
-
-ENTRY(chacha20_block_xor_neon)
-	// x0: Input state matrix, s
-	// x1: 1 data block output, o
-	// x2: 1 data block input, i
-
-	//
-	// This function encrypts one ChaCha20 block by loading the state matrix
-	// in four NEON registers. It performs matrix operation on four words in
-	// parallel, but requires shuffling to rearrange the words after each
-	// round.
-	//
-
-	// x0..3 = s0..3
-	adr		x3, ROT8
-	ld1		{v0.4s-v3.4s}, [x0]
-	ld1		{v8.4s-v11.4s}, [x0]
-	ld1		{v12.4s}, [x3]
-
-	mov		x3, #10
-
-.Ldoubleround:
-	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
-	add		v0.4s, v0.4s, v1.4s
-	eor		v3.16b, v3.16b, v0.16b
-	rev32		v3.8h, v3.8h
-
-	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
-	add		v2.4s, v2.4s, v3.4s
-	eor		v4.16b, v1.16b, v2.16b
-	shl		v1.4s, v4.4s, #12
-	sri		v1.4s, v4.4s, #20
-
-	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
-	add		v0.4s, v0.4s, v1.4s
-	eor		v3.16b, v3.16b, v0.16b
-	tbl		v3.16b, {v3.16b}, v12.16b
-
-	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
-	add		v2.4s, v2.4s, v3.4s
-	eor		v4.16b, v1.16b, v2.16b
-	shl		v1.4s, v4.4s, #7
-	sri		v1.4s, v4.4s, #25
-
-	// x1 = shuffle32(x1, MASK(0, 3, 2, 1))
-	ext		v1.16b, v1.16b, v1.16b, #4
-	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
-	ext		v2.16b, v2.16b, v2.16b, #8
-	// x3 = shuffle32(x3, MASK(2, 1, 0, 3))
-	ext		v3.16b, v3.16b, v3.16b, #12
-
-	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
-	add		v0.4s, v0.4s, v1.4s
-	eor		v3.16b, v3.16b, v0.16b
-	rev32		v3.8h, v3.8h
-
-	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
-	add		v2.4s, v2.4s, v3.4s
-	eor		v4.16b, v1.16b, v2.16b
-	shl		v1.4s, v4.4s, #12
-	sri		v1.4s, v4.4s, #20
-
-	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
-	add		v0.4s, v0.4s, v1.4s
-	eor		v3.16b, v3.16b, v0.16b
-	tbl		v3.16b, {v3.16b}, v12.16b
-
-	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
-	add		v2.4s, v2.4s, v3.4s
-	eor		v4.16b, v1.16b, v2.16b
-	shl		v1.4s, v4.4s, #7
-	sri		v1.4s, v4.4s, #25
-
-	// x1 = shuffle32(x1, MASK(2, 1, 0, 3))
-	ext		v1.16b, v1.16b, v1.16b, #12
-	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
-	ext		v2.16b, v2.16b, v2.16b, #8
-	// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
-	ext		v3.16b, v3.16b, v3.16b, #4
-
-	subs		x3, x3, #1
-	b.ne		.Ldoubleround
-
-	ld1		{v4.16b-v7.16b}, [x2]
-
-	// o0 = i0 ^ (x0 + s0)
-	add		v0.4s, v0.4s, v8.4s
-	eor		v0.16b, v0.16b, v4.16b
-
-	// o1 = i1 ^ (x1 + s1)
-	add		v1.4s, v1.4s, v9.4s
-	eor		v1.16b, v1.16b, v5.16b
-
-	// o2 = i2 ^ (x2 + s2)
-	add		v2.4s, v2.4s, v10.4s
-	eor		v2.16b, v2.16b, v6.16b
-
-	// o3 = i3 ^ (x3 + s3)
-	add		v3.4s, v3.4s, v11.4s
-	eor		v3.16b, v3.16b, v7.16b
-
-	st1		{v0.16b-v3.16b}, [x1]
-
-	ret
-ENDPROC(chacha20_block_xor_neon)
-
-	.align		6
-ENTRY(chacha20_4block_xor_neon)
-	// x0: Input state matrix, s
-	// x1: 4 data blocks output, o
-	// x2: 4 data blocks input, i
-
-	//
-	// This function encrypts four consecutive ChaCha20 blocks by loading
-	// the state matrix in NEON registers four times. The algorithm performs
-	// each operation on the corresponding word of each state matrix, hence
-	// requires no word shuffling. For final XORing step we transpose the
-	// matrix by interleaving 32- and then 64-bit words, which allows us to
-	// do XOR in NEON registers.
-	//
-	adr		x3, CTRINC		// ... and ROT8
-	ld1		{v30.4s-v31.4s}, [x3]
-
-	// x0..15[0-3] = s0..3[0..3]
-	mov		x4, x0
-	ld4r		{ v0.4s- v3.4s}, [x4], #16
-	ld4r		{ v4.4s- v7.4s}, [x4], #16
-	ld4r		{ v8.4s-v11.4s}, [x4], #16
-	ld4r		{v12.4s-v15.4s}, [x4]
-
-	// x12 += counter values 0-3
-	add		v12.4s, v12.4s, v30.4s
-
-	mov		x3, #10
-
-.Ldoubleround4:
-	// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
-	// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
-	// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
-	// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
-	add		v0.4s, v0.4s, v4.4s
-	add		v1.4s, v1.4s, v5.4s
-	add		v2.4s, v2.4s, v6.4s
-	add		v3.4s, v3.4s, v7.4s
-
-	eor		v12.16b, v12.16b, v0.16b
-	eor		v13.16b, v13.16b, v1.16b
-	eor		v14.16b, v14.16b, v2.16b
-	eor		v15.16b, v15.16b, v3.16b
-
-	rev32		v12.8h, v12.8h
-	rev32		v13.8h, v13.8h
-	rev32		v14.8h, v14.8h
-	rev32		v15.8h, v15.8h
-
-	// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
-	// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
-	// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
-	// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
-	add		v8.4s, v8.4s, v12.4s
-	add		v9.4s, v9.4s, v13.4s
-	add		v10.4s, v10.4s, v14.4s
-	add		v11.4s, v11.4s, v15.4s
-
-	eor		v16.16b, v4.16b, v8.16b
-	eor		v17.16b, v5.16b, v9.16b
-	eor		v18.16b, v6.16b, v10.16b
-	eor		v19.16b, v7.16b, v11.16b
-
-	shl		v4.4s, v16.4s, #12
-	shl		v5.4s, v17.4s, #12
-	shl		v6.4s, v18.4s, #12
-	shl		v7.4s, v19.4s, #12
-
-	sri		v4.4s, v16.4s, #20
-	sri		v5.4s, v17.4s, #20
-	sri		v6.4s, v18.4s, #20
-	sri		v7.4s, v19.4s, #20
-
-	// x0 += x4, x12 = rotl32(x12 ^ x0, 8)
-	// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
-	// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
-	// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
-	add		v0.4s, v0.4s, v4.4s
-	add		v1.4s, v1.4s, v5.4s
-	add		v2.4s, v2.4s, v6.4s
-	add		v3.4s, v3.4s, v7.4s
-
-	eor		v12.16b, v12.16b, v0.16b
-	eor		v13.16b, v13.16b, v1.16b
-	eor		v14.16b, v14.16b, v2.16b
-	eor		v15.16b, v15.16b, v3.16b
-
-	tbl		v12.16b, {v12.16b}, v31.16b
-	tbl		v13.16b, {v13.16b}, v31.16b
-	tbl		v14.16b, {v14.16b}, v31.16b
-	tbl		v15.16b, {v15.16b}, v31.16b
-
-	// x8 += x12, x4 = rotl32(x4 ^ x8, 7)
-	// x9 += x13, x5 = rotl32(x5 ^ x9, 7)
-	// x10 += x14, x6 = rotl32(x6 ^ x10, 7)
-	// x11 += x15, x7 = rotl32(x7 ^ x11, 7)
-	add		v8.4s, v8.4s, v12.4s
-	add		v9.4s, v9.4s, v13.4s
-	add		v10.4s, v10.4s, v14.4s
-	add		v11.4s, v11.4s, v15.4s
-
-	eor		v16.16b, v4.16b, v8.16b
-	eor		v17.16b, v5.16b, v9.16b
-	eor		v18.16b, v6.16b, v10.16b
-	eor		v19.16b, v7.16b, v11.16b
-
-	shl		v4.4s, v16.4s, #7
-	shl		v5.4s, v17.4s, #7
-	shl		v6.4s, v18.4s, #7
-	shl		v7.4s, v19.4s, #7
-
-	sri		v4.4s, v16.4s, #25
-	sri		v5.4s, v17.4s, #25
-	sri		v6.4s, v18.4s, #25
-	sri		v7.4s, v19.4s, #25
-
-	// x0 += x5, x15 = rotl32(x15 ^ x0, 16)
-	// x1 += x6, x12 = rotl32(x12 ^ x1, 16)
-	// x2 += x7, x13 = rotl32(x13 ^ x2, 16)
-	// x3 += x4, x14 = rotl32(x14 ^ x3, 16)
-	add		v0.4s, v0.4s, v5.4s
-	add		v1.4s, v1.4s, v6.4s
-	add		v2.4s, v2.4s, v7.4s
-	add		v3.4s, v3.4s, v4.4s
-
-	eor		v15.16b, v15.16b, v0.16b
-	eor		v12.16b, v12.16b, v1.16b
-	eor		v13.16b, v13.16b, v2.16b
-	eor		v14.16b, v14.16b, v3.16b
-
-	rev32		v15.8h, v15.8h
-	rev32		v12.8h, v12.8h
-	rev32		v13.8h, v13.8h
-	rev32		v14.8h, v14.8h
-
-	// x10 += x15, x5 = rotl32(x5 ^ x10, 12)
-	// x11 += x12, x6 = rotl32(x6 ^ x11, 12)
-	// x8 += x13, x7 = rotl32(x7 ^ x8, 12)
-	// x9 += x14, x4 = rotl32(x4 ^ x9, 12)
-	add		v10.4s, v10.4s, v15.4s
-	add		v11.4s, v11.4s, v12.4s
-	add		v8.4s, v8.4s, v13.4s
-	add		v9.4s, v9.4s, v14.4s
-
-	eor		v16.16b, v5.16b, v10.16b
-	eor		v17.16b, v6.16b, v11.16b
-	eor		v18.16b, v7.16b, v8.16b
-	eor		v19.16b, v4.16b, v9.16b
-
-	shl		v5.4s, v16.4s, #12
-	shl		v6.4s, v17.4s, #12
-	shl		v7.4s, v18.4s, #12
-	shl		v4.4s, v19.4s, #12
-
-	sri		v5.4s, v16.4s, #20
-	sri		v6.4s, v17.4s, #20
-	sri		v7.4s, v18.4s, #20
-	sri		v4.4s, v19.4s, #20
-
-	// x0 += x5, x15 = rotl32(x15 ^ x0, 8)
-	// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
-	// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
-	// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
-	add		v0.4s, v0.4s, v5.4s
-	add		v1.4s, v1.4s, v6.4s
-	add		v2.4s, v2.4s, v7.4s
-	add		v3.4s, v3.4s, v4.4s
-
-	eor		v15.16b, v15.16b, v0.16b
-	eor		v12.16b, v12.16b, v1.16b
-	eor		v13.16b, v13.16b, v2.16b
-	eor		v14.16b, v14.16b, v3.16b
-
-	tbl		v15.16b, {v15.16b}, v31.16b
-	tbl		v12.16b, {v12.16b}, v31.16b
-	tbl		v13.16b, {v13.16b}, v31.16b
-	tbl		v14.16b, {v14.16b}, v31.16b
-
-	// x10 += x15, x5 = rotl32(x5 ^ x10, 7)
-	// x11 += x12, x6 = rotl32(x6 ^ x11, 7)
-	// x8 += x13, x7 = rotl32(x7 ^ x8, 7)
-	// x9 += x14, x4 = rotl32(x4 ^ x9, 7)
-	add		v10.4s, v10.4s, v15.4s
-	add		v11.4s, v11.4s, v12.4s
-	add		v8.4s, v8.4s, v13.4s
-	add		v9.4s, v9.4s, v14.4s
-
-	eor		v16.16b, v5.16b, v10.16b
-	eor		v17.16b, v6.16b, v11.16b
-	eor		v18.16b, v7.16b, v8.16b
-	eor		v19.16b, v4.16b, v9.16b
-
-	shl		v5.4s, v16.4s, #7
-	shl		v6.4s, v17.4s, #7
-	shl		v7.4s, v18.4s, #7
-	shl		v4.4s, v19.4s, #7
-
-	sri		v5.4s, v16.4s, #25
-	sri		v6.4s, v17.4s, #25
-	sri		v7.4s, v18.4s, #25
-	sri		v4.4s, v19.4s, #25
-
-	subs		x3, x3, #1
-	b.ne		.Ldoubleround4
-
-	ld4r		{v16.4s-v19.4s}, [x0], #16
-	ld4r		{v20.4s-v23.4s}, [x0], #16
-
-	// x12 += counter values 0-3
-	add		v12.4s, v12.4s, v30.4s
-
-	// x0[0-3] += s0[0]
-	// x1[0-3] += s0[1]
-	// x2[0-3] += s0[2]
-	// x3[0-3] += s0[3]
-	add		v0.4s, v0.4s, v16.4s
-	add		v1.4s, v1.4s, v17.4s
-	add		v2.4s, v2.4s, v18.4s
-	add		v3.4s, v3.4s, v19.4s
-
-	ld4r		{v24.4s-v27.4s}, [x0], #16
-	ld4r		{v28.4s-v31.4s}, [x0]
-
-	// x4[0-3] += s1[0]
-	// x5[0-3] += s1[1]
-	// x6[0-3] += s1[2]
-	// x7[0-3] += s1[3]
-	add		v4.4s, v4.4s, v20.4s
-	add		v5.4s, v5.4s, v21.4s
-	add		v6.4s, v6.4s, v22.4s
-	add		v7.4s, v7.4s, v23.4s
-
-	// x8[0-3] += s2[0]
-	// x9[0-3] += s2[1]
-	// x10[0-3] += s2[2]
-	// x11[0-3] += s2[3]
-	add		v8.4s, v8.4s, v24.4s
-	add		v9.4s, v9.4s, v25.4s
-	add		v10.4s, v10.4s, v26.4s
-	add		v11.4s, v11.4s, v27.4s
-
-	// x12[0-3] += s3[0]
-	// x13[0-3] += s3[1]
-	// x14[0-3] += s3[2]
-	// x15[0-3] += s3[3]
-	add		v12.4s, v12.4s, v28.4s
-	add		v13.4s, v13.4s, v29.4s
-	add		v14.4s, v14.4s, v30.4s
-	add		v15.4s, v15.4s, v31.4s
-
-	// interleave 32-bit words in state n, n+1
-	zip1		v16.4s, v0.4s, v1.4s
-	zip2		v17.4s, v0.4s, v1.4s
-	zip1		v18.4s, v2.4s, v3.4s
-	zip2		v19.4s, v2.4s, v3.4s
-	zip1		v20.4s, v4.4s, v5.4s
-	zip2		v21.4s, v4.4s, v5.4s
-	zip1		v22.4s, v6.4s, v7.4s
-	zip2		v23.4s, v6.4s, v7.4s
-	zip1		v24.4s, v8.4s, v9.4s
-	zip2		v25.4s, v8.4s, v9.4s
-	zip1		v26.4s, v10.4s, v11.4s
-	zip2		v27.4s, v10.4s, v11.4s
-	zip1		v28.4s, v12.4s, v13.4s
-	zip2		v29.4s, v12.4s, v13.4s
-	zip1		v30.4s, v14.4s, v15.4s
-	zip2		v31.4s, v14.4s, v15.4s
-
-	// interleave 64-bit words in state n, n+2
-	zip1		v0.2d, v16.2d, v18.2d
-	zip2		v4.2d, v16.2d, v18.2d
-	zip1		v8.2d, v17.2d, v19.2d
-	zip2		v12.2d, v17.2d, v19.2d
-	ld1		{v16.16b-v19.16b}, [x2], #64
-
-	zip1		v1.2d, v20.2d, v22.2d
-	zip2		v5.2d, v20.2d, v22.2d
-	zip1		v9.2d, v21.2d, v23.2d
-	zip2		v13.2d, v21.2d, v23.2d
-	ld1		{v20.16b-v23.16b}, [x2], #64
-
-	zip1		v2.2d, v24.2d, v26.2d
-	zip2		v6.2d, v24.2d, v26.2d
-	zip1		v10.2d, v25.2d, v27.2d
-	zip2		v14.2d, v25.2d, v27.2d
-	ld1		{v24.16b-v27.16b}, [x2], #64
-
-	zip1		v3.2d, v28.2d, v30.2d
-	zip2		v7.2d, v28.2d, v30.2d
-	zip1		v11.2d, v29.2d, v31.2d
-	zip2		v15.2d, v29.2d, v31.2d
-	ld1		{v28.16b-v31.16b}, [x2]
-
-	// xor with corresponding input, write to output
-	eor		v16.16b, v16.16b, v0.16b
-	eor		v17.16b, v17.16b, v1.16b
-	eor		v18.16b, v18.16b, v2.16b
-	eor		v19.16b, v19.16b, v3.16b
-	eor		v20.16b, v20.16b, v4.16b
-	eor		v21.16b, v21.16b, v5.16b
-	st1		{v16.16b-v19.16b}, [x1], #64
-	eor		v22.16b, v22.16b, v6.16b
-	eor		v23.16b, v23.16b, v7.16b
-	eor		v24.16b, v24.16b, v8.16b
-	eor		v25.16b, v25.16b, v9.16b
-	st1		{v20.16b-v23.16b}, [x1], #64
-	eor		v26.16b, v26.16b, v10.16b
-	eor		v27.16b, v27.16b, v11.16b
-	eor		v28.16b, v28.16b, v12.16b
-	st1		{v24.16b-v27.16b}, [x1], #64
-	eor		v29.16b, v29.16b, v13.16b
-	eor		v30.16b, v30.16b, v14.16b
-	eor		v31.16b, v31.16b, v15.16b
-	st1		{v28.16b-v31.16b}, [x1]
-
-	ret
-ENDPROC(chacha20_4block_xor_neon)
-
-CTRINC:	.word		0, 1, 2, 3
-ROT8:	.word		0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f
diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
deleted file mode 100644
index 727579c93ded..000000000000
--- a/arch/arm64/crypto/chacha20-neon-glue.c
+++ /dev/null
@@ -1,133 +0,0 @@
-/*
- * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
- *
- * Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * Based on:
- * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <crypto/algapi.h>
-#include <crypto/chacha20.h>
-#include <crypto/internal/skcipher.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-
-#include <asm/hwcap.h>
-#include <asm/neon.h>
-#include <asm/simd.h>
-
-asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
-asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
-
-static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
-			    unsigned int bytes)
-{
-	u8 buf[CHACHA20_BLOCK_SIZE];
-
-	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
-		kernel_neon_begin();
-		chacha20_4block_xor_neon(state, dst, src);
-		kernel_neon_end();
-		bytes -= CHACHA20_BLOCK_SIZE * 4;
-		src += CHACHA20_BLOCK_SIZE * 4;
-		dst += CHACHA20_BLOCK_SIZE * 4;
-		state[12] += 4;
-	}
-
-	if (!bytes)
-		return;
-
-	kernel_neon_begin();
-	while (bytes >= CHACHA20_BLOCK_SIZE) {
-		chacha20_block_xor_neon(state, dst, src);
-		bytes -= CHACHA20_BLOCK_SIZE;
-		src += CHACHA20_BLOCK_SIZE;
-		dst += CHACHA20_BLOCK_SIZE;
-		state[12]++;
-	}
-	if (bytes) {
-		memcpy(buf, src, bytes);
-		chacha20_block_xor_neon(state, buf, buf);
-		memcpy(dst, buf, bytes);
-	}
-	kernel_neon_end();
-}
-
-static int chacha20_neon(struct skcipher_request *req)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
-	struct skcipher_walk walk;
-	u32 state[16];
-	int err;
-
-	if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
-		return crypto_chacha20_crypt(req);
-
-	err = skcipher_walk_virt(&walk, req, false);
-
-	crypto_chacha20_init(state, ctx, walk.iv);
-
-	while (walk.nbytes > 0) {
-		unsigned int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes = round_down(nbytes, walk.stride);
-
-		chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
-				nbytes);
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-	}
-
-	return err;
-}
-
-static struct skcipher_alg alg = {
-	.base.cra_name		= "chacha20",
-	.base.cra_driver_name	= "chacha20-neon",
-	.base.cra_priority	= 300,
-	.base.cra_blocksize	= 1,
-	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
-	.base.cra_module	= THIS_MODULE,
-
-	.min_keysize		= CHACHA20_KEY_SIZE,
-	.max_keysize		= CHACHA20_KEY_SIZE,
-	.ivsize			= CHACHA20_IV_SIZE,
-	.chunksize		= CHACHA20_BLOCK_SIZE,
-	.walksize		= 4 * CHACHA20_BLOCK_SIZE,
-	.setkey			= crypto_chacha20_setkey,
-	.encrypt		= chacha20_neon,
-	.decrypt		= chacha20_neon,
-};
-
-static int __init chacha20_simd_mod_init(void)
-{
-	if (!(elf_hwcap & HWCAP_ASIMD))
-		return -ENODEV;
-
-	return crypto_register_skcipher(&alg);
-}
-
-static void __exit chacha20_simd_mod_fini(void)
-{
-	crypto_unregister_skcipher(&alg);
-}
-
-module_init(chacha20_simd_mod_init);
-module_exit(chacha20_simd_mod_fini);
-
-MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
-MODULE_LICENSE("GPL v2");
-MODULE_ALIAS_CRYPTO("chacha20");
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index cf830219846b..419212c31246 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -23,7 +23,6 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
 obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
 obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
 obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
-obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
 obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
 obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
@@ -76,7 +75,6 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
 blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
 twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
 twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
-chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
 serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
 
 aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o
@@ -99,7 +97,6 @@ endif
 
 ifeq ($(avx2_supported),yes)
 	camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
-	chacha20-x86_64-y += chacha20-avx2-x86_64.o
 	serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
 
 	morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S b/arch/x86/crypto/chacha20-avx2-x86_64.S
deleted file mode 100644
index f3cd26f48332..000000000000
--- a/arch/x86/crypto/chacha20-avx2-x86_64.S
+++ /dev/null
@@ -1,448 +0,0 @@
-/*
- * ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX2 functions
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <linux/linkage.h>
-
-.section	.rodata.cst32.ROT8, "aM", @progbits, 32
-.align 32
-ROT8:	.octa 0x0e0d0c0f0a09080b0605040702010003
-	.octa 0x0e0d0c0f0a09080b0605040702010003
-
-.section	.rodata.cst32.ROT16, "aM", @progbits, 32
-.align 32
-ROT16:	.octa 0x0d0c0f0e09080b0a0504070601000302
-	.octa 0x0d0c0f0e09080b0a0504070601000302
-
-.section	.rodata.cst32.CTRINC, "aM", @progbits, 32
-.align 32
-CTRINC:	.octa 0x00000003000000020000000100000000
-	.octa 0x00000007000000060000000500000004
-
-.text
-
-ENTRY(chacha20_8block_xor_avx2)
-	# %rdi: Input state matrix, s
-	# %rsi: 8 data blocks output, o
-	# %rdx: 8 data blocks input, i
-
-	# This function encrypts eight consecutive ChaCha20 blocks by loading
-	# the state matrix in AVX registers eight times. As we need some
-	# scratch registers, we save the first four registers on the stack. The
-	# algorithm performs each operation on the corresponding word of each
-	# state matrix, hence requires no word shuffling. For final XORing step
-	# we transpose the matrix by interleaving 32-, 64- and then 128-bit
-	# words, which allows us to do XOR in AVX registers. 8/16-bit word
-	# rotation is done with the slightly better performing byte shuffling,
-	# 7/12-bit word rotation uses traditional shift+OR.
-
-	vzeroupper
-	# 4 * 32 byte stack, 32-byte aligned
-	lea		8(%rsp),%r10
-	and		$~31, %rsp
-	sub		$0x80, %rsp
-
-	# x0..15[0-7] = s[0..15]
-	vpbroadcastd	0x00(%rdi),%ymm0
-	vpbroadcastd	0x04(%rdi),%ymm1
-	vpbroadcastd	0x08(%rdi),%ymm2
-	vpbroadcastd	0x0c(%rdi),%ymm3
-	vpbroadcastd	0x10(%rdi),%ymm4
-	vpbroadcastd	0x14(%rdi),%ymm5
-	vpbroadcastd	0x18(%rdi),%ymm6
-	vpbroadcastd	0x1c(%rdi),%ymm7
-	vpbroadcastd	0x20(%rdi),%ymm8
-	vpbroadcastd	0x24(%rdi),%ymm9
-	vpbroadcastd	0x28(%rdi),%ymm10
-	vpbroadcastd	0x2c(%rdi),%ymm11
-	vpbroadcastd	0x30(%rdi),%ymm12
-	vpbroadcastd	0x34(%rdi),%ymm13
-	vpbroadcastd	0x38(%rdi),%ymm14
-	vpbroadcastd	0x3c(%rdi),%ymm15
-	# x0..3 on stack
-	vmovdqa		%ymm0,0x00(%rsp)
-	vmovdqa		%ymm1,0x20(%rsp)
-	vmovdqa		%ymm2,0x40(%rsp)
-	vmovdqa		%ymm3,0x60(%rsp)
-
-	vmovdqa		CTRINC(%rip),%ymm1
-	vmovdqa		ROT8(%rip),%ymm2
-	vmovdqa		ROT16(%rip),%ymm3
-
-	# x12 += counter values 0-3
-	vpaddd		%ymm1,%ymm12,%ymm12
-
-	mov		$10,%ecx
-
-.Ldoubleround8:
-	# x0 += x4, x12 = rotl32(x12 ^ x0, 16)
-	vpaddd		0x00(%rsp),%ymm4,%ymm0
-	vmovdqa		%ymm0,0x00(%rsp)
-	vpxor		%ymm0,%ymm12,%ymm12
-	vpshufb		%ymm3,%ymm12,%ymm12
-	# x1 += x5, x13 = rotl32(x13 ^ x1, 16)
-	vpaddd		0x20(%rsp),%ymm5,%ymm0
-	vmovdqa		%ymm0,0x20(%rsp)
-	vpxor		%ymm0,%ymm13,%ymm13
-	vpshufb		%ymm3,%ymm13,%ymm13
-	# x2 += x6, x14 = rotl32(x14 ^ x2, 16)
-	vpaddd		0x40(%rsp),%ymm6,%ymm0
-	vmovdqa		%ymm0,0x40(%rsp)
-	vpxor		%ymm0,%ymm14,%ymm14
-	vpshufb		%ymm3,%ymm14,%ymm14
-	# x3 += x7, x15 = rotl32(x15 ^ x3, 16)
-	vpaddd		0x60(%rsp),%ymm7,%ymm0
-	vmovdqa		%ymm0,0x60(%rsp)
-	vpxor		%ymm0,%ymm15,%ymm15
-	vpshufb		%ymm3,%ymm15,%ymm15
-
-	# x8 += x12, x4 = rotl32(x4 ^ x8, 12)
-	vpaddd		%ymm12,%ymm8,%ymm8
-	vpxor		%ymm8,%ymm4,%ymm4
-	vpslld		$12,%ymm4,%ymm0
-	vpsrld		$20,%ymm4,%ymm4
-	vpor		%ymm0,%ymm4,%ymm4
-	# x9 += x13, x5 = rotl32(x5 ^ x9, 12)
-	vpaddd		%ymm13,%ymm9,%ymm9
-	vpxor		%ymm9,%ymm5,%ymm5
-	vpslld		$12,%ymm5,%ymm0
-	vpsrld		$20,%ymm5,%ymm5
-	vpor		%ymm0,%ymm5,%ymm5
-	# x10 += x14, x6 = rotl32(x6 ^ x10, 12)
-	vpaddd		%ymm14,%ymm10,%ymm10
-	vpxor		%ymm10,%ymm6,%ymm6
-	vpslld		$12,%ymm6,%ymm0
-	vpsrld		$20,%ymm6,%ymm6
-	vpor		%ymm0,%ymm6,%ymm6
-	# x11 += x15, x7 = rotl32(x7 ^ x11, 12)
-	vpaddd		%ymm15,%ymm11,%ymm11
-	vpxor		%ymm11,%ymm7,%ymm7
-	vpslld		$12,%ymm7,%ymm0
-	vpsrld		$20,%ymm7,%ymm7
-	vpor		%ymm0,%ymm7,%ymm7
-
-	# x0 += x4, x12 = rotl32(x12 ^ x0, 8)
-	vpaddd		0x00(%rsp),%ymm4,%ymm0
-	vmovdqa		%ymm0,0x00(%rsp)
-	vpxor		%ymm0,%ymm12,%ymm12
-	vpshufb		%ymm2,%ymm12,%ymm12
-	# x1 += x5, x13 = rotl32(x13 ^ x1, 8)
-	vpaddd		0x20(%rsp),%ymm5,%ymm0
-	vmovdqa		%ymm0,0x20(%rsp)
-	vpxor		%ymm0,%ymm13,%ymm13
-	vpshufb		%ymm2,%ymm13,%ymm13
-	# x2 += x6, x14 = rotl32(x14 ^ x2, 8)
-	vpaddd		0x40(%rsp),%ymm6,%ymm0
-	vmovdqa		%ymm0,0x40(%rsp)
-	vpxor		%ymm0,%ymm14,%ymm14
-	vpshufb		%ymm2,%ymm14,%ymm14
-	# x3 += x7, x15 = rotl32(x15 ^ x3, 8)
-	vpaddd		0x60(%rsp),%ymm7,%ymm0
-	vmovdqa		%ymm0,0x60(%rsp)
-	vpxor		%ymm0,%ymm15,%ymm15
-	vpshufb		%ymm2,%ymm15,%ymm15
-
-	# x8 += x12, x4 = rotl32(x4 ^ x8, 7)
-	vpaddd		%ymm12,%ymm8,%ymm8
-	vpxor		%ymm8,%ymm4,%ymm4
-	vpslld		$7,%ymm4,%ymm0
-	vpsrld		$25,%ymm4,%ymm4
-	vpor		%ymm0,%ymm4,%ymm4
-	# x9 += x13, x5 = rotl32(x5 ^ x9, 7)
-	vpaddd		%ymm13,%ymm9,%ymm9
-	vpxor		%ymm9,%ymm5,%ymm5
-	vpslld		$7,%ymm5,%ymm0
-	vpsrld		$25,%ymm5,%ymm5
-	vpor		%ymm0,%ymm5,%ymm5
-	# x10 += x14, x6 = rotl32(x6 ^ x10, 7)
-	vpaddd		%ymm14,%ymm10,%ymm10
-	vpxor		%ymm10,%ymm6,%ymm6
-	vpslld		$7,%ymm6,%ymm0
-	vpsrld		$25,%ymm6,%ymm6
-	vpor		%ymm0,%ymm6,%ymm6
-	# x11 += x15, x7 = rotl32(x7 ^ x11, 7)
-	vpaddd		%ymm15,%ymm11,%ymm11
-	vpxor		%ymm11,%ymm7,%ymm7
-	vpslld		$7,%ymm7,%ymm0
-	vpsrld		$25,%ymm7,%ymm7
-	vpor		%ymm0,%ymm7,%ymm7
-
-	# x0 += x5, x15 = rotl32(x15 ^ x0, 16)
-	vpaddd		0x00(%rsp),%ymm5,%ymm0
-	vmovdqa		%ymm0,0x00(%rsp)
-	vpxor		%ymm0,%ymm15,%ymm15
-	vpshufb		%ymm3,%ymm15,%ymm15
-	# x1 += x6, x12 = rotl32(x12 ^ x1, 16)%ymm0
-	vpaddd		0x20(%rsp),%ymm6,%ymm0
-	vmovdqa		%ymm0,0x20(%rsp)
-	vpxor		%ymm0,%ymm12,%ymm12
-	vpshufb		%ymm3,%ymm12,%ymm12
-	# x2 += x7, x13 = rotl32(x13 ^ x2, 16)
-	vpaddd		0x40(%rsp),%ymm7,%ymm0
-	vmovdqa		%ymm0,0x40(%rsp)
-	vpxor		%ymm0,%ymm13,%ymm13
-	vpshufb		%ymm3,%ymm13,%ymm13
-	# x3 += x4, x14 = rotl32(x14 ^ x3, 16)
-	vpaddd		0x60(%rsp),%ymm4,%ymm0
-	vmovdqa		%ymm0,0x60(%rsp)
-	vpxor		%ymm0,%ymm14,%ymm14
-	vpshufb		%ymm3,%ymm14,%ymm14
-
-	# x10 += x15, x5 = rotl32(x5 ^ x10, 12)
-	vpaddd		%ymm15,%ymm10,%ymm10
-	vpxor		%ymm10,%ymm5,%ymm5
-	vpslld		$12,%ymm5,%ymm0
-	vpsrld		$20,%ymm5,%ymm5
-	vpor		%ymm0,%ymm5,%ymm5
-	# x11 += x12, x6 = rotl32(x6 ^ x11, 12)
-	vpaddd		%ymm12,%ymm11,%ymm11
-	vpxor		%ymm11,%ymm6,%ymm6
-	vpslld		$12,%ymm6,%ymm0
-	vpsrld		$20,%ymm6,%ymm6
-	vpor		%ymm0,%ymm6,%ymm6
-	# x8 += x13, x7 = rotl32(x7 ^ x8, 12)
-	vpaddd		%ymm13,%ymm8,%ymm8
-	vpxor		%ymm8,%ymm7,%ymm7
-	vpslld		$12,%ymm7,%ymm0
-	vpsrld		$20,%ymm7,%ymm7
-	vpor		%ymm0,%ymm7,%ymm7
-	# x9 += x14, x4 = rotl32(x4 ^ x9, 12)
-	vpaddd		%ymm14,%ymm9,%ymm9
-	vpxor		%ymm9,%ymm4,%ymm4
-	vpslld		$12,%ymm4,%ymm0
-	vpsrld		$20,%ymm4,%ymm4
-	vpor		%ymm0,%ymm4,%ymm4
-
-	# x0 += x5, x15 = rotl32(x15 ^ x0, 8)
-	vpaddd		0x00(%rsp),%ymm5,%ymm0
-	vmovdqa		%ymm0,0x00(%rsp)
-	vpxor		%ymm0,%ymm15,%ymm15
-	vpshufb		%ymm2,%ymm15,%ymm15
-	# x1 += x6, x12 = rotl32(x12 ^ x1, 8)
-	vpaddd		0x20(%rsp),%ymm6,%ymm0
-	vmovdqa		%ymm0,0x20(%rsp)
-	vpxor		%ymm0,%ymm12,%ymm12
-	vpshufb		%ymm2,%ymm12,%ymm12
-	# x2 += x7, x13 = rotl32(x13 ^ x2, 8)
-	vpaddd		0x40(%rsp),%ymm7,%ymm0
-	vmovdqa		%ymm0,0x40(%rsp)
-	vpxor		%ymm0,%ymm13,%ymm13
-	vpshufb		%ymm2,%ymm13,%ymm13
-	# x3 += x4, x14 = rotl32(x14 ^ x3, 8)
-	vpaddd		0x60(%rsp),%ymm4,%ymm0
-	vmovdqa		%ymm0,0x60(%rsp)
-	vpxor		%ymm0,%ymm14,%ymm14
-	vpshufb		%ymm2,%ymm14,%ymm14
-
-	# x10 += x15, x5 = rotl32(x5 ^ x10, 7)
-	vpaddd		%ymm15,%ymm10,%ymm10
-	vpxor		%ymm10,%ymm5,%ymm5
-	vpslld		$7,%ymm5,%ymm0
-	vpsrld		$25,%ymm5,%ymm5
-	vpor		%ymm0,%ymm5,%ymm5
-	# x11 += x12, x6 = rotl32(x6 ^ x11, 7)
-	vpaddd		%ymm12,%ymm11,%ymm11
-	vpxor		%ymm11,%ymm6,%ymm6
-	vpslld		$7,%ymm6,%ymm0
-	vpsrld		$25,%ymm6,%ymm6
-	vpor		%ymm0,%ymm6,%ymm6
-	# x8 += x13, x7 = rotl32(x7 ^ x8, 7)
-	vpaddd		%ymm13,%ymm8,%ymm8
-	vpxor		%ymm8,%ymm7,%ymm7
-	vpslld		$7,%ymm7,%ymm0
-	vpsrld		$25,%ymm7,%ymm7
-	vpor		%ymm0,%ymm7,%ymm7
-	# x9 += x14, x4 = rotl32(x4 ^ x9, 7)
-	vpaddd		%ymm14,%ymm9,%ymm9
-	vpxor		%ymm9,%ymm4,%ymm4
-	vpslld		$7,%ymm4,%ymm0
-	vpsrld		$25,%ymm4,%ymm4
-	vpor		%ymm0,%ymm4,%ymm4
-
-	dec		%ecx
-	jnz		.Ldoubleround8
-
-	# x0..15[0-3] += s[0..15]
-	vpbroadcastd	0x00(%rdi),%ymm0
-	vpaddd		0x00(%rsp),%ymm0,%ymm0
-	vmovdqa		%ymm0,0x00(%rsp)
-	vpbroadcastd	0x04(%rdi),%ymm0
-	vpaddd		0x20(%rsp),%ymm0,%ymm0
-	vmovdqa		%ymm0,0x20(%rsp)
-	vpbroadcastd	0x08(%rdi),%ymm0
-	vpaddd		0x40(%rsp),%ymm0,%ymm0
-	vmovdqa		%ymm0,0x40(%rsp)
-	vpbroadcastd	0x0c(%rdi),%ymm0
-	vpaddd		0x60(%rsp),%ymm0,%ymm0
-	vmovdqa		%ymm0,0x60(%rsp)
-	vpbroadcastd	0x10(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm4,%ymm4
-	vpbroadcastd	0x14(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm5,%ymm5
-	vpbroadcastd	0x18(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm6,%ymm6
-	vpbroadcastd	0x1c(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm7,%ymm7
-	vpbroadcastd	0x20(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm8,%ymm8
-	vpbroadcastd	0x24(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm9,%ymm9
-	vpbroadcastd	0x28(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm10,%ymm10
-	vpbroadcastd	0x2c(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm11,%ymm11
-	vpbroadcastd	0x30(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm12,%ymm12
-	vpbroadcastd	0x34(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm13,%ymm13
-	vpbroadcastd	0x38(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm14,%ymm14
-	vpbroadcastd	0x3c(%rdi),%ymm0
-	vpaddd		%ymm0,%ymm15,%ymm15
-
-	# x12 += counter values 0-3
-	vpaddd		%ymm1,%ymm12,%ymm12
-
-	# interleave 32-bit words in state n, n+1
-	vmovdqa		0x00(%rsp),%ymm0
-	vmovdqa		0x20(%rsp),%ymm1
-	vpunpckldq	%ymm1,%ymm0,%ymm2
-	vpunpckhdq	%ymm1,%ymm0,%ymm1
-	vmovdqa		%ymm2,0x00(%rsp)
-	vmovdqa		%ymm1,0x20(%rsp)
-	vmovdqa		0x40(%rsp),%ymm0
-	vmovdqa		0x60(%rsp),%ymm1
-	vpunpckldq	%ymm1,%ymm0,%ymm2
-	vpunpckhdq	%ymm1,%ymm0,%ymm1
-	vmovdqa		%ymm2,0x40(%rsp)
-	vmovdqa		%ymm1,0x60(%rsp)
-	vmovdqa		%ymm4,%ymm0
-	vpunpckldq	%ymm5,%ymm0,%ymm4
-	vpunpckhdq	%ymm5,%ymm0,%ymm5
-	vmovdqa		%ymm6,%ymm0
-	vpunpckldq	%ymm7,%ymm0,%ymm6
-	vpunpckhdq	%ymm7,%ymm0,%ymm7
-	vmovdqa		%ymm8,%ymm0
-	vpunpckldq	%ymm9,%ymm0,%ymm8
-	vpunpckhdq	%ymm9,%ymm0,%ymm9
-	vmovdqa		%ymm10,%ymm0
-	vpunpckldq	%ymm11,%ymm0,%ymm10
-	vpunpckhdq	%ymm11,%ymm0,%ymm11
-	vmovdqa		%ymm12,%ymm0
-	vpunpckldq	%ymm13,%ymm0,%ymm12
-	vpunpckhdq	%ymm13,%ymm0,%ymm13
-	vmovdqa		%ymm14,%ymm0
-	vpunpckldq	%ymm15,%ymm0,%ymm14
-	vpunpckhdq	%ymm15,%ymm0,%ymm15
-
-	# interleave 64-bit words in state n, n+2
-	vmovdqa		0x00(%rsp),%ymm0
-	vmovdqa		0x40(%rsp),%ymm2
-	vpunpcklqdq	%ymm2,%ymm0,%ymm1
-	vpunpckhqdq	%ymm2,%ymm0,%ymm2
-	vmovdqa		%ymm1,0x00(%rsp)
-	vmovdqa		%ymm2,0x40(%rsp)
-	vmovdqa		0x20(%rsp),%ymm0
-	vmovdqa		0x60(%rsp),%ymm2
-	vpunpcklqdq	%ymm2,%ymm0,%ymm1
-	vpunpckhqdq	%ymm2,%ymm0,%ymm2
-	vmovdqa		%ymm1,0x20(%rsp)
-	vmovdqa		%ymm2,0x60(%rsp)
-	vmovdqa		%ymm4,%ymm0
-	vpunpcklqdq	%ymm6,%ymm0,%ymm4
-	vpunpckhqdq	%ymm6,%ymm0,%ymm6
-	vmovdqa		%ymm5,%ymm0
-	vpunpcklqdq	%ymm7,%ymm0,%ymm5
-	vpunpckhqdq	%ymm7,%ymm0,%ymm7
-	vmovdqa		%ymm8,%ymm0
-	vpunpcklqdq	%ymm10,%ymm0,%ymm8
-	vpunpckhqdq	%ymm10,%ymm0,%ymm10
-	vmovdqa		%ymm9,%ymm0
-	vpunpcklqdq	%ymm11,%ymm0,%ymm9
-	vpunpckhqdq	%ymm11,%ymm0,%ymm11
-	vmovdqa		%ymm12,%ymm0
-	vpunpcklqdq	%ymm14,%ymm0,%ymm12
-	vpunpckhqdq	%ymm14,%ymm0,%ymm14
-	vmovdqa		%ymm13,%ymm0
-	vpunpcklqdq	%ymm15,%ymm0,%ymm13
-	vpunpckhqdq	%ymm15,%ymm0,%ymm15
-
-	# interleave 128-bit words in state n, n+4
-	vmovdqa		0x00(%rsp),%ymm0
-	vperm2i128	$0x20,%ymm4,%ymm0,%ymm1
-	vperm2i128	$0x31,%ymm4,%ymm0,%ymm4
-	vmovdqa		%ymm1,0x00(%rsp)
-	vmovdqa		0x20(%rsp),%ymm0
-	vperm2i128	$0x20,%ymm5,%ymm0,%ymm1
-	vperm2i128	$0x31,%ymm5,%ymm0,%ymm5
-	vmovdqa		%ymm1,0x20(%rsp)
-	vmovdqa		0x40(%rsp),%ymm0
-	vperm2i128	$0x20,%ymm6,%ymm0,%ymm1
-	vperm2i128	$0x31,%ymm6,%ymm0,%ymm6
-	vmovdqa		%ymm1,0x40(%rsp)
-	vmovdqa		0x60(%rsp),%ymm0
-	vperm2i128	$0x20,%ymm7,%ymm0,%ymm1
-	vperm2i128	$0x31,%ymm7,%ymm0,%ymm7
-	vmovdqa		%ymm1,0x60(%rsp)
-	vperm2i128	$0x20,%ymm12,%ymm8,%ymm0
-	vperm2i128	$0x31,%ymm12,%ymm8,%ymm12
-	vmovdqa		%ymm0,%ymm8
-	vperm2i128	$0x20,%ymm13,%ymm9,%ymm0
-	vperm2i128	$0x31,%ymm13,%ymm9,%ymm13
-	vmovdqa		%ymm0,%ymm9
-	vperm2i128	$0x20,%ymm14,%ymm10,%ymm0
-	vperm2i128	$0x31,%ymm14,%ymm10,%ymm14
-	vmovdqa		%ymm0,%ymm10
-	vperm2i128	$0x20,%ymm15,%ymm11,%ymm0
-	vperm2i128	$0x31,%ymm15,%ymm11,%ymm15
-	vmovdqa		%ymm0,%ymm11
-
-	# xor with corresponding input, write to output
-	vmovdqa		0x00(%rsp),%ymm0
-	vpxor		0x0000(%rdx),%ymm0,%ymm0
-	vmovdqu		%ymm0,0x0000(%rsi)
-	vmovdqa		0x20(%rsp),%ymm0
-	vpxor		0x0080(%rdx),%ymm0,%ymm0
-	vmovdqu		%ymm0,0x0080(%rsi)
-	vmovdqa		0x40(%rsp),%ymm0
-	vpxor		0x0040(%rdx),%ymm0,%ymm0
-	vmovdqu		%ymm0,0x0040(%rsi)
-	vmovdqa		0x60(%rsp),%ymm0
-	vpxor		0x00c0(%rdx),%ymm0,%ymm0
-	vmovdqu		%ymm0,0x00c0(%rsi)
-	vpxor		0x0100(%rdx),%ymm4,%ymm4
-	vmovdqu		%ymm4,0x0100(%rsi)
-	vpxor		0x0180(%rdx),%ymm5,%ymm5
-	vmovdqu		%ymm5,0x00180(%rsi)
-	vpxor		0x0140(%rdx),%ymm6,%ymm6
-	vmovdqu		%ymm6,0x0140(%rsi)
-	vpxor		0x01c0(%rdx),%ymm7,%ymm7
-	vmovdqu		%ymm7,0x01c0(%rsi)
-	vpxor		0x0020(%rdx),%ymm8,%ymm8
-	vmovdqu		%ymm8,0x0020(%rsi)
-	vpxor		0x00a0(%rdx),%ymm9,%ymm9
-	vmovdqu		%ymm9,0x00a0(%rsi)
-	vpxor		0x0060(%rdx),%ymm10,%ymm10
-	vmovdqu		%ymm10,0x0060(%rsi)
-	vpxor		0x00e0(%rdx),%ymm11,%ymm11
-	vmovdqu		%ymm11,0x00e0(%rsi)
-	vpxor		0x0120(%rdx),%ymm12,%ymm12
-	vmovdqu		%ymm12,0x0120(%rsi)
-	vpxor		0x01a0(%rdx),%ymm13,%ymm13
-	vmovdqu		%ymm13,0x01a0(%rsi)
-	vpxor		0x0160(%rdx),%ymm14,%ymm14
-	vmovdqu		%ymm14,0x0160(%rsi)
-	vpxor		0x01e0(%rdx),%ymm15,%ymm15
-	vmovdqu		%ymm15,0x01e0(%rsi)
-
-	vzeroupper
-	lea		-8(%r10),%rsp
-	ret
-ENDPROC(chacha20_8block_xor_avx2)
diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S b/arch/x86/crypto/chacha20-ssse3-x86_64.S
deleted file mode 100644
index 512a2b500fd1..000000000000
--- a/arch/x86/crypto/chacha20-ssse3-x86_64.S
+++ /dev/null
@@ -1,630 +0,0 @@
-/*
- * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <linux/linkage.h>
-
-.section	.rodata.cst16.ROT8, "aM", @progbits, 16
-.align 16
-ROT8:	.octa 0x0e0d0c0f0a09080b0605040702010003
-.section	.rodata.cst16.ROT16, "aM", @progbits, 16
-.align 16
-ROT16:	.octa 0x0d0c0f0e09080b0a0504070601000302
-.section	.rodata.cst16.CTRINC, "aM", @progbits, 16
-.align 16
-CTRINC:	.octa 0x00000003000000020000000100000000
-
-.text
-
-ENTRY(chacha20_block_xor_ssse3)
-	# %rdi: Input state matrix, s
-	# %rsi: 1 data block output, o
-	# %rdx: 1 data block input, i
-
-	# This function encrypts one ChaCha20 block by loading the state matrix
-	# in four SSE registers. It performs matrix operation on four words in
-	# parallel, but requireds shuffling to rearrange the words after each
-	# round. 8/16-bit word rotation is done with the slightly better
-	# performing SSSE3 byte shuffling, 7/12-bit word rotation uses
-	# traditional shift+OR.
-
-	# x0..3 = s0..3
-	movdqa		0x00(%rdi),%xmm0
-	movdqa		0x10(%rdi),%xmm1
-	movdqa		0x20(%rdi),%xmm2
-	movdqa		0x30(%rdi),%xmm3
-	movdqa		%xmm0,%xmm8
-	movdqa		%xmm1,%xmm9
-	movdqa		%xmm2,%xmm10
-	movdqa		%xmm3,%xmm11
-
-	movdqa		ROT8(%rip),%xmm4
-	movdqa		ROT16(%rip),%xmm5
-
-	mov	$10,%ecx
-
-.Ldoubleround:
-
-	# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
-	paddd		%xmm1,%xmm0
-	pxor		%xmm0,%xmm3
-	pshufb		%xmm5,%xmm3
-
-	# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
-	paddd		%xmm3,%xmm2
-	pxor		%xmm2,%xmm1
-	movdqa		%xmm1,%xmm6
-	pslld		$12,%xmm6
-	psrld		$20,%xmm1
-	por		%xmm6,%xmm1
-
-	# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
-	paddd		%xmm1,%xmm0
-	pxor		%xmm0,%xmm3
-	pshufb		%xmm4,%xmm3
-
-	# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
-	paddd		%xmm3,%xmm2
-	pxor		%xmm2,%xmm1
-	movdqa		%xmm1,%xmm7
-	pslld		$7,%xmm7
-	psrld		$25,%xmm1
-	por		%xmm7,%xmm1
-
-	# x1 = shuffle32(x1, MASK(0, 3, 2, 1))
-	pshufd		$0x39,%xmm1,%xmm1
-	# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
-	pshufd		$0x4e,%xmm2,%xmm2
-	# x3 = shuffle32(x3, MASK(2, 1, 0, 3))
-	pshufd		$0x93,%xmm3,%xmm3
-
-	# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
-	paddd		%xmm1,%xmm0
-	pxor		%xmm0,%xmm3
-	pshufb		%xmm5,%xmm3
-
-	# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
-	paddd		%xmm3,%xmm2
-	pxor		%xmm2,%xmm1
-	movdqa		%xmm1,%xmm6
-	pslld		$12,%xmm6
-	psrld		$20,%xmm1
-	por		%xmm6,%xmm1
-
-	# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
-	paddd		%xmm1,%xmm0
-	pxor		%xmm0,%xmm3
-	pshufb		%xmm4,%xmm3
-
-	# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
-	paddd		%xmm3,%xmm2
-	pxor		%xmm2,%xmm1
-	movdqa		%xmm1,%xmm7
-	pslld		$7,%xmm7
-	psrld		$25,%xmm1
-	por		%xmm7,%xmm1
-
-	# x1 = shuffle32(x1, MASK(2, 1, 0, 3))
-	pshufd		$0x93,%xmm1,%xmm1
-	# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
-	pshufd		$0x4e,%xmm2,%xmm2
-	# x3 = shuffle32(x3, MASK(0, 3, 2, 1))
-	pshufd		$0x39,%xmm3,%xmm3
-
-	dec		%ecx
-	jnz		.Ldoubleround
-
-	# o0 = i0 ^ (x0 + s0)
-	movdqu		0x00(%rdx),%xmm4
-	paddd		%xmm8,%xmm0
-	pxor		%xmm4,%xmm0
-	movdqu		%xmm0,0x00(%rsi)
-	# o1 = i1 ^ (x1 + s1)
-	movdqu		0x10(%rdx),%xmm5
-	paddd		%xmm9,%xmm1
-	pxor		%xmm5,%xmm1
-	movdqu		%xmm1,0x10(%rsi)
-	# o2 = i2 ^ (x2 + s2)
-	movdqu		0x20(%rdx),%xmm6
-	paddd		%xmm10,%xmm2
-	pxor		%xmm6,%xmm2
-	movdqu		%xmm2,0x20(%rsi)
-	# o3 = i3 ^ (x3 + s3)
-	movdqu		0x30(%rdx),%xmm7
-	paddd		%xmm11,%xmm3
-	pxor		%xmm7,%xmm3
-	movdqu		%xmm3,0x30(%rsi)
-
-	ret
-ENDPROC(chacha20_block_xor_ssse3)
-
-ENTRY(chacha20_4block_xor_ssse3)
-	# %rdi: Input state matrix, s
-	# %rsi: 4 data blocks output, o
-	# %rdx: 4 data blocks input, i
-
-	# This function encrypts four consecutive ChaCha20 blocks by loading the
-	# the state matrix in SSE registers four times. As we need some scratch
-	# registers, we save the first four registers on the stack. The
-	# algorithm performs each operation on the corresponding word of each
-	# state matrix, hence requires no word shuffling. For final XORing step
-	# we transpose the matrix by interleaving 32- and then 64-bit words,
-	# which allows us to do XOR in SSE registers. 8/16-bit word rotation is
-	# done with the slightly better performing SSSE3 byte shuffling,
-	# 7/12-bit word rotation uses traditional shift+OR.
-
-	lea		8(%rsp),%r10
-	sub		$0x80,%rsp
-	and		$~63,%rsp
-
-	# x0..15[0-3] = s0..3[0..3]
-	movq		0x00(%rdi),%xmm1
-	pshufd		$0x00,%xmm1,%xmm0
-	pshufd		$0x55,%xmm1,%xmm1
-	movq		0x08(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	movq		0x10(%rdi),%xmm5
-	pshufd		$0x00,%xmm5,%xmm4
-	pshufd		$0x55,%xmm5,%xmm5
-	movq		0x18(%rdi),%xmm7
-	pshufd		$0x00,%xmm7,%xmm6
-	pshufd		$0x55,%xmm7,%xmm7
-	movq		0x20(%rdi),%xmm9
-	pshufd		$0x00,%xmm9,%xmm8
-	pshufd		$0x55,%xmm9,%xmm9
-	movq		0x28(%rdi),%xmm11
-	pshufd		$0x00,%xmm11,%xmm10
-	pshufd		$0x55,%xmm11,%xmm11
-	movq		0x30(%rdi),%xmm13
-	pshufd		$0x00,%xmm13,%xmm12
-	pshufd		$0x55,%xmm13,%xmm13
-	movq		0x38(%rdi),%xmm15
-	pshufd		$0x00,%xmm15,%xmm14
-	pshufd		$0x55,%xmm15,%xmm15
-	# x0..3 on stack
-	movdqa		%xmm0,0x00(%rsp)
-	movdqa		%xmm1,0x10(%rsp)
-	movdqa		%xmm2,0x20(%rsp)
-	movdqa		%xmm3,0x30(%rsp)
-
-	movdqa		CTRINC(%rip),%xmm1
-	movdqa		ROT8(%rip),%xmm2
-	movdqa		ROT16(%rip),%xmm3
-
-	# x12 += counter values 0-3
-	paddd		%xmm1,%xmm12
-
-	mov		$10,%ecx
-
-.Ldoubleround4:
-	# x0 += x4, x12 = rotl32(x12 ^ x0, 16)
-	movdqa		0x00(%rsp),%xmm0
-	paddd		%xmm4,%xmm0
-	movdqa		%xmm0,0x00(%rsp)
-	pxor		%xmm0,%xmm12
-	pshufb		%xmm3,%xmm12
-	# x1 += x5, x13 = rotl32(x13 ^ x1, 16)
-	movdqa		0x10(%rsp),%xmm0
-	paddd		%xmm5,%xmm0
-	movdqa		%xmm0,0x10(%rsp)
-	pxor		%xmm0,%xmm13
-	pshufb		%xmm3,%xmm13
-	# x2 += x6, x14 = rotl32(x14 ^ x2, 16)
-	movdqa		0x20(%rsp),%xmm0
-	paddd		%xmm6,%xmm0
-	movdqa		%xmm0,0x20(%rsp)
-	pxor		%xmm0,%xmm14
-	pshufb		%xmm3,%xmm14
-	# x3 += x7, x15 = rotl32(x15 ^ x3, 16)
-	movdqa		0x30(%rsp),%xmm0
-	paddd		%xmm7,%xmm0
-	movdqa		%xmm0,0x30(%rsp)
-	pxor		%xmm0,%xmm15
-	pshufb		%xmm3,%xmm15
-
-	# x8 += x12, x4 = rotl32(x4 ^ x8, 12)
-	paddd		%xmm12,%xmm8
-	pxor		%xmm8,%xmm4
-	movdqa		%xmm4,%xmm0
-	pslld		$12,%xmm0
-	psrld		$20,%xmm4
-	por		%xmm0,%xmm4
-	# x9 += x13, x5 = rotl32(x5 ^ x9, 12)
-	paddd		%xmm13,%xmm9
-	pxor		%xmm9,%xmm5
-	movdqa		%xmm5,%xmm0
-	pslld		$12,%xmm0
-	psrld		$20,%xmm5
-	por		%xmm0,%xmm5
-	# x10 += x14, x6 = rotl32(x6 ^ x10, 12)
-	paddd		%xmm14,%xmm10
-	pxor		%xmm10,%xmm6
-	movdqa		%xmm6,%xmm0
-	pslld		$12,%xmm0
-	psrld		$20,%xmm6
-	por		%xmm0,%xmm6
-	# x11 += x15, x7 = rotl32(x7 ^ x11, 12)
-	paddd		%xmm15,%xmm11
-	pxor		%xmm11,%xmm7
-	movdqa		%xmm7,%xmm0
-	pslld		$12,%xmm0
-	psrld		$20,%xmm7
-	por		%xmm0,%xmm7
-
-	# x0 += x4, x12 = rotl32(x12 ^ x0, 8)
-	movdqa		0x00(%rsp),%xmm0
-	paddd		%xmm4,%xmm0
-	movdqa		%xmm0,0x00(%rsp)
-	pxor		%xmm0,%xmm12
-	pshufb		%xmm2,%xmm12
-	# x1 += x5, x13 = rotl32(x13 ^ x1, 8)
-	movdqa		0x10(%rsp),%xmm0
-	paddd		%xmm5,%xmm0
-	movdqa		%xmm0,0x10(%rsp)
-	pxor		%xmm0,%xmm13
-	pshufb		%xmm2,%xmm13
-	# x2 += x6, x14 = rotl32(x14 ^ x2, 8)
-	movdqa		0x20(%rsp),%xmm0
-	paddd		%xmm6,%xmm0
-	movdqa		%xmm0,0x20(%rsp)
-	pxor		%xmm0,%xmm14
-	pshufb		%xmm2,%xmm14
-	# x3 += x7, x15 = rotl32(x15 ^ x3, 8)
-	movdqa		0x30(%rsp),%xmm0
-	paddd		%xmm7,%xmm0
-	movdqa		%xmm0,0x30(%rsp)
-	pxor		%xmm0,%xmm15
-	pshufb		%xmm2,%xmm15
-
-	# x8 += x12, x4 = rotl32(x4 ^ x8, 7)
-	paddd		%xmm12,%xmm8
-	pxor		%xmm8,%xmm4
-	movdqa		%xmm4,%xmm0
-	pslld		$7,%xmm0
-	psrld		$25,%xmm4
-	por		%xmm0,%xmm4
-	# x9 += x13, x5 = rotl32(x5 ^ x9, 7)
-	paddd		%xmm13,%xmm9
-	pxor		%xmm9,%xmm5
-	movdqa		%xmm5,%xmm0
-	pslld		$7,%xmm0
-	psrld		$25,%xmm5
-	por		%xmm0,%xmm5
-	# x10 += x14, x6 = rotl32(x6 ^ x10, 7)
-	paddd		%xmm14,%xmm10
-	pxor		%xmm10,%xmm6
-	movdqa		%xmm6,%xmm0
-	pslld		$7,%xmm0
-	psrld		$25,%xmm6
-	por		%xmm0,%xmm6
-	# x11 += x15, x7 = rotl32(x7 ^ x11, 7)
-	paddd		%xmm15,%xmm11
-	pxor		%xmm11,%xmm7
-	movdqa		%xmm7,%xmm0
-	pslld		$7,%xmm0
-	psrld		$25,%xmm7
-	por		%xmm0,%xmm7
-
-	# x0 += x5, x15 = rotl32(x15 ^ x0, 16)
-	movdqa		0x00(%rsp),%xmm0
-	paddd		%xmm5,%xmm0
-	movdqa		%xmm0,0x00(%rsp)
-	pxor		%xmm0,%xmm15
-	pshufb		%xmm3,%xmm15
-	# x1 += x6, x12 = rotl32(x12 ^ x1, 16)
-	movdqa		0x10(%rsp),%xmm0
-	paddd		%xmm6,%xmm0
-	movdqa		%xmm0,0x10(%rsp)
-	pxor		%xmm0,%xmm12
-	pshufb		%xmm3,%xmm12
-	# x2 += x7, x13 = rotl32(x13 ^ x2, 16)
-	movdqa		0x20(%rsp),%xmm0
-	paddd		%xmm7,%xmm0
-	movdqa		%xmm0,0x20(%rsp)
-	pxor		%xmm0,%xmm13
-	pshufb		%xmm3,%xmm13
-	# x3 += x4, x14 = rotl32(x14 ^ x3, 16)
-	movdqa		0x30(%rsp),%xmm0
-	paddd		%xmm4,%xmm0
-	movdqa		%xmm0,0x30(%rsp)
-	pxor		%xmm0,%xmm14
-	pshufb		%xmm3,%xmm14
-
-	# x10 += x15, x5 = rotl32(x5 ^ x10, 12)
-	paddd		%xmm15,%xmm10
-	pxor		%xmm10,%xmm5
-	movdqa		%xmm5,%xmm0
-	pslld		$12,%xmm0
-	psrld		$20,%xmm5
-	por		%xmm0,%xmm5
-	# x11 += x12, x6 = rotl32(x6 ^ x11, 12)
-	paddd		%xmm12,%xmm11
-	pxor		%xmm11,%xmm6
-	movdqa		%xmm6,%xmm0
-	pslld		$12,%xmm0
-	psrld		$20,%xmm6
-	por		%xmm0,%xmm6
-	# x8 += x13, x7 = rotl32(x7 ^ x8, 12)
-	paddd		%xmm13,%xmm8
-	pxor		%xmm8,%xmm7
-	movdqa		%xmm7,%xmm0
-	pslld		$12,%xmm0
-	psrld		$20,%xmm7
-	por		%xmm0,%xmm7
-	# x9 += x14, x4 = rotl32(x4 ^ x9, 12)
-	paddd		%xmm14,%xmm9
-	pxor		%xmm9,%xmm4
-	movdqa		%xmm4,%xmm0
-	pslld		$12,%xmm0
-	psrld		$20,%xmm4
-	por		%xmm0,%xmm4
-
-	# x0 += x5, x15 = rotl32(x15 ^ x0, 8)
-	movdqa		0x00(%rsp),%xmm0
-	paddd		%xmm5,%xmm0
-	movdqa		%xmm0,0x00(%rsp)
-	pxor		%xmm0,%xmm15
-	pshufb		%xmm2,%xmm15
-	# x1 += x6, x12 = rotl32(x12 ^ x1, 8)
-	movdqa		0x10(%rsp),%xmm0
-	paddd		%xmm6,%xmm0
-	movdqa		%xmm0,0x10(%rsp)
-	pxor		%xmm0,%xmm12
-	pshufb		%xmm2,%xmm12
-	# x2 += x7, x13 = rotl32(x13 ^ x2, 8)
-	movdqa		0x20(%rsp),%xmm0
-	paddd		%xmm7,%xmm0
-	movdqa		%xmm0,0x20(%rsp)
-	pxor		%xmm0,%xmm13
-	pshufb		%xmm2,%xmm13
-	# x3 += x4, x14 = rotl32(x14 ^ x3, 8)
-	movdqa		0x30(%rsp),%xmm0
-	paddd		%xmm4,%xmm0
-	movdqa		%xmm0,0x30(%rsp)
-	pxor		%xmm0,%xmm14
-	pshufb		%xmm2,%xmm14
-
-	# x10 += x15, x5 = rotl32(x5 ^ x10, 7)
-	paddd		%xmm15,%xmm10
-	pxor		%xmm10,%xmm5
-	movdqa		%xmm5,%xmm0
-	pslld		$7,%xmm0
-	psrld		$25,%xmm5
-	por		%xmm0,%xmm5
-	# x11 += x12, x6 = rotl32(x6 ^ x11, 7)
-	paddd		%xmm12,%xmm11
-	pxor		%xmm11,%xmm6
-	movdqa		%xmm6,%xmm0
-	pslld		$7,%xmm0
-	psrld		$25,%xmm6
-	por		%xmm0,%xmm6
-	# x8 += x13, x7 = rotl32(x7 ^ x8, 7)
-	paddd		%xmm13,%xmm8
-	pxor		%xmm8,%xmm7
-	movdqa		%xmm7,%xmm0
-	pslld		$7,%xmm0
-	psrld		$25,%xmm7
-	por		%xmm0,%xmm7
-	# x9 += x14, x4 = rotl32(x4 ^ x9, 7)
-	paddd		%xmm14,%xmm9
-	pxor		%xmm9,%xmm4
-	movdqa		%xmm4,%xmm0
-	pslld		$7,%xmm0
-	psrld		$25,%xmm4
-	por		%xmm0,%xmm4
-
-	dec		%ecx
-	jnz		.Ldoubleround4
-
-	# x0[0-3] += s0[0]
-	# x1[0-3] += s0[1]
-	movq		0x00(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	paddd		0x00(%rsp),%xmm2
-	movdqa		%xmm2,0x00(%rsp)
-	paddd		0x10(%rsp),%xmm3
-	movdqa		%xmm3,0x10(%rsp)
-	# x2[0-3] += s0[2]
-	# x3[0-3] += s0[3]
-	movq		0x08(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	paddd		0x20(%rsp),%xmm2
-	movdqa		%xmm2,0x20(%rsp)
-	paddd		0x30(%rsp),%xmm3
-	movdqa		%xmm3,0x30(%rsp)
-
-	# x4[0-3] += s1[0]
-	# x5[0-3] += s1[1]
-	movq		0x10(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	paddd		%xmm2,%xmm4
-	paddd		%xmm3,%xmm5
-	# x6[0-3] += s1[2]
-	# x7[0-3] += s1[3]
-	movq		0x18(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	paddd		%xmm2,%xmm6
-	paddd		%xmm3,%xmm7
-
-	# x8[0-3] += s2[0]
-	# x9[0-3] += s2[1]
-	movq		0x20(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	paddd		%xmm2,%xmm8
-	paddd		%xmm3,%xmm9
-	# x10[0-3] += s2[2]
-	# x11[0-3] += s2[3]
-	movq		0x28(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	paddd		%xmm2,%xmm10
-	paddd		%xmm3,%xmm11
-
-	# x12[0-3] += s3[0]
-	# x13[0-3] += s3[1]
-	movq		0x30(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	paddd		%xmm2,%xmm12
-	paddd		%xmm3,%xmm13
-	# x14[0-3] += s3[2]
-	# x15[0-3] += s3[3]
-	movq		0x38(%rdi),%xmm3
-	pshufd		$0x00,%xmm3,%xmm2
-	pshufd		$0x55,%xmm3,%xmm3
-	paddd		%xmm2,%xmm14
-	paddd		%xmm3,%xmm15
-
-	# x12 += counter values 0-3
-	paddd		%xmm1,%xmm12
-
-	# interleave 32-bit words in state n, n+1
-	movdqa		0x00(%rsp),%xmm0
-	movdqa		0x10(%rsp),%xmm1
-	movdqa		%xmm0,%xmm2
-	punpckldq	%xmm1,%xmm2
-	punpckhdq	%xmm1,%xmm0
-	movdqa		%xmm2,0x00(%rsp)
-	movdqa		%xmm0,0x10(%rsp)
-	movdqa		0x20(%rsp),%xmm0
-	movdqa		0x30(%rsp),%xmm1
-	movdqa		%xmm0,%xmm2
-	punpckldq	%xmm1,%xmm2
-	punpckhdq	%xmm1,%xmm0
-	movdqa		%xmm2,0x20(%rsp)
-	movdqa		%xmm0,0x30(%rsp)
-	movdqa		%xmm4,%xmm0
-	punpckldq	%xmm5,%xmm4
-	punpckhdq	%xmm5,%xmm0
-	movdqa		%xmm0,%xmm5
-	movdqa		%xmm6,%xmm0
-	punpckldq	%xmm7,%xmm6
-	punpckhdq	%xmm7,%xmm0
-	movdqa		%xmm0,%xmm7
-	movdqa		%xmm8,%xmm0
-	punpckldq	%xmm9,%xmm8
-	punpckhdq	%xmm9,%xmm0
-	movdqa		%xmm0,%xmm9
-	movdqa		%xmm10,%xmm0
-	punpckldq	%xmm11,%xmm10
-	punpckhdq	%xmm11,%xmm0
-	movdqa		%xmm0,%xmm11
-	movdqa		%xmm12,%xmm0
-	punpckldq	%xmm13,%xmm12
-	punpckhdq	%xmm13,%xmm0
-	movdqa		%xmm0,%xmm13
-	movdqa		%xmm14,%xmm0
-	punpckldq	%xmm15,%xmm14
-	punpckhdq	%xmm15,%xmm0
-	movdqa		%xmm0,%xmm15
-
-	# interleave 64-bit words in state n, n+2
-	movdqa		0x00(%rsp),%xmm0
-	movdqa		0x20(%rsp),%xmm1
-	movdqa		%xmm0,%xmm2
-	punpcklqdq	%xmm1,%xmm2
-	punpckhqdq	%xmm1,%xmm0
-	movdqa		%xmm2,0x00(%rsp)
-	movdqa		%xmm0,0x20(%rsp)
-	movdqa		0x10(%rsp),%xmm0
-	movdqa		0x30(%rsp),%xmm1
-	movdqa		%xmm0,%xmm2
-	punpcklqdq	%xmm1,%xmm2
-	punpckhqdq	%xmm1,%xmm0
-	movdqa		%xmm2,0x10(%rsp)
-	movdqa		%xmm0,0x30(%rsp)
-	movdqa		%xmm4,%xmm0
-	punpcklqdq	%xmm6,%xmm4
-	punpckhqdq	%xmm6,%xmm0
-	movdqa		%xmm0,%xmm6
-	movdqa		%xmm5,%xmm0
-	punpcklqdq	%xmm7,%xmm5
-	punpckhqdq	%xmm7,%xmm0
-	movdqa		%xmm0,%xmm7
-	movdqa		%xmm8,%xmm0
-	punpcklqdq	%xmm10,%xmm8
-	punpckhqdq	%xmm10,%xmm0
-	movdqa		%xmm0,%xmm10
-	movdqa		%xmm9,%xmm0
-	punpcklqdq	%xmm11,%xmm9
-	punpckhqdq	%xmm11,%xmm0
-	movdqa		%xmm0,%xmm11
-	movdqa		%xmm12,%xmm0
-	punpcklqdq	%xmm14,%xmm12
-	punpckhqdq	%xmm14,%xmm0
-	movdqa		%xmm0,%xmm14
-	movdqa		%xmm13,%xmm0
-	punpcklqdq	%xmm15,%xmm13
-	punpckhqdq	%xmm15,%xmm0
-	movdqa		%xmm0,%xmm15
-
-	# xor with corresponding input, write to output
-	movdqa		0x00(%rsp),%xmm0
-	movdqu		0x00(%rdx),%xmm1
-	pxor		%xmm1,%xmm0
-	movdqu		%xmm0,0x00(%rsi)
-	movdqa		0x10(%rsp),%xmm0
-	movdqu		0x80(%rdx),%xmm1
-	pxor		%xmm1,%xmm0
-	movdqu		%xmm0,0x80(%rsi)
-	movdqa		0x20(%rsp),%xmm0
-	movdqu		0x40(%rdx),%xmm1
-	pxor		%xmm1,%xmm0
-	movdqu		%xmm0,0x40(%rsi)
-	movdqa		0x30(%rsp),%xmm0
-	movdqu		0xc0(%rdx),%xmm1
-	pxor		%xmm1,%xmm0
-	movdqu		%xmm0,0xc0(%rsi)
-	movdqu		0x10(%rdx),%xmm1
-	pxor		%xmm1,%xmm4
-	movdqu		%xmm4,0x10(%rsi)
-	movdqu		0x90(%rdx),%xmm1
-	pxor		%xmm1,%xmm5
-	movdqu		%xmm5,0x90(%rsi)
-	movdqu		0x50(%rdx),%xmm1
-	pxor		%xmm1,%xmm6
-	movdqu		%xmm6,0x50(%rsi)
-	movdqu		0xd0(%rdx),%xmm1
-	pxor		%xmm1,%xmm7
-	movdqu		%xmm7,0xd0(%rsi)
-	movdqu		0x20(%rdx),%xmm1
-	pxor		%xmm1,%xmm8
-	movdqu		%xmm8,0x20(%rsi)
-	movdqu		0xa0(%rdx),%xmm1
-	pxor		%xmm1,%xmm9
-	movdqu		%xmm9,0xa0(%rsi)
-	movdqu		0x60(%rdx),%xmm1
-	pxor		%xmm1,%xmm10
-	movdqu		%xmm10,0x60(%rsi)
-	movdqu		0xe0(%rdx),%xmm1
-	pxor		%xmm1,%xmm11
-	movdqu		%xmm11,0xe0(%rsi)
-	movdqu		0x30(%rdx),%xmm1
-	pxor		%xmm1,%xmm12
-	movdqu		%xmm12,0x30(%rsi)
-	movdqu		0xb0(%rdx),%xmm1
-	pxor		%xmm1,%xmm13
-	movdqu		%xmm13,0xb0(%rsi)
-	movdqu		0x70(%rdx),%xmm1
-	pxor		%xmm1,%xmm14
-	movdqu		%xmm14,0x70(%rsi)
-	movdqu		0xf0(%rdx),%xmm1
-	pxor		%xmm1,%xmm15
-	movdqu		%xmm15,0xf0(%rsi)
-
-	lea		-8(%r10),%rsp
-	ret
-ENDPROC(chacha20_4block_xor_ssse3)
diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
deleted file mode 100644
index dce7c5d39c2f..000000000000
--- a/arch/x86/crypto/chacha20_glue.c
+++ /dev/null
@@ -1,146 +0,0 @@
-/*
- * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <crypto/algapi.h>
-#include <crypto/chacha20.h>
-#include <crypto/internal/skcipher.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <asm/fpu/api.h>
-#include <asm/simd.h>
-
-#define CHACHA20_STATE_ALIGN 16
-
-asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
-asmlinkage void chacha20_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
-#ifdef CONFIG_AS_AVX2
-asmlinkage void chacha20_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src);
-static bool chacha20_use_avx2;
-#endif
-
-static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
-			    unsigned int bytes)
-{
-	u8 buf[CHACHA20_BLOCK_SIZE];
-
-#ifdef CONFIG_AS_AVX2
-	if (chacha20_use_avx2) {
-		while (bytes >= CHACHA20_BLOCK_SIZE * 8) {
-			chacha20_8block_xor_avx2(state, dst, src);
-			bytes -= CHACHA20_BLOCK_SIZE * 8;
-			src += CHACHA20_BLOCK_SIZE * 8;
-			dst += CHACHA20_BLOCK_SIZE * 8;
-			state[12] += 8;
-		}
-	}
-#endif
-	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
-		chacha20_4block_xor_ssse3(state, dst, src);
-		bytes -= CHACHA20_BLOCK_SIZE * 4;
-		src += CHACHA20_BLOCK_SIZE * 4;
-		dst += CHACHA20_BLOCK_SIZE * 4;
-		state[12] += 4;
-	}
-	while (bytes >= CHACHA20_BLOCK_SIZE) {
-		chacha20_block_xor_ssse3(state, dst, src);
-		bytes -= CHACHA20_BLOCK_SIZE;
-		src += CHACHA20_BLOCK_SIZE;
-		dst += CHACHA20_BLOCK_SIZE;
-		state[12]++;
-	}
-	if (bytes) {
-		memcpy(buf, src, bytes);
-		chacha20_block_xor_ssse3(state, buf, buf);
-		memcpy(dst, buf, bytes);
-	}
-}
-
-static int chacha20_simd(struct skcipher_request *req)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
-	u32 *state, state_buf[16 + 2] __aligned(8);
-	struct skcipher_walk walk;
-	int err;
-
-	BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
-	state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
-
-	if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
-		return crypto_chacha20_crypt(req);
-
-	err = skcipher_walk_virt(&walk, req, true);
-
-	crypto_chacha20_init(state, ctx, walk.iv);
-
-	kernel_fpu_begin();
-
-	while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
-		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
-				rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
-		err = skcipher_walk_done(&walk,
-					 walk.nbytes % CHACHA20_BLOCK_SIZE);
-	}
-
-	if (walk.nbytes) {
-		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
-				walk.nbytes);
-		err = skcipher_walk_done(&walk, 0);
-	}
-
-	kernel_fpu_end();
-
-	return err;
-}
-
-static struct skcipher_alg alg = {
-	.base.cra_name		= "chacha20",
-	.base.cra_driver_name	= "chacha20-simd",
-	.base.cra_priority	= 300,
-	.base.cra_blocksize	= 1,
-	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
-	.base.cra_module	= THIS_MODULE,
-
-	.min_keysize		= CHACHA20_KEY_SIZE,
-	.max_keysize		= CHACHA20_KEY_SIZE,
-	.ivsize			= CHACHA20_IV_SIZE,
-	.chunksize		= CHACHA20_BLOCK_SIZE,
-	.setkey			= crypto_chacha20_setkey,
-	.encrypt		= chacha20_simd,
-	.decrypt		= chacha20_simd,
-};
-
-static int __init chacha20_simd_mod_init(void)
-{
-	if (!boot_cpu_has(X86_FEATURE_SSSE3))
-		return -ENODEV;
-
-#ifdef CONFIG_AS_AVX2
-	chacha20_use_avx2 = boot_cpu_has(X86_FEATURE_AVX) &&
-			    boot_cpu_has(X86_FEATURE_AVX2) &&
-			    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
-#endif
-	return crypto_register_skcipher(&alg);
-}
-
-static void __exit chacha20_simd_mod_fini(void)
-{
-	crypto_unregister_skcipher(&alg);
-}
-
-module_init(chacha20_simd_mod_init);
-module_exit(chacha20_simd_mod_fini);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
-MODULE_DESCRIPTION("chacha20 cipher algorithm, SIMD accelerated");
-MODULE_ALIAS_CRYPTO("chacha20");
-MODULE_ALIAS_CRYPTO("chacha20-simd");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 47859a0f8052..93cd4d199447 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1433,22 +1433,6 @@ config CRYPTO_CHACHA20
 
 	  ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
 	  Bernstein and further specified in RFC7539 for use in IETF protocols.
-	  This is the portable C implementation of ChaCha20.
-
-	  See also:
-	  <http://cr.yp.to/chacha/chacha-20080128.pdf>
-
-config CRYPTO_CHACHA20_X86_64
-	tristate "ChaCha20 cipher algorithm (x86_64/SSSE3/AVX2)"
-	depends on X86 && 64BIT
-	select CRYPTO_BLKCIPHER
-	select CRYPTO_CHACHA20
-	help
-	  ChaCha20 cipher algorithm, RFC7539.
-
-	  ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
-	  Bernstein and further specified in RFC7539 for use in IETF protocols.
-	  This is the x86_64 assembler implementation using SIMD instructions.
 
 	  See also:
 	  <http://cr.yp.to/chacha/chacha-20080128.pdf>
diff --git a/crypto/Makefile b/crypto/Makefile
index 5e60348d02e2..587103b87890 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -117,7 +117,7 @@ obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
 obj-$(CONFIG_CRYPTO_SEED) += seed.o
 obj-$(CONFIG_CRYPTO_SPECK) += speck.o
 obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
-obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
+obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_zinc.o
 obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_zinc.o
 obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
 obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c
deleted file mode 100644
index e451c3cb6a56..000000000000
--- a/crypto/chacha20_generic.c
+++ /dev/null
@@ -1,136 +0,0 @@
-/*
- * ChaCha20 256-bit cipher algorithm, RFC7539
- *
- * Copyright (C) 2015 Martin Willi
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- */
-
-#include <asm/unaligned.h>
-#include <crypto/algapi.h>
-#include <crypto/chacha20.h>
-#include <crypto/internal/skcipher.h>
-#include <linux/module.h>
-
-static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
-			     unsigned int bytes)
-{
-	u32 stream[CHACHA20_BLOCK_WORDS];
-
-	if (dst != src)
-		memcpy(dst, src, bytes);
-
-	while (bytes >= CHACHA20_BLOCK_SIZE) {
-		chacha20_block(state, stream);
-		crypto_xor(dst, (const u8 *)stream, CHACHA20_BLOCK_SIZE);
-		bytes -= CHACHA20_BLOCK_SIZE;
-		dst += CHACHA20_BLOCK_SIZE;
-	}
-	if (bytes) {
-		chacha20_block(state, stream);
-		crypto_xor(dst, (const u8 *)stream, bytes);
-	}
-}
-
-void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
-{
-	state[0]  = 0x61707865; /* "expa" */
-	state[1]  = 0x3320646e; /* "nd 3" */
-	state[2]  = 0x79622d32; /* "2-by" */
-	state[3]  = 0x6b206574; /* "te k" */
-	state[4]  = ctx->key[0];
-	state[5]  = ctx->key[1];
-	state[6]  = ctx->key[2];
-	state[7]  = ctx->key[3];
-	state[8]  = ctx->key[4];
-	state[9]  = ctx->key[5];
-	state[10] = ctx->key[6];
-	state[11] = ctx->key[7];
-	state[12] = get_unaligned_le32(iv +  0);
-	state[13] = get_unaligned_le32(iv +  4);
-	state[14] = get_unaligned_le32(iv +  8);
-	state[15] = get_unaligned_le32(iv + 12);
-}
-EXPORT_SYMBOL_GPL(crypto_chacha20_init);
-
-int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
-			   unsigned int keysize)
-{
-	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int i;
-
-	if (keysize != CHACHA20_KEY_SIZE)
-		return -EINVAL;
-
-	for (i = 0; i < ARRAY_SIZE(ctx->key); i++)
-		ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);
-
-int crypto_chacha20_crypt(struct skcipher_request *req)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
-	struct skcipher_walk walk;
-	u32 state[16];
-	int err;
-
-	err = skcipher_walk_virt(&walk, req, true);
-
-	crypto_chacha20_init(state, ctx, walk.iv);
-
-	while (walk.nbytes > 0) {
-		unsigned int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes = round_down(nbytes, walk.stride);
-
-		chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
-				 nbytes);
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-	}
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(crypto_chacha20_crypt);
-
-static struct skcipher_alg alg = {
-	.base.cra_name		= "chacha20",
-	.base.cra_driver_name	= "chacha20-generic",
-	.base.cra_priority	= 100,
-	.base.cra_blocksize	= 1,
-	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
-	.base.cra_module	= THIS_MODULE,
-
-	.min_keysize		= CHACHA20_KEY_SIZE,
-	.max_keysize		= CHACHA20_KEY_SIZE,
-	.ivsize			= CHACHA20_IV_SIZE,
-	.chunksize		= CHACHA20_BLOCK_SIZE,
-	.setkey			= crypto_chacha20_setkey,
-	.encrypt		= crypto_chacha20_crypt,
-	.decrypt		= crypto_chacha20_crypt,
-};
-
-static int __init chacha20_generic_mod_init(void)
-{
-	return crypto_register_skcipher(&alg);
-}
-
-static void __exit chacha20_generic_mod_fini(void)
-{
-	crypto_unregister_skcipher(&alg);
-}
-
-module_init(chacha20_generic_mod_init);
-module_exit(chacha20_generic_mod_fini);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
-MODULE_DESCRIPTION("chacha20 cipher algorithm");
-MODULE_ALIAS_CRYPTO("chacha20");
-MODULE_ALIAS_CRYPTO("chacha20-generic");
diff --git a/crypto/chacha20_zinc.c b/crypto/chacha20_zinc.c
new file mode 100644
index 000000000000..5df88fdee066
--- /dev/null
+++ b/crypto/chacha20_zinc.c
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <asm/unaligned.h>
+#include <crypto/algapi.h>
+#include <crypto/internal/skcipher.h>
+#include <zinc/chacha20.h>
+#include <linux/module.h>
+
+struct chacha20_key_ctx {
+	u32 key[8];
+};
+
+static int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
+				  unsigned int keysize)
+{
+	struct chacha20_key_ctx *key_ctx = crypto_skcipher_ctx(tfm);
+	int i;
+
+	if (keysize != CHACHA20_KEY_SIZE)
+		return -EINVAL;
+
+	for (i = 0; i < ARRAY_SIZE(key_ctx->key); ++i)
+		key_ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));
+
+	return 0;
+}
+
+static int crypto_chacha20_crypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct chacha20_key_ctx *key_ctx = crypto_skcipher_ctx(tfm);
+	struct chacha20_ctx ctx;
+	struct skcipher_walk walk;
+	simd_context_t simd_context;
+	int err, i;
+
+	err = skcipher_walk_virt(&walk, req, true);
+	if (unlikely(err))
+		return err;
+
+	memcpy(ctx.key, key_ctx->key, sizeof(ctx.key));
+	for (i = 0; i < ARRAY_SIZE(ctx.counter); ++i)
+		ctx.counter[i] = get_unaligned_le32(walk.iv + i * sizeof(u32));
+
+	simd_context = simd_get();
+	while (walk.nbytes > 0) {
+		unsigned int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes = round_down(nbytes, walk.stride);
+
+		chacha20(&ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes,
+			 simd_context);
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+		simd_context = simd_relax(simd_context);
+	}
+	simd_put(simd_context);
+
+	return err;
+}
+
+static struct skcipher_alg alg = {
+	.base.cra_name		= "chacha20",
+	.base.cra_driver_name	= "chacha20-software",
+	.base.cra_priority	= 100,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct chacha20_key_ctx),
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= CHACHA20_KEY_SIZE,
+	.max_keysize		= CHACHA20_KEY_SIZE,
+	.ivsize			= CHACHA20_IV_SIZE,
+	.chunksize		= CHACHA20_BLOCK_SIZE,
+	.setkey			= crypto_chacha20_setkey,
+	.encrypt		= crypto_chacha20_crypt,
+	.decrypt		= crypto_chacha20_crypt,
+};
+
+static int __init chacha20_mod_init(void)
+{
+	return crypto_register_skcipher(&alg);
+}
+
+static void __exit chacha20_mod_exit(void)
+{
+	crypto_unregister_skcipher(&alg);
+}
+
+module_init(chacha20_mod_init);
+module_exit(chacha20_mod_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_DESCRIPTION("ChaCha20 stream cipher");
+MODULE_ALIAS_CRYPTO("chacha20");
+MODULE_ALIAS_CRYPTO("chacha20-software");
diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c
index bf523797bef3..b26adb9ed898 100644
--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -13,7 +13,7 @@
 #include <crypto/internal/hash.h>
 #include <crypto/internal/skcipher.h>
 #include <crypto/scatterwalk.h>
-#include <crypto/chacha20.h>
+#include <zinc/chacha20.h>
 #include <zinc/poly1305.h>
 #include <linux/err.h>
 #include <linux/init.h>
diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h
index b83d66073db0..3b92f58f3891 100644
--- a/include/crypto/chacha20.h
+++ b/include/crypto/chacha20.h
@@ -6,23 +6,11 @@
 #ifndef _CRYPTO_CHACHA20_H
 #define _CRYPTO_CHACHA20_H
 
-#include <crypto/skcipher.h>
-#include <linux/types.h>
-#include <linux/crypto.h>
-
 #define CHACHA20_IV_SIZE	16
 #define CHACHA20_KEY_SIZE	32
 #define CHACHA20_BLOCK_SIZE	64
 #define CHACHA20_BLOCK_WORDS	(CHACHA20_BLOCK_SIZE / sizeof(u32))
 
-struct chacha20_ctx {
-	u32 key[8];
-};
-
 void chacha20_block(u32 *state, u32 *stream);
-void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv);
-int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
-			   unsigned int keysize);
-int crypto_chacha20_crypt(struct skcipher_request *req);
 
 #endif
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 19/20] security/keys: rewrite big_key crypto to use Zinc
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (17 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 18/20] crypto: port ChaCha20 " Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-15 23:52   ` kbuild test robot
  2018-09-14 16:22 ` [PATCH net-next v4 20/20] net: WireGuard secure network tunnel Jason A. Donenfeld
  19 siblings, 1 reply; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh
  Cc: Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers, David Howells

A while back, I noticed that the crypto and crypto API usage in big_keys
were entirely broken in multiple ways, so I rewrote it. Now, I'm
rewriting it again, but this time using Zinc's ChaCha20Poly1305
function. This makes the file considerably more simple; the diffstat
alone should justify this commit. It also should be faster, since it no
longer requires a mutex around the "aead api object" (nor allocations),
allowing us to encrypt multiple items in parallel. We also benefit from
being able to pass any type of pointer to Zinc, so we can get rid of the
ridiculously complex custom page allocator that big_key really doesn't
need.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: David Howells <dhowells@redhat.com>
---
 security/keys/Kconfig   |   4 +-
 security/keys/big_key.c | 230 +++++-----------------------------------
 2 files changed, 28 insertions(+), 206 deletions(-)

diff --git a/security/keys/Kconfig b/security/keys/Kconfig
index 6462e6654ccf..66ff26298fb3 100644
--- a/security/keys/Kconfig
+++ b/security/keys/Kconfig
@@ -45,9 +45,7 @@ config BIG_KEYS
 	bool "Large payload keys"
 	depends on KEYS
 	depends on TMPFS
-	select CRYPTO
-	select CRYPTO_AES
-	select CRYPTO_GCM
+	select ZINC_CHACHA20POLY1305
 	help
 	  This option provides support for holding large keys within the kernel
 	  (for example Kerberos ticket caches).  The data may be stored out to
diff --git a/security/keys/big_key.c b/security/keys/big_key.c
index 2806e70d7f8f..934497ecbf65 100644
--- a/security/keys/big_key.c
+++ b/security/keys/big_key.c
@@ -1,6 +1,6 @@
 /* Large capacity key type
  *
- * Copyright (C) 2017 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ * Copyright (C) 2017-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  * Copyright (C) 2013 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
@@ -16,20 +16,10 @@
 #include <linux/file.h>
 #include <linux/shmem_fs.h>
 #include <linux/err.h>
-#include <linux/scatterlist.h>
 #include <linux/random.h>
-#include <linux/vmalloc.h>
 #include <keys/user-type.h>
 #include <keys/big_key-type.h>
-#include <crypto/aead.h>
-#include <crypto/gcm.h>
-
-struct big_key_buf {
-	unsigned int		nr_pages;
-	void			*virt;
-	struct scatterlist	*sg;
-	struct page		*pages[];
-};
+#include <zinc/chacha20poly1305.h>
 
 /*
  * Layout of key payload words.
@@ -41,14 +31,6 @@ enum {
 	big_key_len,
 };
 
-/*
- * Crypto operation with big_key data
- */
-enum big_key_op {
-	BIG_KEY_ENC,
-	BIG_KEY_DEC,
-};
-
 /*
  * If the data is under this limit, there's no point creating a shm file to
  * hold it as the permanently resident metadata for the shmem fs will be at
@@ -56,16 +38,6 @@ enum big_key_op {
  */
 #define BIG_KEY_FILE_THRESHOLD (sizeof(struct inode) + sizeof(struct dentry))
 
-/*
- * Key size for big_key data encryption
- */
-#define ENC_KEY_SIZE 32
-
-/*
- * Authentication tag length
- */
-#define ENC_AUTHTAG_SIZE 16
-
 /*
  * big_key defined keys take an arbitrary string as the description and an
  * arbitrary blob of data as the payload
@@ -79,136 +51,20 @@ struct key_type key_type_big_key = {
 	.destroy		= big_key_destroy,
 	.describe		= big_key_describe,
 	.read			= big_key_read,
-	/* no ->update(); don't add it without changing big_key_crypt() nonce */
+	/* no ->update(); don't add it without changing chacha20poly1305's nonce */
 };
 
-/*
- * Crypto names for big_key data authenticated encryption
- */
-static const char big_key_alg_name[] = "gcm(aes)";
-#define BIG_KEY_IV_SIZE		GCM_AES_IV_SIZE
-
-/*
- * Crypto algorithms for big_key data authenticated encryption
- */
-static struct crypto_aead *big_key_aead;
-
-/*
- * Since changing the key affects the entire object, we need a mutex.
- */
-static DEFINE_MUTEX(big_key_aead_lock);
-
-/*
- * Encrypt/decrypt big_key data
- */
-static int big_key_crypt(enum big_key_op op, struct big_key_buf *buf, size_t datalen, u8 *key)
-{
-	int ret;
-	struct aead_request *aead_req;
-	/* We always use a zero nonce. The reason we can get away with this is
-	 * because we're using a different randomly generated key for every
-	 * different encryption. Notably, too, key_type_big_key doesn't define
-	 * an .update function, so there's no chance we'll wind up reusing the
-	 * key to encrypt updated data. Simply put: one key, one encryption.
-	 */
-	u8 zero_nonce[BIG_KEY_IV_SIZE];
-
-	aead_req = aead_request_alloc(big_key_aead, GFP_KERNEL);
-	if (!aead_req)
-		return -ENOMEM;
-
-	memset(zero_nonce, 0, sizeof(zero_nonce));
-	aead_request_set_crypt(aead_req, buf->sg, buf->sg, datalen, zero_nonce);
-	aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
-	aead_request_set_ad(aead_req, 0);
-
-	mutex_lock(&big_key_aead_lock);
-	if (crypto_aead_setkey(big_key_aead, key, ENC_KEY_SIZE)) {
-		ret = -EAGAIN;
-		goto error;
-	}
-	if (op == BIG_KEY_ENC)
-		ret = crypto_aead_encrypt(aead_req);
-	else
-		ret = crypto_aead_decrypt(aead_req);
-error:
-	mutex_unlock(&big_key_aead_lock);
-	aead_request_free(aead_req);
-	return ret;
-}
-
-/*
- * Free up the buffer.
- */
-static void big_key_free_buffer(struct big_key_buf *buf)
-{
-	unsigned int i;
-
-	if (buf->virt) {
-		memset(buf->virt, 0, buf->nr_pages * PAGE_SIZE);
-		vunmap(buf->virt);
-	}
-
-	for (i = 0; i < buf->nr_pages; i++)
-		if (buf->pages[i])
-			__free_page(buf->pages[i]);
-
-	kfree(buf);
-}
-
-/*
- * Allocate a buffer consisting of a set of pages with a virtual mapping
- * applied over them.
- */
-static void *big_key_alloc_buffer(size_t len)
-{
-	struct big_key_buf *buf;
-	unsigned int npg = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-	unsigned int i, l;
-
-	buf = kzalloc(sizeof(struct big_key_buf) +
-		      sizeof(struct page) * npg +
-		      sizeof(struct scatterlist) * npg,
-		      GFP_KERNEL);
-	if (!buf)
-		return NULL;
-
-	buf->nr_pages = npg;
-	buf->sg = (void *)(buf->pages + npg);
-	sg_init_table(buf->sg, npg);
-
-	for (i = 0; i < buf->nr_pages; i++) {
-		buf->pages[i] = alloc_page(GFP_KERNEL);
-		if (!buf->pages[i])
-			goto nomem;
-
-		l = min_t(size_t, len, PAGE_SIZE);
-		sg_set_page(&buf->sg[i], buf->pages[i], l, 0);
-		len -= l;
-	}
-
-	buf->virt = vmap(buf->pages, buf->nr_pages, VM_MAP, PAGE_KERNEL);
-	if (!buf->virt)
-		goto nomem;
-
-	return buf;
-
-nomem:
-	big_key_free_buffer(buf);
-	return NULL;
-}
-
 /*
  * Preparse a big key
  */
 int big_key_preparse(struct key_preparsed_payload *prep)
 {
-	struct big_key_buf *buf;
 	struct path *path = (struct path *)&prep->payload.data[big_key_path];
 	struct file *file;
-	u8 *enckey;
+	u8 *buf, *enckey;
 	ssize_t written;
-	size_t datalen = prep->datalen, enclen = datalen + ENC_AUTHTAG_SIZE;
+	size_t datalen = prep->datalen;
+	size_t enclen = datalen + CHACHA20POLY1305_AUTHTAGLEN;
 	int ret;
 
 	if (datalen <= 0 || datalen > 1024 * 1024 || !prep->data)
@@ -224,28 +80,28 @@ int big_key_preparse(struct key_preparsed_payload *prep)
 		 * to be swapped out if needed.
 		 *
 		 * File content is stored encrypted with randomly generated key.
+		 * Since the key is random for each file, we can set the nonce
+		 * to zero, provided we never define a ->update() call.
 		 */
 		loff_t pos = 0;
 
-		buf = big_key_alloc_buffer(enclen);
+		buf = kvmalloc(enclen, GFP_KERNEL);
 		if (!buf)
 			return -ENOMEM;
-		memcpy(buf->virt, prep->data, datalen);
 
 		/* generate random key */
-		enckey = kmalloc(ENC_KEY_SIZE, GFP_KERNEL);
+		enckey = kmalloc(CHACHA20POLY1305_KEYLEN, GFP_KERNEL);
 		if (!enckey) {
 			ret = -ENOMEM;
 			goto error;
 		}
-		ret = get_random_bytes_wait(enckey, ENC_KEY_SIZE);
+		ret = get_random_bytes_wait(enckey, CHACHA20POLY1305_KEYLEN);
 		if (unlikely(ret))
 			goto err_enckey;
 
-		/* encrypt aligned data */
-		ret = big_key_crypt(BIG_KEY_ENC, buf, datalen, enckey);
-		if (ret)
-			goto err_enckey;
+		/* encrypt data */
+		chacha20poly1305_encrypt(buf, prep->data, datalen, NULL, 0,
+					 0, enckey);
 
 		/* save aligned data to file */
 		file = shmem_kernel_file_setup("", enclen, 0);
@@ -254,7 +110,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
 			goto err_enckey;
 		}
 
-		written = kernel_write(file, buf->virt, enclen, &pos);
+		written = kernel_write(file, buf, enclen, &pos);
 		if (written != enclen) {
 			ret = written;
 			if (written >= 0)
@@ -269,7 +125,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
 		*path = file->f_path;
 		path_get(path);
 		fput(file);
-		big_key_free_buffer(buf);
+		kvfree(buf);
 	} else {
 		/* Just store the data in a buffer */
 		void *data = kmalloc(datalen, GFP_KERNEL);
@@ -287,7 +143,7 @@ int big_key_preparse(struct key_preparsed_payload *prep)
 err_enckey:
 	kzfree(enckey);
 error:
-	big_key_free_buffer(buf);
+	kvfree(buf);
 	return ret;
 }
 
@@ -365,14 +221,13 @@ long big_key_read(const struct key *key, char __user *buffer, size_t buflen)
 		return datalen;
 
 	if (datalen > BIG_KEY_FILE_THRESHOLD) {
-		struct big_key_buf *buf;
 		struct path *path = (struct path *)&key->payload.data[big_key_path];
 		struct file *file;
-		u8 *enckey = (u8 *)key->payload.data[big_key_data];
-		size_t enclen = datalen + ENC_AUTHTAG_SIZE;
+		u8 *buf, *enckey = (u8 *)key->payload.data[big_key_data];
+		size_t enclen = datalen + CHACHA20POLY1305_AUTHTAGLEN;
 		loff_t pos = 0;
 
-		buf = big_key_alloc_buffer(enclen);
+		buf = kvmalloc(enclen, GFP_KERNEL);
 		if (!buf)
 			return -ENOMEM;
 
@@ -383,26 +238,27 @@ long big_key_read(const struct key *key, char __user *buffer, size_t buflen)
 		}
 
 		/* read file to kernel and decrypt */
-		ret = kernel_read(file, buf->virt, enclen, &pos);
+		ret = kernel_read(file, buf, enclen, &pos);
 		if (ret >= 0 && ret != enclen) {
 			ret = -EIO;
 			goto err_fput;
 		}
 
-		ret = big_key_crypt(BIG_KEY_DEC, buf, enclen, enckey);
-		if (ret)
+		ret = chacha20poly1305_decrypt(buf, buf, enclen, NULL, 0, 0,
+					       enckey) ? 0 : -EINVAL;
+		if (unlikely(ret))
 			goto err_fput;
 
 		ret = datalen;
 
 		/* copy decrypted data to user */
-		if (copy_to_user(buffer, buf->virt, datalen) != 0)
+		if (copy_to_user(buffer, buf, datalen) != 0)
 			ret = -EFAULT;
 
 err_fput:
 		fput(file);
 error:
-		big_key_free_buffer(buf);
+		kvfree(buf);
 	} else {
 		ret = datalen;
 		if (copy_to_user(buffer, key->payload.data[big_key_data],
@@ -418,39 +274,7 @@ long big_key_read(const struct key *key, char __user *buffer, size_t buflen)
  */
 static int __init big_key_init(void)
 {
-	int ret;
-
-	/* init block cipher */
-	big_key_aead = crypto_alloc_aead(big_key_alg_name, 0, CRYPTO_ALG_ASYNC);
-	if (IS_ERR(big_key_aead)) {
-		ret = PTR_ERR(big_key_aead);
-		pr_err("Can't alloc crypto: %d\n", ret);
-		return ret;
-	}
-
-	if (unlikely(crypto_aead_ivsize(big_key_aead) != BIG_KEY_IV_SIZE)) {
-		WARN(1, "big key algorithm changed?");
-		ret = -EINVAL;
-		goto free_aead;
-	}
-
-	ret = crypto_aead_setauthsize(big_key_aead, ENC_AUTHTAG_SIZE);
-	if (ret < 0) {
-		pr_err("Can't set crypto auth tag len: %d\n", ret);
-		goto free_aead;
-	}
-
-	ret = register_key_type(&key_type_big_key);
-	if (ret < 0) {
-		pr_err("Can't register type: %d\n", ret);
-		goto free_aead;
-	}
-
-	return 0;
-
-free_aead:
-	crypto_free_aead(big_key_aead);
-	return ret;
+	return register_key_type(&key_type_big_key);
 }
 
 late_initcall(big_key_init);
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH net-next v4 20/20] net: WireGuard secure network tunnel
  2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
                   ` (18 preceding siblings ...)
  2018-09-14 16:22 ` [PATCH net-next v4 19/20] security/keys: rewrite big_key crypto to use Zinc Jason A. Donenfeld
@ 2018-09-14 16:22 ` Jason A. Donenfeld
  2018-09-15 22:17   ` kbuild test robot
  2018-09-15 23:01   ` kbuild test robot
  19 siblings, 2 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 16:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-crypto, davem, gregkh; +Cc: Jason A. Donenfeld

WireGuard is a layer 3 secure networking tunnel made specifically for
the kernel, that aims to be much simpler and easier to audit than IPsec.
Extensive documentation and description of the protocol and
considerations, along with formal proofs of the cryptography, are
available at:

  * https://www.wireguard.com/
  * https://www.wireguard.com/papers/wireguard.pdf

This commit implements WireGuard as a simple network device driver,
accessible in the usual RTNL way used by virtual network drivers. It
makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of
networking subsystem APIs. It has a somewhat novel multicore queueing
system designed for maximum throughput and minimal latency of encryption
operations, but it is implemented modestly using workqueues and NAPI.
Configuration is done via generic Netlink, and following a review from
the Netlink maintainer a year ago, several high profile userspace
have already implemented the API.

This commit also comes with several different tests, both in-kernel
tests and out-of-kernel tests based on network namespaces, taking profit
of the fact that sockets used by WireGuard intentionally stay in the
namespace the WireGuard interface was originally created, exactly like
the semantics of userspace tun devices. See wireguard.com/netns/ for
pictures and examples.

The source code is fairly short, but rather than combining everything
into a single file, WireGuard is developed as cleanly separable files,
making auditing and comprehension easier. Things are laid out as
follows:

  * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the
    cryptographic aspects of the protocol, and are mostly data-only in
    nature, taking in buffers of bytes and spitting out buffers of
    bytes. They also handle reference counting for their various shared
    pieces of data, like keys and key lists.

  * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for
    ratelimiting certain types of cryptographic operations in accordance
    with particular WireGuard semantics.

  * allowedips.[ch], hashtables.[ch]: The main lookup structures of
    WireGuard, the former being trie-like with particular semantics, an
    integral part of the design of the protocol, and the latter just
    being nice helper functions around the specific hashtables we use.

  * device.[ch]: Implementation of functions for the netdevice and for
    rtnl, responsible for maintaining the life of a given interface and
    wiring it up to the rest of WireGuard.

  * peer.[ch]: Each interface has a list of peers, with helper functions
    available here for creation, destruction, and reference counting.

  * socket.[ch]: Implementation of functions related to udp_socket and
    the general set of kernel socket APIs, for sending and receiving
    ciphertext UDP packets, and taking care of WireGuard-specific sticky
    socket routing semantics for the automatic roaming.

  * netlink.[ch]: Userspace API entry point for configuring WireGuard
    peers and devices. The API has been implemented by several userspace
    tools and network management utility, and the WireGuard project
    distributes the basic wg(8) tool.

  * queueing.[ch]: Shared function on the rx and tx path for handling
    the various queues used in the multicore algorithms.

  * send.c: Handles encrypting outgoing packets in parallel on
    multiple cores, before sending them in order on a single core, via
    workqueues and ring buffers. Also handles sending handshake and cookie
    messages as part of the protocol, in parallel.

  * receive.c: Handles decrypting incoming packets in parallel on
    multiple cores, before passing them off in order to be ingested via
    the rest of the networking subsystem with GRO via the typical NAPI
    poll function. Also handles receiving handshake and cookie messages
    as part of the protocol, in parallel.

  * timers.[ch]: Uses the timer wheel to implement protocol particular
    event timeouts, and gives a set of very simple event-driven entry
    point functions for callers.

  * main.c, version.h: Initialization and deinitialization of the module.

  * selftest/*.h: Runtime unit tests for some of the most security
    sensitive functions.

  * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing
    script using network namespaces.

This commit aims to be as self-contained as possible, implementing
WireGuard as a standalone module not needing much special handling or
coordination from the network subsystem. I expect for future
optimizations to the network stack to positively improve WireGuard, and
vice-versa, but for the time being, this exists as intentionally
standalone.

We introduce a menu option for CONFIG_WIREGUARD, as well as providing a
verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: David Miller <davem@davemloft.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
---
 MAINTAINERS                                  |   8 +
 drivers/net/Kconfig                          |  30 +
 drivers/net/Makefile                         |   1 +
 drivers/net/wireguard/Makefile               |  18 +
 drivers/net/wireguard/allowedips.c           | 404 ++++++++++
 drivers/net/wireguard/allowedips.h           |  55 ++
 drivers/net/wireguard/cookie.c               | 234 ++++++
 drivers/net/wireguard/cookie.h               |  59 ++
 drivers/net/wireguard/device.c               | 438 +++++++++++
 drivers/net/wireguard/device.h               |  65 ++
 drivers/net/wireguard/hashtables.c           | 209 +++++
 drivers/net/wireguard/hashtables.h           |  63 ++
 drivers/net/wireguard/main.c                 |  65 ++
 drivers/net/wireguard/messages.h             | 128 +++
 drivers/net/wireguard/netlink.c              | 605 ++++++++++++++
 drivers/net/wireguard/netlink.h              |  12 +
 drivers/net/wireguard/noise.c                | 784 +++++++++++++++++++
 drivers/net/wireguard/noise.h                | 129 +++
 drivers/net/wireguard/peer.c                 | 191 +++++
 drivers/net/wireguard/peer.h                 |  87 ++
 drivers/net/wireguard/queueing.c             |  52 ++
 drivers/net/wireguard/queueing.h             | 193 +++++
 drivers/net/wireguard/ratelimiter.c          | 220 ++++++
 drivers/net/wireguard/ratelimiter.h          |  19 +
 drivers/net/wireguard/receive.c              | 597 ++++++++++++++
 drivers/net/wireguard/selftest/allowedips.h  | 656 ++++++++++++++++
 drivers/net/wireguard/selftest/counter.h     | 103 +++
 drivers/net/wireguard/selftest/ratelimiter.h | 174 ++++
 drivers/net/wireguard/send.c                 | 420 ++++++++++
 drivers/net/wireguard/socket.c               | 435 ++++++++++
 drivers/net/wireguard/socket.h               |  44 ++
 drivers/net/wireguard/timers.c               | 256 ++++++
 drivers/net/wireguard/timers.h               |  30 +
 drivers/net/wireguard/version.h              |   1 +
 include/uapi/linux/wireguard.h               | 190 +++++
 tools/testing/selftests/wireguard/netns.sh   | 499 ++++++++++++
 36 files changed, 7474 insertions(+)
 create mode 100644 drivers/net/wireguard/Makefile
 create mode 100644 drivers/net/wireguard/allowedips.c
 create mode 100644 drivers/net/wireguard/allowedips.h
 create mode 100644 drivers/net/wireguard/cookie.c
 create mode 100644 drivers/net/wireguard/cookie.h
 create mode 100644 drivers/net/wireguard/device.c
 create mode 100644 drivers/net/wireguard/device.h
 create mode 100644 drivers/net/wireguard/hashtables.c
 create mode 100644 drivers/net/wireguard/hashtables.h
 create mode 100644 drivers/net/wireguard/main.c
 create mode 100644 drivers/net/wireguard/messages.h
 create mode 100644 drivers/net/wireguard/netlink.c
 create mode 100644 drivers/net/wireguard/netlink.h
 create mode 100644 drivers/net/wireguard/noise.c
 create mode 100644 drivers/net/wireguard/noise.h
 create mode 100644 drivers/net/wireguard/peer.c
 create mode 100644 drivers/net/wireguard/peer.h
 create mode 100644 drivers/net/wireguard/queueing.c
 create mode 100644 drivers/net/wireguard/queueing.h
 create mode 100644 drivers/net/wireguard/ratelimiter.c
 create mode 100644 drivers/net/wireguard/ratelimiter.h
 create mode 100644 drivers/net/wireguard/receive.c
 create mode 100644 drivers/net/wireguard/selftest/allowedips.h
 create mode 100644 drivers/net/wireguard/selftest/counter.h
 create mode 100644 drivers/net/wireguard/selftest/ratelimiter.h
 create mode 100644 drivers/net/wireguard/send.c
 create mode 100644 drivers/net/wireguard/socket.c
 create mode 100644 drivers/net/wireguard/socket.h
 create mode 100644 drivers/net/wireguard/timers.c
 create mode 100644 drivers/net/wireguard/timers.h
 create mode 100644 drivers/net/wireguard/version.h
 create mode 100644 include/uapi/linux/wireguard.h
 create mode 100755 tools/testing/selftests/wireguard/netns.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index d2092e52320d..2043437adf0b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15813,6 +15813,14 @@ L:	linux-gpio@vger.kernel.org
 S:	Maintained
 F:	drivers/gpio/gpio-ws16c48.c
 
+WIREGUARD SECURE NETWORK TUNNEL
+M:	Jason A. Donenfeld <Jason@zx2c4.com>
+S:	Maintained
+F:	drivers/net/wireguard/
+F:	tools/testing/selftests/wireguard/
+L:	wireguard@lists.zx2c4.com
+L:	netdev@vger.kernel.org
+
 WISTRON LAPTOP BUTTON DRIVER
 M:	Miloslav Trmac <mitr@volny.cz>
 S:	Maintained
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index d03775100f7d..aa631fe3b395 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -70,6 +70,36 @@ config DUMMY
 	  To compile this driver as a module, choose M here: the module
 	  will be called dummy.
 
+config WIREGUARD
+	tristate "WireGuard secure network tunnel"
+	depends on NET && INET
+	select NET_UDP_TUNNEL
+	select DST_CACHE
+	select ZINC_CHACHA20POLY1305
+	select ZINC_BLAKE2S
+	select ZINC_CURVE25519
+	default m
+	help
+	  WireGuard is a secure, fast, and easy to use replacement for IPSec
+	  that uses modern cryptography and clever networking tricks. It's
+	  designed to be fairly general purpose and abstract enough to fit most
+	  use cases, while at the same time remaining extremely simple to
+	  configure. See www.wireguard.com for more info.
+
+	  It's safe to say Y or M here, as the driver is very lightweight and
+	  is only in use when an administrator chooses to add an interface.
+
+config WIREGUARD_DEBUG
+	bool "Debugging checks and verbose messages"
+	depends on WIREGUARD
+	help
+	  This will write log messages for handshake and other events
+	  that occur for a WireGuard interface. It will also perform some
+	  extra validation checks and unit tests at various points. This is
+	  only useful for debugging.
+
+	  Say N here unless you know what you're doing.
+
 config EQUALIZER
 	tristate "EQL (serial line load balancing) support"
 	---help---
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 21cde7e78621..f0acd11a143d 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_RIONET) += rionet.o
 obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
 obj-$(CONFIG_TAP) += tap.o
+obj-$(CONFIG_WIREGUARD) += wireguard/
 obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
diff --git a/drivers/net/wireguard/Makefile b/drivers/net/wireguard/Makefile
new file mode 100644
index 000000000000..d8856255bc9d
--- /dev/null
+++ b/drivers/net/wireguard/Makefile
@@ -0,0 +1,18 @@
+ccflags-y := -O3
+ccflags-y += -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt'
+ccflags-$(CONFIG_WIREGUARD_DEBUG) += -DDEBUG
+wireguard-y := main.o
+wireguard-y += noise.o
+wireguard-y += device.o
+wireguard-y += peer.o
+wireguard-y += timers.o
+wireguard-y += queueing.o
+wireguard-y += send.o
+wireguard-y += receive.o
+wireguard-y += socket.o
+wireguard-y += hashtables.o
+wireguard-y += allowedips.o
+wireguard-y += ratelimiter.o
+wireguard-y += cookie.o
+wireguard-y += netlink.o
+obj-$(CONFIG_WIREGUARD) := wireguard.o
diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
new file mode 100644
index 000000000000..fab15ad50170
--- /dev/null
+++ b/drivers/net/wireguard/allowedips.c
@@ -0,0 +1,404 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "allowedips.h"
+#include "peer.h"
+
+struct allowedips_node {
+	struct wireguard_peer __rcu *peer;
+	struct rcu_head rcu;
+	struct allowedips_node __rcu *bit[2];
+	/* While it may seem scandalous that we waste space for v4,
+	 * we're alloc'ing to the nearest power of 2 anyway, so this
+	 * doesn't actually make a difference.
+	 */
+	u8 bits[16] __aligned(__alignof(u64));
+	u8 cidr, bit_at_a, bit_at_b;
+};
+
+static __always_inline void swap_endian(u8 *dst, const u8 *src, u8 bits)
+{
+	if (bits == 32)
+		*(u32 *)dst = be32_to_cpu(*(const __be32 *)src);
+	else if (bits == 128) {
+		((u64 *)dst)[0] = be64_to_cpu(((const __be64 *)src)[0]);
+		((u64 *)dst)[1] = be64_to_cpu(((const __be64 *)src)[1]);
+	}
+}
+
+static void copy_and_assign_cidr(struct allowedips_node *node, const u8 *src,
+				 u8 cidr, u8 bits)
+{
+	node->cidr = cidr;
+	node->bit_at_a = cidr / 8U;
+#ifdef __LITTLE_ENDIAN
+	node->bit_at_a ^= (bits / 8U - 1U) % 8U;
+#endif
+	node->bit_at_b = 7U - (cidr % 8U);
+	memcpy(node->bits, src, bits / 8U);
+}
+
+#define choose_node(parent, key)                                               \
+	parent->bit[(key[parent->bit_at_a] >> parent->bit_at_b) & 1]
+
+static void node_free_rcu(struct rcu_head *rcu)
+{
+	kfree(container_of(rcu, struct allowedips_node, rcu));
+}
+
+#define push_rcu(stack, p, len) ({                                             \
+		if (rcu_access_pointer(p)) {                                   \
+			BUG_ON(len >= 128);                                    \
+			stack[len++] = rcu_dereference_raw(p);                 \
+		}                                                              \
+		true;                                                          \
+	})
+static void root_free_rcu(struct rcu_head *rcu)
+{
+	struct allowedips_node *node, *stack[128] = {
+		container_of(rcu, struct allowedips_node, rcu) };
+	unsigned int len = 1;
+
+	while (len > 0 && (node = stack[--len]) &&
+	       push_rcu(stack, node->bit[0], len) &&
+	       push_rcu(stack, node->bit[1], len))
+		kfree(node);
+}
+
+static int
+walk_by_peer(struct allowedips_node __rcu *top, u8 bits,
+	     struct allowedips_cursor *cursor, struct wireguard_peer *peer,
+	     int (*func)(void *ctx, const u8 *ip, u8 cidr, int family),
+	     void *ctx, struct mutex *lock)
+{
+	const int address_family = bits == 32 ? AF_INET : AF_INET6;
+	u8 ip[16] __aligned(__alignof(u64));
+	struct allowedips_node *node;
+	int ret;
+
+	if (!rcu_access_pointer(top))
+		return 0;
+
+	if (!cursor->len)
+		push_rcu(cursor->stack, top, cursor->len);
+
+	for (; cursor->len > 0 && (node = cursor->stack[cursor->len - 1]);
+	     --cursor->len, push_rcu(cursor->stack, node->bit[0], cursor->len),
+	     push_rcu(cursor->stack, node->bit[1], cursor->len)) {
+		const unsigned int cidr_bytes = DIV_ROUND_UP(node->cidr, 8U);
+
+		if (rcu_dereference_protected(node->peer,
+					      lockdep_is_held(lock)) != peer)
+			continue;
+
+		swap_endian(ip, node->bits, bits);
+		memset(ip + cidr_bytes, 0, bits / 8U - cidr_bytes);
+		if (node->cidr)
+			ip[cidr_bytes - 1U] &= ~0U << (-node->cidr % 8U);
+
+		ret = func(ctx, ip, node->cidr, address_family);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+#undef push_rcu
+
+#define ref(p) rcu_access_pointer(p)
+#define deref(p) rcu_dereference_protected(*p, lockdep_is_held(lock))
+#define push(p) ({                                                             \
+		BUG_ON(len >= 128);                                            \
+		stack[len++] = p;                                              \
+	})
+static void walk_remove_by_peer(struct allowedips_node __rcu **top,
+				struct wireguard_peer *peer, struct mutex *lock)
+{
+	struct allowedips_node __rcu **stack[128], **nptr;
+	struct allowedips_node *node, *prev;
+	unsigned int len;
+
+	if (unlikely(!peer || !ref(*top)))
+		return;
+
+	for (prev = NULL, len = 0, push(top); len > 0; prev = node) {
+		nptr = stack[len - 1];
+		node = deref(nptr);
+		if (!node) {
+			--len;
+			continue;
+		}
+		if (!prev || ref(prev->bit[0]) == node ||
+		    ref(prev->bit[1]) == node) {
+			if (ref(node->bit[0]))
+				push(&node->bit[0]);
+			else if (ref(node->bit[1]))
+				push(&node->bit[1]);
+		} else if (ref(node->bit[0]) == prev) {
+			if (ref(node->bit[1]))
+				push(&node->bit[1]);
+		} else {
+			if (rcu_dereference_protected(node->peer,
+				lockdep_is_held(lock)) == peer) {
+				RCU_INIT_POINTER(node->peer, NULL);
+				if (!node->bit[0] || !node->bit[1]) {
+					rcu_assign_pointer(*nptr,
+					deref(&node->bit[!ref(node->bit[0])]));
+					call_rcu_bh(&node->rcu, node_free_rcu);
+					node = deref(nptr);
+				}
+			}
+			--len;
+		}
+	}
+}
+#undef ref
+#undef deref
+#undef push
+
+static __always_inline unsigned int fls128(u64 a, u64 b)
+{
+	return a ? fls64(a) + 64U : fls64(b);
+}
+
+static __always_inline u8 common_bits(const struct allowedips_node *node,
+				      const u8 *key, u8 bits)
+{
+	if (bits == 32)
+		return 32U - fls(*(const u32 *)node->bits ^ *(const u32 *)key);
+	else if (bits == 128)
+		return 128U - fls128(
+			*(const u64 *)&node->bits[0] ^ *(const u64 *)&key[0],
+			*(const u64 *)&node->bits[8] ^ *(const u64 *)&key[8]);
+	return 0;
+}
+
+/* This could be much faster if it actually just compared the common bits
+ * properly, by precomputing a mask bswap(~0 << (32 - cidr)), and the rest, but
+ * it turns out that common_bits is already super fast on modern processors,
+ * even taking into account the unfortunate bswap. So, we just inline it like
+ * this instead.
+ */
+#define prefix_matches(node, key, bits)                                        \
+	(common_bits(node, key, bits) >= node->cidr)
+
+static __always_inline struct allowedips_node *
+find_node(struct allowedips_node *trie, u8 bits, const u8 *key)
+{
+	struct allowedips_node *node = trie, *found = NULL;
+
+	while (node && prefix_matches(node, key, bits)) {
+		if (rcu_access_pointer(node->peer))
+			found = node;
+		if (node->cidr == bits)
+			break;
+		node = rcu_dereference_bh(choose_node(node, key));
+	}
+	return found;
+}
+
+/* Returns a strong reference to a peer */
+static __always_inline struct wireguard_peer *
+lookup(struct allowedips_node __rcu *root, u8 bits, const void *be_ip)
+{
+	u8 ip[16] __aligned(__alignof(u64));
+	struct wireguard_peer *peer = NULL;
+	struct allowedips_node *node;
+
+	swap_endian(ip, be_ip, bits);
+
+	rcu_read_lock_bh();
+retry:
+	node = find_node(rcu_dereference_bh(root), bits, ip);
+	if (node) {
+		peer = peer_get_maybe_zero(rcu_dereference_bh(node->peer));
+		if (!peer)
+			goto retry;
+	}
+	rcu_read_unlock_bh();
+	return peer;
+}
+
+__attribute__((nonnull(1))) static inline bool
+node_placement(struct allowedips_node __rcu *trie, const u8 *key, u8 cidr,
+	       u8 bits, struct allowedips_node **rnode, struct mutex *lock)
+{
+	struct allowedips_node *node = rcu_dereference_protected(trie,
+						lockdep_is_held(lock));
+	struct allowedips_node *parent = NULL;
+	bool exact = false;
+
+	while (node && node->cidr <= cidr && prefix_matches(node, key, bits)) {
+		parent = node;
+		if (parent->cidr == cidr) {
+			exact = true;
+			break;
+		}
+		node = rcu_dereference_protected(choose_node(parent, key),
+						 lockdep_is_held(lock));
+	}
+	*rnode = parent;
+	return exact;
+}
+
+static int add(struct allowedips_node __rcu **trie, u8 bits, const u8 *be_key,
+	       u8 cidr, struct wireguard_peer *peer, struct mutex *lock)
+{
+	struct allowedips_node *node, *parent, *down, *newnode;
+	u8 key[16] __aligned(__alignof(u64));
+
+	if (unlikely(cidr > bits || !peer))
+		return -EINVAL;
+
+	swap_endian(key, be_key, bits);
+
+	if (!rcu_access_pointer(*trie)) {
+		node = kzalloc(sizeof(*node), GFP_KERNEL);
+		if (unlikely(!node))
+			return -ENOMEM;
+		RCU_INIT_POINTER(node->peer, peer);
+		copy_and_assign_cidr(node, key, cidr, bits);
+		rcu_assign_pointer(*trie, node);
+		return 0;
+	}
+	if (node_placement(*trie, key, cidr, bits, &node, lock)) {
+		rcu_assign_pointer(node->peer, peer);
+		return 0;
+	}
+
+	newnode = kzalloc(sizeof(*newnode), GFP_KERNEL);
+	if (unlikely(!newnode))
+		return -ENOMEM;
+	RCU_INIT_POINTER(newnode->peer, peer);
+	copy_and_assign_cidr(newnode, key, cidr, bits);
+
+	if (!node)
+		down = rcu_dereference_protected(*trie, lockdep_is_held(lock));
+	else {
+		down = rcu_dereference_protected(choose_node(node, key),
+						 lockdep_is_held(lock));
+		if (!down) {
+			rcu_assign_pointer(choose_node(node, key), newnode);
+			return 0;
+		}
+	}
+	cidr = min(cidr, common_bits(down, key, bits));
+	parent = node;
+
+	if (newnode->cidr == cidr) {
+		rcu_assign_pointer(choose_node(newnode, down->bits), down);
+		if (!parent)
+			rcu_assign_pointer(*trie, newnode);
+		else
+			rcu_assign_pointer(choose_node(parent, newnode->bits),
+					   newnode);
+	} else {
+		node = kzalloc(sizeof(*node), GFP_KERNEL);
+		if (unlikely(!node)) {
+			kfree(newnode);
+			return -ENOMEM;
+		}
+		copy_and_assign_cidr(node, newnode->bits, cidr, bits);
+
+		rcu_assign_pointer(choose_node(node, down->bits), down);
+		rcu_assign_pointer(choose_node(node, newnode->bits), newnode);
+		if (!parent)
+			rcu_assign_pointer(*trie, node);
+		else
+			rcu_assign_pointer(choose_node(parent, node->bits),
+					   node);
+	}
+	return 0;
+}
+
+void allowedips_init(struct allowedips *table)
+{
+	table->root4 = table->root6 = NULL;
+	table->seq = 1;
+}
+
+void allowedips_free(struct allowedips *table, struct mutex *lock)
+{
+	struct allowedips_node __rcu *old4 = table->root4, *old6 = table->root6;
+	++table->seq;
+	RCU_INIT_POINTER(table->root4, NULL);
+	RCU_INIT_POINTER(table->root6, NULL);
+	if (rcu_access_pointer(old4))
+		call_rcu_bh(&rcu_dereference_protected(old4,
+				lockdep_is_held(lock))->rcu, root_free_rcu);
+	if (rcu_access_pointer(old6))
+		call_rcu_bh(&rcu_dereference_protected(old6,
+				lockdep_is_held(lock))->rcu, root_free_rcu);
+}
+
+int allowedips_insert_v4(struct allowedips *table, const struct in_addr *ip,
+			 u8 cidr, struct wireguard_peer *peer,
+			 struct mutex *lock)
+{
+	++table->seq;
+	return add(&table->root4, 32, (const u8 *)ip, cidr, peer, lock);
+}
+
+int allowedips_insert_v6(struct allowedips *table, const struct in6_addr *ip,
+			 u8 cidr, struct wireguard_peer *peer,
+			 struct mutex *lock)
+{
+	++table->seq;
+	return add(&table->root6, 128, (const u8 *)ip, cidr, peer, lock);
+}
+
+void allowedips_remove_by_peer(struct allowedips *table,
+			       struct wireguard_peer *peer, struct mutex *lock)
+{
+	++table->seq;
+	walk_remove_by_peer(&table->root4, peer, lock);
+	walk_remove_by_peer(&table->root6, peer, lock);
+}
+
+int allowedips_walk_by_peer(struct allowedips *table,
+			    struct allowedips_cursor *cursor,
+			    struct wireguard_peer *peer,
+			    int (*func)(void *ctx, const u8 *ip, u8 cidr, int family),
+			    void *ctx, struct mutex *lock)
+{
+	int ret;
+
+	if (!cursor->seq)
+		cursor->seq = table->seq;
+	else if (cursor->seq != table->seq)
+		return 0;
+
+	if (!cursor->second_half) {
+		ret = walk_by_peer(table->root4, 32, cursor, peer, func, ctx, lock);
+		if (ret)
+			return ret;
+		cursor->len = 0;
+		cursor->second_half = true;
+	}
+	return walk_by_peer(table->root6, 128, cursor, peer, func, ctx, lock);
+}
+
+/* Returns a strong reference to a peer */
+struct wireguard_peer *allowedips_lookup_dst(struct allowedips *table,
+					     struct sk_buff *skb)
+{
+	if (skb->protocol == htons(ETH_P_IP))
+		return lookup(table->root4, 32, &ip_hdr(skb)->daddr);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		return lookup(table->root6, 128, &ipv6_hdr(skb)->daddr);
+	return NULL;
+}
+
+/* Returns a strong reference to a peer */
+struct wireguard_peer *allowedips_lookup_src(struct allowedips *table,
+					     struct sk_buff *skb)
+{
+	if (skb->protocol == htons(ETH_P_IP))
+		return lookup(table->root4, 32, &ip_hdr(skb)->saddr);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		return lookup(table->root6, 128, &ipv6_hdr(skb)->saddr);
+	return NULL;
+}
+
+#include "selftest/allowedips.h"
diff --git a/drivers/net/wireguard/allowedips.h b/drivers/net/wireguard/allowedips.h
new file mode 100644
index 000000000000..d5ba1bee595e
--- /dev/null
+++ b/drivers/net/wireguard/allowedips.h
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_ALLOWEDIPS_H
+#define _WG_ALLOWEDIPS_H
+
+#include <linux/mutex.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+struct wireguard_peer;
+struct allowedips_node;
+
+struct allowedips {
+	struct allowedips_node __rcu *root4;
+	struct allowedips_node __rcu *root6;
+	u64 seq;
+};
+
+struct allowedips_cursor {
+	u64 seq;
+	struct allowedips_node *stack[128];
+	unsigned int len;
+	bool second_half;
+};
+
+void allowedips_init(struct allowedips *table);
+void allowedips_free(struct allowedips *table, struct mutex *mutex);
+int allowedips_insert_v4(struct allowedips *table, const struct in_addr *ip,
+			 u8 cidr, struct wireguard_peer *peer,
+			 struct mutex *lock);
+int allowedips_insert_v6(struct allowedips *table, const struct in6_addr *ip,
+			 u8 cidr, struct wireguard_peer *peer,
+			 struct mutex *lock);
+void allowedips_remove_by_peer(struct allowedips *table,
+			       struct wireguard_peer *peer, struct mutex *lock);
+int allowedips_walk_by_peer(struct allowedips *table,
+			    struct allowedips_cursor *cursor,
+			    struct wireguard_peer *peer,
+			    int (*func)(void *ctx, const u8 *ip, u8 cidr, int family),
+			    void *ctx, struct mutex *lock);
+
+/* These return a strong reference to a peer: */
+struct wireguard_peer *allowedips_lookup_dst(struct allowedips *table,
+					     struct sk_buff *skb);
+struct wireguard_peer *allowedips_lookup_src(struct allowedips *table,
+					     struct sk_buff *skb);
+
+#ifdef DEBUG
+bool allowedips_selftest(void);
+#endif
+
+#endif /* _WG_ALLOWEDIPS_H */
diff --git a/drivers/net/wireguard/cookie.c b/drivers/net/wireguard/cookie.c
new file mode 100644
index 000000000000..d0739622341a
--- /dev/null
+++ b/drivers/net/wireguard/cookie.c
@@ -0,0 +1,234 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "cookie.h"
+#include "peer.h"
+#include "device.h"
+#include "messages.h"
+#include "ratelimiter.h"
+#include "timers.h"
+
+#include <zinc/blake2s.h>
+#include <zinc/chacha20poly1305.h>
+
+#include <net/ipv6.h>
+#include <crypto/algapi.h>
+
+void cookie_checker_init(struct cookie_checker *checker,
+			 struct wireguard_device *wg)
+{
+	init_rwsem(&checker->secret_lock);
+	checker->secret_birthdate = ktime_get_boot_fast_ns();
+	get_random_bytes(checker->secret, NOISE_HASH_LEN);
+	checker->device = wg;
+}
+
+enum { COOKIE_KEY_LABEL_LEN = 8 };
+static const u8 mac1_key_label[COOKIE_KEY_LABEL_LEN] = "mac1----";
+static const u8 cookie_key_label[COOKIE_KEY_LABEL_LEN] = "cookie--";
+
+static void precompute_key(u8 key[NOISE_SYMMETRIC_KEY_LEN],
+			   const u8 pubkey[NOISE_PUBLIC_KEY_LEN],
+			   const u8 label[COOKIE_KEY_LABEL_LEN])
+{
+	struct blake2s_state blake;
+
+	blake2s_init(&blake, NOISE_SYMMETRIC_KEY_LEN);
+	blake2s_update(&blake, label, COOKIE_KEY_LABEL_LEN);
+	blake2s_update(&blake, pubkey, NOISE_PUBLIC_KEY_LEN);
+	blake2s_final(&blake, key, NOISE_SYMMETRIC_KEY_LEN);
+}
+
+/* Must hold peer->handshake.static_identity->lock */
+void cookie_checker_precompute_device_keys(struct cookie_checker *checker)
+{
+	if (likely(checker->device->static_identity.has_identity)) {
+		precompute_key(checker->cookie_encryption_key,
+			       checker->device->static_identity.static_public,
+			       cookie_key_label);
+		precompute_key(checker->message_mac1_key,
+			       checker->device->static_identity.static_public,
+			       mac1_key_label);
+	} else {
+		memset(checker->cookie_encryption_key, 0,
+		       NOISE_SYMMETRIC_KEY_LEN);
+		memset(checker->message_mac1_key, 0, NOISE_SYMMETRIC_KEY_LEN);
+	}
+}
+
+void cookie_checker_precompute_peer_keys(struct wireguard_peer *peer)
+{
+	precompute_key(peer->latest_cookie.cookie_decryption_key,
+		       peer->handshake.remote_static, cookie_key_label);
+	precompute_key(peer->latest_cookie.message_mac1_key,
+		       peer->handshake.remote_static, mac1_key_label);
+}
+
+void cookie_init(struct cookie *cookie)
+{
+	memset(cookie, 0, sizeof(*cookie));
+	init_rwsem(&cookie->lock);
+}
+
+static void compute_mac1(u8 mac1[COOKIE_LEN], const void *message, size_t len,
+			 const u8 key[NOISE_SYMMETRIC_KEY_LEN])
+{
+	len = len - sizeof(struct message_macs) +
+	      offsetof(struct message_macs, mac1);
+	blake2s(mac1, message, key, COOKIE_LEN, len, NOISE_SYMMETRIC_KEY_LEN);
+}
+
+static void compute_mac2(u8 mac2[COOKIE_LEN], const void *message, size_t len,
+			 const u8 cookie[COOKIE_LEN])
+{
+	len = len - sizeof(struct message_macs) +
+	      offsetof(struct message_macs, mac2);
+	blake2s(mac2, message, cookie, COOKIE_LEN, len, COOKIE_LEN);
+}
+
+static void make_cookie(u8 cookie[COOKIE_LEN], struct sk_buff *skb,
+			struct cookie_checker *checker)
+{
+	struct blake2s_state state;
+
+	if (has_expired(checker->secret_birthdate, COOKIE_SECRET_MAX_AGE)) {
+		down_write(&checker->secret_lock);
+		checker->secret_birthdate = ktime_get_boot_fast_ns();
+		get_random_bytes(checker->secret, NOISE_HASH_LEN);
+		up_write(&checker->secret_lock);
+	}
+
+	down_read(&checker->secret_lock);
+
+	blake2s_init_key(&state, COOKIE_LEN, checker->secret, NOISE_HASH_LEN);
+	if (skb->protocol == htons(ETH_P_IP))
+		blake2s_update(&state, (u8 *)&ip_hdr(skb)->saddr,
+			       sizeof(struct in_addr));
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		blake2s_update(&state, (u8 *)&ipv6_hdr(skb)->saddr,
+			       sizeof(struct in6_addr));
+	blake2s_update(&state, (u8 *)&udp_hdr(skb)->source, sizeof(__be16));
+	blake2s_final(&state, cookie, COOKIE_LEN);
+
+	up_read(&checker->secret_lock);
+}
+
+enum cookie_mac_state cookie_validate_packet(struct cookie_checker *checker,
+					     struct sk_buff *skb,
+					     bool check_cookie)
+{
+	struct message_macs *macs = (struct message_macs *)
+		(skb->data + skb->len - sizeof(*macs));
+	enum cookie_mac_state ret;
+	u8 computed_mac[COOKIE_LEN];
+	u8 cookie[COOKIE_LEN];
+
+	ret = INVALID_MAC;
+	compute_mac1(computed_mac, skb->data, skb->len,
+		     checker->message_mac1_key);
+	if (crypto_memneq(computed_mac, macs->mac1, COOKIE_LEN))
+		goto out;
+
+	ret = VALID_MAC_BUT_NO_COOKIE;
+
+	if (!check_cookie)
+		goto out;
+
+	make_cookie(cookie, skb, checker);
+
+	compute_mac2(computed_mac, skb->data, skb->len, cookie);
+	if (crypto_memneq(computed_mac, macs->mac2, COOKIE_LEN))
+		goto out;
+
+	ret = VALID_MAC_WITH_COOKIE_BUT_RATELIMITED;
+	if (!ratelimiter_allow(skb, dev_net(checker->device->dev)))
+		goto out;
+
+	ret = VALID_MAC_WITH_COOKIE;
+
+out:
+	return ret;
+}
+
+void cookie_add_mac_to_packet(void *message, size_t len,
+			      struct wireguard_peer *peer)
+{
+	struct message_macs *macs = (struct message_macs *)
+		((u8 *)message + len - sizeof(*macs));
+
+	down_write(&peer->latest_cookie.lock);
+	compute_mac1(macs->mac1, message, len,
+		     peer->latest_cookie.message_mac1_key);
+	memcpy(peer->latest_cookie.last_mac1_sent, macs->mac1, COOKIE_LEN);
+	peer->latest_cookie.have_sent_mac1 = true;
+	up_write(&peer->latest_cookie.lock);
+
+	down_read(&peer->latest_cookie.lock);
+	if (peer->latest_cookie.is_valid &&
+	    !has_expired(peer->latest_cookie.birthdate,
+			 COOKIE_SECRET_MAX_AGE - COOKIE_SECRET_LATENCY))
+		compute_mac2(macs->mac2, message, len,
+			     peer->latest_cookie.cookie);
+	else
+		memset(macs->mac2, 0, COOKIE_LEN);
+	up_read(&peer->latest_cookie.lock);
+}
+
+void cookie_message_create(struct message_handshake_cookie *dst,
+			   struct sk_buff *skb, __le32 index,
+			   struct cookie_checker *checker)
+{
+	struct message_macs *macs = (struct message_macs *)
+		((u8 *)skb->data + skb->len - sizeof(*macs));
+	u8 cookie[COOKIE_LEN];
+
+	dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE);
+	dst->receiver_index = index;
+	get_random_bytes_wait(dst->nonce, COOKIE_NONCE_LEN);
+
+	make_cookie(cookie, skb, checker);
+	xchacha20poly1305_encrypt(dst->encrypted_cookie, cookie, COOKIE_LEN,
+				  macs->mac1, COOKIE_LEN, dst->nonce,
+				  checker->cookie_encryption_key);
+}
+
+void cookie_message_consume(struct message_handshake_cookie *src,
+			    struct wireguard_device *wg)
+{
+	struct wireguard_peer *peer = NULL;
+	u8 cookie[COOKIE_LEN];
+	bool ret;
+
+	if (unlikely(!index_hashtable_lookup(&wg->index_hashtable,
+					     INDEX_HASHTABLE_HANDSHAKE |
+					     INDEX_HASHTABLE_KEYPAIR,
+					     src->receiver_index, &peer)))
+		return;
+
+	down_read(&peer->latest_cookie.lock);
+	if (unlikely(!peer->latest_cookie.have_sent_mac1)) {
+		up_read(&peer->latest_cookie.lock);
+		goto out;
+	}
+	ret = xchacha20poly1305_decrypt(
+		cookie, src->encrypted_cookie, sizeof(src->encrypted_cookie),
+		peer->latest_cookie.last_mac1_sent, COOKIE_LEN, src->nonce,
+		peer->latest_cookie.cookie_decryption_key);
+	up_read(&peer->latest_cookie.lock);
+
+	if (ret) {
+		down_write(&peer->latest_cookie.lock);
+		memcpy(peer->latest_cookie.cookie, cookie, COOKIE_LEN);
+		peer->latest_cookie.birthdate = ktime_get_boot_fast_ns();
+		peer->latest_cookie.is_valid = true;
+		peer->latest_cookie.have_sent_mac1 = false;
+		up_write(&peer->latest_cookie.lock);
+	} else
+		net_dbg_ratelimited("%s: Could not decrypt invalid cookie response\n",
+				    wg->dev->name);
+
+out:
+	peer_put(peer);
+}
diff --git a/drivers/net/wireguard/cookie.h b/drivers/net/wireguard/cookie.h
new file mode 100644
index 000000000000..7802f6158d66
--- /dev/null
+++ b/drivers/net/wireguard/cookie.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_COOKIE_H
+#define _WG_COOKIE_H
+
+#include "messages.h"
+#include <linux/rwsem.h>
+
+struct wireguard_peer;
+
+struct cookie_checker {
+	u8 secret[NOISE_HASH_LEN];
+	u8 cookie_encryption_key[NOISE_SYMMETRIC_KEY_LEN];
+	u8 message_mac1_key[NOISE_SYMMETRIC_KEY_LEN];
+	u64 secret_birthdate;
+	struct rw_semaphore secret_lock;
+	struct wireguard_device *device;
+};
+
+struct cookie {
+	u64 birthdate;
+	bool is_valid;
+	u8 cookie[COOKIE_LEN];
+	bool have_sent_mac1;
+	u8 last_mac1_sent[COOKIE_LEN];
+	u8 cookie_decryption_key[NOISE_SYMMETRIC_KEY_LEN];
+	u8 message_mac1_key[NOISE_SYMMETRIC_KEY_LEN];
+	struct rw_semaphore lock;
+};
+
+enum cookie_mac_state {
+	INVALID_MAC,
+	VALID_MAC_BUT_NO_COOKIE,
+	VALID_MAC_WITH_COOKIE_BUT_RATELIMITED,
+	VALID_MAC_WITH_COOKIE
+};
+
+void cookie_checker_init(struct cookie_checker *checker,
+			 struct wireguard_device *wg);
+void cookie_checker_precompute_device_keys(struct cookie_checker *checker);
+void cookie_checker_precompute_peer_keys(struct wireguard_peer *peer);
+void cookie_init(struct cookie *cookie);
+
+enum cookie_mac_state cookie_validate_packet(struct cookie_checker *checker,
+					     struct sk_buff *skb,
+					     bool check_cookie);
+void cookie_add_mac_to_packet(void *message, size_t len,
+			      struct wireguard_peer *peer);
+
+void cookie_message_create(struct message_handshake_cookie *src,
+			   struct sk_buff *skb, __le32 index,
+			   struct cookie_checker *checker);
+void cookie_message_consume(struct message_handshake_cookie *src,
+			    struct wireguard_device *wg);
+
+#endif /* _WG_COOKIE_H */
diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
new file mode 100644
index 000000000000..a87a39e25e94
--- /dev/null
+++ b/drivers/net/wireguard/device.c
@@ -0,0 +1,438 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "queueing.h"
+#include "socket.h"
+#include "timers.h"
+#include "device.h"
+#include "ratelimiter.h"
+#include "peer.h"
+#include "messages.h"
+
+#include <linux/module.h>
+#include <linux/rtnetlink.h>
+#include <linux/inet.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/if_arp.h>
+#include <linux/icmp.h>
+#include <linux/suspend.h>
+#include <net/icmp.h>
+#include <net/rtnetlink.h>
+#include <net/ip_tunnels.h>
+#include <net/addrconf.h>
+
+static LIST_HEAD(device_list);
+
+static int open(struct net_device *dev)
+{
+	struct in_device *dev_v4 = __in_dev_get_rtnl(dev);
+	struct wireguard_device *wg = netdev_priv(dev);
+	struct inet6_dev *dev_v6 = __in6_dev_get(dev);
+	struct wireguard_peer *peer;
+	int ret;
+
+	if (dev_v4) {
+		/* At some point we might put this check near the ip_rt_send_
+		 * redirect call of ip_forward in net/ipv4/ip_forward.c, similar
+		 * to the current secpath check.
+		 */
+		IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false);
+		IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false;
+	}
+	if (dev_v6)
+		dev_v6->cnf.addr_gen_mode = IN6_ADDR_GEN_MODE_NONE;
+
+	ret = socket_init(wg, wg->incoming_port);
+	if (ret < 0)
+		return ret;
+	mutex_lock(&wg->device_update_lock);
+	list_for_each_entry (peer, &wg->peer_list, peer_list) {
+		packet_send_staged_packets(peer);
+		if (peer->persistent_keepalive_interval)
+			packet_send_keepalive(peer);
+	}
+	mutex_unlock(&wg->device_update_lock);
+	return 0;
+}
+
+#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID)
+static int pm_notification(struct notifier_block *nb, unsigned long action,
+			   void *data)
+{
+	struct wireguard_device *wg;
+	struct wireguard_peer *peer;
+
+	if (action != PM_HIBERNATION_PREPARE && action != PM_SUSPEND_PREPARE)
+		return 0;
+
+	rtnl_lock();
+	list_for_each_entry (wg, &device_list, device_list) {
+		mutex_lock(&wg->device_update_lock);
+		list_for_each_entry (peer, &wg->peer_list, peer_list) {
+			noise_handshake_clear(&peer->handshake);
+			noise_keypairs_clear(&peer->keypairs);
+			if (peer->timers_enabled)
+				del_timer(&peer->timer_zero_key_material);
+		}
+		mutex_unlock(&wg->device_update_lock);
+	}
+	rtnl_unlock();
+	rcu_barrier_bh();
+	return 0;
+}
+static struct notifier_block pm_notifier = { .notifier_call = pm_notification };
+#endif
+
+static int stop(struct net_device *dev)
+{
+	struct wireguard_device *wg = netdev_priv(dev);
+	struct wireguard_peer *peer;
+
+	mutex_lock(&wg->device_update_lock);
+	list_for_each_entry (peer, &wg->peer_list, peer_list) {
+		skb_queue_purge(&peer->staged_packet_queue);
+		timers_stop(peer);
+		noise_handshake_clear(&peer->handshake);
+		noise_keypairs_clear(&peer->keypairs);
+		atomic64_set(&peer->last_sent_handshake,
+			     ktime_get_boot_fast_ns() -
+				     (u64)(REKEY_TIMEOUT + 1) * NSEC_PER_SEC);
+	}
+	mutex_unlock(&wg->device_update_lock);
+	skb_queue_purge(&wg->incoming_handshakes);
+	socket_reinit(wg, NULL, NULL);
+	return 0;
+}
+
+static netdev_tx_t xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct wireguard_device *wg = netdev_priv(dev);
+	struct wireguard_peer *peer;
+	struct sk_buff *next;
+	struct sk_buff_head packets;
+	sa_family_t family;
+	u32 mtu;
+	int ret;
+
+	if (unlikely(skb_examine_untrusted_ip_hdr(skb) != skb->protocol)) {
+		ret = -EPROTONOSUPPORT;
+		net_dbg_ratelimited("%s: Invalid IP packet\n", dev->name);
+		goto err;
+	}
+
+	peer = allowedips_lookup_dst(&wg->peer_allowedips, skb);
+	if (unlikely(!peer)) {
+		ret = -ENOKEY;
+		if (skb->protocol == htons(ETH_P_IP))
+			net_dbg_ratelimited("%s: No peer has allowed IPs matching %pI4\n",
+					    dev->name, &ip_hdr(skb)->daddr);
+		else if (skb->protocol == htons(ETH_P_IPV6))
+			net_dbg_ratelimited("%s: No peer has allowed IPs matching %pI6\n",
+					    dev->name, &ipv6_hdr(skb)->daddr);
+		goto err;
+	}
+
+	family = READ_ONCE(peer->endpoint.addr.sa_family);
+	if (unlikely(family != AF_INET && family != AF_INET6)) {
+		ret = -EDESTADDRREQ;
+		net_dbg_ratelimited("%s: No valid endpoint has been configured or discovered for peer %llu\n",
+				    dev->name, peer->internal_id);
+		goto err_peer;
+	}
+
+	mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu;
+
+	__skb_queue_head_init(&packets);
+	if (!skb_is_gso(skb))
+		skb->next = NULL;
+	else {
+		struct sk_buff *segs = skb_gso_segment(skb, 0);
+
+		if (unlikely(IS_ERR(segs))) {
+			ret = PTR_ERR(segs);
+			goto err_peer;
+		}
+		dev_kfree_skb(skb);
+		skb = segs;
+	}
+	do {
+		next = skb->next;
+		skb->next = skb->prev = NULL;
+
+		skb = skb_share_check(skb, GFP_ATOMIC);
+		if (unlikely(!skb))
+			continue;
+
+		/* We only need to keep the original dst around for icmp,
+		 * so at this point we're in a position to drop it.
+		 */
+		skb_dst_drop(skb);
+
+		PACKET_CB(skb)->mtu = mtu;
+
+		__skb_queue_tail(&packets, skb);
+	} while ((skb = next) != NULL);
+
+	spin_lock_bh(&peer->staged_packet_queue.lock);
+	/* If the queue is getting too big, we start removing the oldest packets
+	 * until it's small again. We do this before adding the new packet, so
+	 * we don't remove GSO segments that are in excess.
+	 */
+	while (skb_queue_len(&peer->staged_packet_queue) > MAX_STAGED_PACKETS)
+		dev_kfree_skb(__skb_dequeue(&peer->staged_packet_queue));
+	skb_queue_splice_tail(&packets, &peer->staged_packet_queue);
+	spin_unlock_bh(&peer->staged_packet_queue.lock);
+
+	packet_send_staged_packets(peer);
+
+	peer_put(peer);
+	return NETDEV_TX_OK;
+
+err_peer:
+	peer_put(peer);
+err:
+	++dev->stats.tx_errors;
+	if (skb->protocol == htons(ETH_P_IP))
+		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0);
+	kfree_skb(skb);
+	return ret;
+}
+
+static const struct net_device_ops netdev_ops = {
+	.ndo_open		= open,
+	.ndo_stop		= stop,
+	.ndo_start_xmit		= xmit,
+	.ndo_get_stats64	= ip_tunnel_get_stats64
+};
+
+static void destruct(struct net_device *dev)
+{
+	struct wireguard_device *wg = netdev_priv(dev);
+
+	rtnl_lock();
+	list_del(&wg->device_list);
+	rtnl_unlock();
+	mutex_lock(&wg->device_update_lock);
+	wg->incoming_port = 0;
+	socket_reinit(wg, NULL, NULL);
+	allowedips_free(&wg->peer_allowedips, &wg->device_update_lock);
+	/* The final references are cleared in the below calls to destroy_workqueue. */
+	peer_remove_all(wg);
+	destroy_workqueue(wg->handshake_receive_wq);
+	destroy_workqueue(wg->handshake_send_wq);
+	destroy_workqueue(wg->packet_crypt_wq);
+	packet_queue_free(&wg->decrypt_queue, true);
+	packet_queue_free(&wg->encrypt_queue, true);
+	rcu_barrier_bh(); /* Wait for all the peers to be actually freed. */
+	ratelimiter_uninit();
+	memzero_explicit(&wg->static_identity, sizeof(wg->static_identity));
+	skb_queue_purge(&wg->incoming_handshakes);
+	free_percpu(dev->tstats);
+	free_percpu(wg->incoming_handshakes_worker);
+	if (wg->have_creating_net_ref)
+		put_net(wg->creating_net);
+	mutex_unlock(&wg->device_update_lock);
+
+	pr_debug("%s: Interface deleted\n", dev->name);
+	free_netdev(dev);
+}
+
+static const struct device_type device_type = { .name = KBUILD_MODNAME };
+
+static void setup(struct net_device *dev)
+{
+	struct wireguard_device *wg = netdev_priv(dev);
+	enum { WG_NETDEV_FEATURES = NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
+				    NETIF_F_SG | NETIF_F_GSO |
+				    NETIF_F_GSO_SOFTWARE | NETIF_F_HIGHDMA };
+
+	dev->netdev_ops = &netdev_ops;
+	dev->hard_header_len = 0;
+	dev->addr_len = 0;
+	dev->needed_headroom = DATA_PACKET_HEAD_ROOM;
+	dev->needed_tailroom = noise_encrypted_len(MESSAGE_PADDING_MULTIPLE);
+	dev->type = ARPHRD_NONE;
+	dev->flags = IFF_POINTOPOINT | IFF_NOARP;
+	dev->priv_flags |= IFF_NO_QUEUE;
+	dev->features |= NETIF_F_LLTX;
+	dev->features |= WG_NETDEV_FEATURES;
+	dev->hw_features |= WG_NETDEV_FEATURES;
+	dev->hw_enc_features |= WG_NETDEV_FEATURES;
+	dev->mtu = ETH_DATA_LEN - MESSAGE_MINIMUM_LENGTH -
+		   sizeof(struct udphdr) -
+		   max(sizeof(struct ipv6hdr), sizeof(struct iphdr));
+
+	SET_NETDEV_DEVTYPE(dev, &device_type);
+
+	/* We need to keep the dst around in case of icmp replies. */
+	netif_keep_dst(dev);
+
+	memset(wg, 0, sizeof(*wg));
+	wg->dev = dev;
+}
+
+static int newlink(struct net *src_net, struct net_device *dev,
+		   struct nlattr *tb[], struct nlattr *data[],
+		   struct netlink_ext_ack *extack)
+{
+	int ret = -ENOMEM;
+	struct wireguard_device *wg = netdev_priv(dev);
+
+	wg->creating_net = src_net;
+	init_rwsem(&wg->static_identity.lock);
+	mutex_init(&wg->socket_update_lock);
+	mutex_init(&wg->device_update_lock);
+	skb_queue_head_init(&wg->incoming_handshakes);
+	pubkey_hashtable_init(&wg->peer_hashtable);
+	index_hashtable_init(&wg->index_hashtable);
+	allowedips_init(&wg->peer_allowedips);
+	cookie_checker_init(&wg->cookie_checker, wg);
+	INIT_LIST_HEAD(&wg->peer_list);
+	wg->device_update_gen = 1;
+
+	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+	if (!dev->tstats)
+		goto error_1;
+
+	wg->incoming_handshakes_worker = packet_alloc_percpu_multicore_worker(
+		packet_handshake_receive_worker, wg);
+	if (!wg->incoming_handshakes_worker)
+		goto error_2;
+
+	wg->handshake_receive_wq = alloc_workqueue("wg-kex-%s",
+			WQ_CPU_INTENSIVE | WQ_FREEZABLE, 0, dev->name);
+	if (!wg->handshake_receive_wq)
+		goto error_3;
+
+	wg->handshake_send_wq = alloc_workqueue("wg-kex-%s",
+			WQ_UNBOUND | WQ_FREEZABLE, 0, dev->name);
+	if (!wg->handshake_send_wq)
+		goto error_4;
+
+	wg->packet_crypt_wq = alloc_workqueue("wg-crypt-%s",
+			WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0, dev->name);
+	if (!wg->packet_crypt_wq)
+		goto error_5;
+
+	if (packet_queue_init(&wg->encrypt_queue, packet_encrypt_worker, true,
+			      MAX_QUEUED_PACKETS) < 0)
+		goto error_6;
+
+	if (packet_queue_init(&wg->decrypt_queue, packet_decrypt_worker, true,
+			      MAX_QUEUED_PACKETS) < 0)
+		goto error_7;
+
+	ret = ratelimiter_init();
+	if (ret < 0)
+		goto error_8;
+
+	ret = register_netdevice(dev);
+	if (ret < 0)
+		goto error_9;
+
+	list_add(&wg->device_list, &device_list);
+
+	/* We wait until the end to assign priv_destructor, so that
+	 * register_netdevice doesn't call it for us if it fails.
+	 */
+	dev->priv_destructor = destruct;
+
+	pr_debug("%s: Interface created\n", dev->name);
+	return ret;
+
+error_9:
+	ratelimiter_uninit();
+error_8:
+	packet_queue_free(&wg->decrypt_queue, true);
+error_7:
+	packet_queue_free(&wg->encrypt_queue, true);
+error_6:
+	destroy_workqueue(wg->packet_crypt_wq);
+error_5:
+	destroy_workqueue(wg->handshake_send_wq);
+error_4:
+	destroy_workqueue(wg->handshake_receive_wq);
+error_3:
+	free_percpu(wg->incoming_handshakes_worker);
+error_2:
+	free_percpu(dev->tstats);
+error_1:
+	return ret;
+}
+
+static struct rtnl_link_ops link_ops __read_mostly = {
+	.kind			= KBUILD_MODNAME,
+	.priv_size		= sizeof(struct wireguard_device),
+	.setup			= setup,
+	.newlink		= newlink,
+};
+
+static int netdevice_notification(struct notifier_block *nb,
+				  unsigned long action, void *data)
+{
+	struct net_device *dev = ((struct netdev_notifier_info *)data)->dev;
+	struct wireguard_device *wg = netdev_priv(dev);
+
+	ASSERT_RTNL();
+
+	if (action != NETDEV_REGISTER || dev->netdev_ops != &netdev_ops)
+		return 0;
+
+	if (dev_net(dev) == wg->creating_net && wg->have_creating_net_ref) {
+		put_net(wg->creating_net);
+		wg->have_creating_net_ref = false;
+	} else if (dev_net(dev) != wg->creating_net &&
+		   !wg->have_creating_net_ref) {
+		wg->have_creating_net_ref = true;
+		get_net(wg->creating_net);
+	}
+	return 0;
+}
+
+static struct notifier_block netdevice_notifier = {
+	.notifier_call = netdevice_notification
+};
+
+int __init device_init(void)
+{
+	int ret;
+
+#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID)
+	ret = register_pm_notifier(&pm_notifier);
+	if (ret)
+		return ret;
+#endif
+
+	ret = register_netdevice_notifier(&netdevice_notifier);
+	if (ret)
+		goto error_pm;
+
+	ret = rtnl_link_register(&link_ops);
+	if (ret)
+		goto error_netdevice;
+
+	return 0;
+
+error_netdevice:
+	unregister_netdevice_notifier(&netdevice_notifier);
+error_pm:
+#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID)
+	unregister_pm_notifier(&pm_notifier);
+#endif
+	return ret;
+}
+
+void device_uninit(void)
+{
+	rtnl_link_unregister(&link_ops);
+	unregister_netdevice_notifier(&netdevice_notifier);
+#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID)
+	unregister_pm_notifier(&pm_notifier);
+#endif
+	rcu_barrier_bh();
+}
diff --git a/drivers/net/wireguard/device.h b/drivers/net/wireguard/device.h
new file mode 100644
index 000000000000..2499782518c1
--- /dev/null
+++ b/drivers/net/wireguard/device.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_DEVICE_H
+#define _WG_DEVICE_H
+
+#include "noise.h"
+#include "allowedips.h"
+#include "hashtables.h"
+#include "cookie.h"
+
+#include <linux/types.h>
+#include <linux/netdevice.h>
+#include <linux/workqueue.h>
+#include <linux/mutex.h>
+#include <linux/net.h>
+#include <linux/ptr_ring.h>
+
+struct wireguard_device;
+
+struct multicore_worker {
+	void *ptr;
+	struct work_struct work;
+};
+
+struct crypt_queue {
+	struct ptr_ring ring;
+	union {
+		struct {
+			struct multicore_worker __percpu *worker;
+			int last_cpu;
+		};
+		struct work_struct work;
+	};
+};
+
+struct wireguard_device {
+	struct net_device *dev;
+	struct crypt_queue encrypt_queue, decrypt_queue;
+	struct sock __rcu *sock4, *sock6;
+	struct net *creating_net;
+	struct noise_static_identity static_identity;
+	struct workqueue_struct *handshake_receive_wq, *handshake_send_wq;
+	struct workqueue_struct *packet_crypt_wq;
+	struct sk_buff_head incoming_handshakes;
+	int incoming_handshake_cpu;
+	struct multicore_worker __percpu *incoming_handshakes_worker;
+	struct cookie_checker cookie_checker;
+	struct pubkey_hashtable peer_hashtable;
+	struct index_hashtable index_hashtable;
+	struct allowedips peer_allowedips;
+	struct mutex device_update_lock, socket_update_lock;
+	struct list_head device_list, peer_list;
+	unsigned int num_peers, device_update_gen;
+	u32 fwmark;
+	u16 incoming_port;
+	bool have_creating_net_ref;
+};
+
+int device_init(void);
+void device_uninit(void);
+
+#endif /* _WG_DEVICE_H */
diff --git a/drivers/net/wireguard/hashtables.c b/drivers/net/wireguard/hashtables.c
new file mode 100644
index 000000000000..4ba228845f2d
--- /dev/null
+++ b/drivers/net/wireguard/hashtables.c
@@ -0,0 +1,209 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "hashtables.h"
+#include "peer.h"
+#include "noise.h"
+
+static inline struct hlist_head *pubkey_bucket(struct pubkey_hashtable *table,
+					const u8 pubkey[NOISE_PUBLIC_KEY_LEN])
+{
+	/* siphash gives us a secure 64bit number based on a random key. Since
+	 * the bits are uniformly distributed, we can then mask off to get the
+	 * bits we need.
+	 */
+	return &table->hashtable[
+		siphash(pubkey, NOISE_PUBLIC_KEY_LEN, &table->key) &
+			(HASH_SIZE(table->hashtable) - 1)];
+}
+
+void pubkey_hashtable_init(struct pubkey_hashtable *table)
+{
+	get_random_bytes(&table->key, sizeof(table->key));
+	hash_init(table->hashtable);
+	mutex_init(&table->lock);
+}
+
+void pubkey_hashtable_add(struct pubkey_hashtable *table,
+			  struct wireguard_peer *peer)
+{
+	mutex_lock(&table->lock);
+	hlist_add_head_rcu(&peer->pubkey_hash,
+			   pubkey_bucket(table, peer->handshake.remote_static));
+	mutex_unlock(&table->lock);
+}
+
+void pubkey_hashtable_remove(struct pubkey_hashtable *table,
+			     struct wireguard_peer *peer)
+{
+	mutex_lock(&table->lock);
+	hlist_del_init_rcu(&peer->pubkey_hash);
+	mutex_unlock(&table->lock);
+}
+
+/* Returns a strong reference to a peer */
+struct wireguard_peer *
+pubkey_hashtable_lookup(struct pubkey_hashtable *table,
+			const u8 pubkey[NOISE_PUBLIC_KEY_LEN])
+{
+	struct wireguard_peer *iter_peer, *peer = NULL;
+
+	rcu_read_lock_bh();
+	hlist_for_each_entry_rcu_bh (iter_peer, pubkey_bucket(table, pubkey),
+				     pubkey_hash) {
+		if (!memcmp(pubkey, iter_peer->handshake.remote_static,
+			    NOISE_PUBLIC_KEY_LEN)) {
+			peer = iter_peer;
+			break;
+		}
+	}
+	peer = peer_get_maybe_zero(peer);
+	rcu_read_unlock_bh();
+	return peer;
+}
+
+static inline struct hlist_head *index_bucket(struct index_hashtable *table,
+					      const __le32 index)
+{
+	/* Since the indices are random and thus all bits are uniformly
+	 * distributed, we can find its bucket simply by masking.
+	 */
+	return &table->hashtable[(__force u32)index &
+				 (HASH_SIZE(table->hashtable) - 1)];
+}
+
+void index_hashtable_init(struct index_hashtable *table)
+{
+	hash_init(table->hashtable);
+	spin_lock_init(&table->lock);
+}
+
+/* At the moment, we limit ourselves to 2^20 total peers, which generally might
+ * amount to 2^20*3 items in this hashtable. The algorithm below works by
+ * picking a random number and testing it. We can see that these limits mean we
+ * usually succeed pretty quickly:
+ *
+ * >>> def calculation(tries, size):
+ * ...     return (size / 2**32)**(tries - 1) *  (1 - (size / 2**32))
+ * ...
+ * >>> calculation(1, 2**20 * 3)
+ * 0.999267578125
+ * >>> calculation(2, 2**20 * 3)
+ * 0.0007318854331970215
+ * >>> calculation(3, 2**20 * 3)
+ * 5.360489012673497e-07
+ * >>> calculation(4, 2**20 * 3)
+ * 3.9261394135792216e-10
+ *
+ * At the moment, we don't do any masking, so this algorithm isn't exactly
+ * constant time in either the random guessing or in the hash list lookup. We
+ * could require a minimum of 3 tries, which would successfully mask the
+ * guessing. this would not, however, help with the growing hash lengths, which
+ * is another thing to consider moving forward.
+ */
+
+__le32 index_hashtable_insert(struct index_hashtable *table,
+			      struct index_hashtable_entry *entry)
+{
+	struct index_hashtable_entry *existing_entry;
+
+	spin_lock_bh(&table->lock);
+	hlist_del_init_rcu(&entry->index_hash);
+	spin_unlock_bh(&table->lock);
+
+	rcu_read_lock_bh();
+
+search_unused_slot:
+	/* First we try to find an unused slot, randomly, while unlocked. */
+	entry->index = (__force __le32)get_random_u32();
+	hlist_for_each_entry_rcu_bh (existing_entry,
+				     index_bucket(table, entry->index),
+				     index_hash) {
+		if (existing_entry->index == entry->index)
+			/* If it's already in use, we continue searching. */
+			goto search_unused_slot;
+	}
+
+	/* Once we've found an unused slot, we lock it, and then double-check
+	 * that nobody else stole it from us.
+	 */
+	spin_lock_bh(&table->lock);
+	hlist_for_each_entry_rcu_bh (existing_entry,
+				     index_bucket(table, entry->index),
+				     index_hash) {
+		if (existing_entry->index == entry->index) {
+			spin_unlock_bh(&table->lock);
+			/* If it was stolen, we start over. */
+			goto search_unused_slot;
+		}
+	}
+	/* Otherwise, we know we have it exclusively (since we're locked),
+	 * so we insert.
+	 */
+	hlist_add_head_rcu(&entry->index_hash,
+			   index_bucket(table, entry->index));
+	spin_unlock_bh(&table->lock);
+
+	rcu_read_unlock_bh();
+
+	return entry->index;
+}
+
+bool index_hashtable_replace(struct index_hashtable *table,
+			     struct index_hashtable_entry *old,
+			     struct index_hashtable_entry *new)
+{
+	if (unlikely(hlist_unhashed(&old->index_hash)))
+		return false;
+	spin_lock_bh(&table->lock);
+	new->index = old->index;
+	hlist_replace_rcu(&old->index_hash, &new->index_hash);
+
+	/* Calling init here NULLs out index_hash, and in fact after this
+	 * function returns, it's theoretically possible for this to get
+	 * reinserted elsewhere. That means the RCU lookup below might either
+	 * terminate early or jump between buckets, in which case the packet
+	 * simply gets dropped, which isn't terrible.
+	 */
+	INIT_HLIST_NODE(&old->index_hash);
+	spin_unlock_bh(&table->lock);
+	return true;
+}
+
+void index_hashtable_remove(struct index_hashtable *table,
+			    struct index_hashtable_entry *entry)
+{
+	spin_lock_bh(&table->lock);
+	hlist_del_init_rcu(&entry->index_hash);
+	spin_unlock_bh(&table->lock);
+}
+
+/* Returns a strong reference to a entry->peer */
+struct index_hashtable_entry *
+index_hashtable_lookup(struct index_hashtable *table,
+		       const enum index_hashtable_type type_mask,
+		       const __le32 index, struct wireguard_peer **peer)
+{
+	struct index_hashtable_entry *iter_entry, *entry = NULL;
+
+	rcu_read_lock_bh();
+	hlist_for_each_entry_rcu_bh (iter_entry, index_bucket(table, index),
+				     index_hash) {
+		if (iter_entry->index == index) {
+			if (likely(iter_entry->type & type_mask))
+				entry = iter_entry;
+			break;
+		}
+	}
+	if (likely(entry)) {
+		entry->peer = peer_get_maybe_zero(entry->peer);
+		if (likely(entry->peer))
+			*peer = entry->peer;
+		else
+			entry = NULL;
+	}
+	rcu_read_unlock_bh();
+	return entry;
+}
diff --git a/drivers/net/wireguard/hashtables.h b/drivers/net/wireguard/hashtables.h
new file mode 100644
index 000000000000..62858c554283
--- /dev/null
+++ b/drivers/net/wireguard/hashtables.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_HASHTABLES_H
+#define _WG_HASHTABLES_H
+
+#include "messages.h"
+
+#include <linux/hashtable.h>
+#include <linux/mutex.h>
+#include <linux/siphash.h>
+
+struct wireguard_peer;
+
+struct pubkey_hashtable {
+	/* TODO: move to rhashtable */
+	DECLARE_HASHTABLE(hashtable, 11);
+	siphash_key_t key;
+	struct mutex lock;
+};
+
+void pubkey_hashtable_init(struct pubkey_hashtable *table);
+void pubkey_hashtable_add(struct pubkey_hashtable *table,
+			  struct wireguard_peer *peer);
+void pubkey_hashtable_remove(struct pubkey_hashtable *table,
+			     struct wireguard_peer *peer);
+struct wireguard_peer *
+pubkey_hashtable_lookup(struct pubkey_hashtable *table,
+			const u8 pubkey[NOISE_PUBLIC_KEY_LEN]);
+
+struct index_hashtable {
+	/* TODO: move to rhashtable */
+	DECLARE_HASHTABLE(hashtable, 13);
+	spinlock_t lock;
+};
+
+enum index_hashtable_type {
+	INDEX_HASHTABLE_HANDSHAKE = 1U << 0,
+	INDEX_HASHTABLE_KEYPAIR = 1U << 1
+};
+
+struct index_hashtable_entry {
+	struct wireguard_peer *peer;
+	struct hlist_node index_hash;
+	enum index_hashtable_type type;
+	__le32 index;
+};
+void index_hashtable_init(struct index_hashtable *table);
+__le32 index_hashtable_insert(struct index_hashtable *table,
+			      struct index_hashtable_entry *entry);
+bool index_hashtable_replace(struct index_hashtable *table,
+			     struct index_hashtable_entry *old,
+			     struct index_hashtable_entry *new);
+void index_hashtable_remove(struct index_hashtable *table,
+			    struct index_hashtable_entry *entry);
+struct index_hashtable_entry *
+index_hashtable_lookup(struct index_hashtable *table,
+		       const enum index_hashtable_type type_mask,
+		       const __le32 index, struct wireguard_peer **peer);
+
+#endif /* _WG_HASHTABLES_H */
diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c
new file mode 100644
index 000000000000..45f999041660
--- /dev/null
+++ b/drivers/net/wireguard/main.c
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "version.h"
+#include "device.h"
+#include "noise.h"
+#include "queueing.h"
+#include "ratelimiter.h"
+#include "netlink.h"
+
+#include <uapi/linux/wireguard.h>
+
+#include <linux/version.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/genetlink.h>
+#include <net/rtnetlink.h>
+
+static int __init mod_init(void)
+{
+	int ret;
+
+#ifdef DEBUG
+	if (!allowedips_selftest() || !packet_counter_selftest() ||
+	    !ratelimiter_selftest())
+		return -ENOTRECOVERABLE;
+#endif
+	noise_init();
+
+	ret = device_init();
+	if (ret < 0)
+		goto err_device;
+
+	ret = genetlink_init();
+	if (ret < 0)
+		goto err_netlink;
+
+	pr_info("WireGuard " WIREGUARD_VERSION " loaded. See www.wireguard.com for information.\n");
+	pr_info("Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.\n");
+
+	return 0;
+
+err_netlink:
+	device_uninit();
+err_device:
+	return ret;
+}
+
+static void __exit mod_exit(void)
+{
+	genetlink_uninit();
+	device_uninit();
+	pr_debug("WireGuard unloaded\n");
+}
+
+module_init(mod_init);
+module_exit(mod_exit);
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("Fast, modern, and secure VPN tunnel");
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_VERSION(WIREGUARD_VERSION);
+MODULE_ALIAS_RTNL_LINK(KBUILD_MODNAME);
+MODULE_ALIAS_GENL_FAMILY(WG_GENL_NAME);
diff --git a/drivers/net/wireguard/messages.h b/drivers/net/wireguard/messages.h
new file mode 100644
index 000000000000..131e1c44049d
--- /dev/null
+++ b/drivers/net/wireguard/messages.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_MESSAGES_H
+#define _WG_MESSAGES_H
+
+#include <zinc/curve25519.h>
+#include <zinc/chacha20poly1305.h>
+#include <zinc/blake2s.h>
+
+#include <linux/kernel.h>
+#include <linux/param.h>
+#include <linux/skbuff.h>
+
+enum noise_lengths {
+	NOISE_PUBLIC_KEY_LEN = CURVE25519_POINT_SIZE,
+	NOISE_SYMMETRIC_KEY_LEN = CHACHA20POLY1305_KEYLEN,
+	NOISE_TIMESTAMP_LEN = sizeof(u64) + sizeof(u32),
+	NOISE_AUTHTAG_LEN = CHACHA20POLY1305_AUTHTAGLEN,
+	NOISE_HASH_LEN = BLAKE2S_OUTBYTES
+};
+
+#define noise_encrypted_len(plain_len) (plain_len + NOISE_AUTHTAG_LEN)
+
+enum cookie_values {
+	COOKIE_SECRET_MAX_AGE = 2 * 60,
+	COOKIE_SECRET_LATENCY = 5,
+	COOKIE_NONCE_LEN = XCHACHA20POLY1305_NONCELEN,
+	COOKIE_LEN = 16
+};
+
+enum counter_values {
+	COUNTER_BITS_TOTAL = 2048,
+	COUNTER_REDUNDANT_BITS = BITS_PER_LONG,
+	COUNTER_WINDOW_SIZE = COUNTER_BITS_TOTAL - COUNTER_REDUNDANT_BITS
+};
+
+enum limits {
+	REKEY_AFTER_MESSAGES = U64_MAX - 0xffff,
+	REJECT_AFTER_MESSAGES = U64_MAX - COUNTER_WINDOW_SIZE - 1,
+	REKEY_TIMEOUT = 5,
+	REKEY_TIMEOUT_JITTER_MAX_JIFFIES = HZ / 3,
+	REKEY_AFTER_TIME = 120,
+	REJECT_AFTER_TIME = 180,
+	INITIATIONS_PER_SECOND = 50,
+	MAX_PEERS_PER_DEVICE = 1U << 20,
+	KEEPALIVE_TIMEOUT = 10,
+	MAX_TIMER_HANDSHAKES = 90 / REKEY_TIMEOUT,
+	MAX_QUEUED_INCOMING_HANDSHAKES = 4096, /* TODO: replace this with DQL */
+	MAX_STAGED_PACKETS = 128,
+	MAX_QUEUED_PACKETS = 1024 /* TODO: replace this with DQL */
+};
+
+enum message_type {
+	MESSAGE_INVALID = 0,
+	MESSAGE_HANDSHAKE_INITIATION = 1,
+	MESSAGE_HANDSHAKE_RESPONSE = 2,
+	MESSAGE_HANDSHAKE_COOKIE = 3,
+	MESSAGE_DATA = 4
+};
+
+struct message_header {
+	/* The actual layout of this that we want is:
+	 * u8 type
+	 * u8 reserved_zero[3]
+	 *
+	 * But it turns out that by encoding this as little endian,
+	 * we achieve the same thing, and it makes checking faster.
+	 */
+	__le32 type;
+};
+
+struct message_macs {
+	u8 mac1[COOKIE_LEN];
+	u8 mac2[COOKIE_LEN];
+};
+
+struct message_handshake_initiation {
+	struct message_header header;
+	__le32 sender_index;
+	u8 unencrypted_ephemeral[NOISE_PUBLIC_KEY_LEN];
+	u8 encrypted_static[noise_encrypted_len(NOISE_PUBLIC_KEY_LEN)];
+	u8 encrypted_timestamp[noise_encrypted_len(NOISE_TIMESTAMP_LEN)];
+	struct message_macs macs;
+};
+
+struct message_handshake_response {
+	struct message_header header;
+	__le32 sender_index;
+	__le32 receiver_index;
+	u8 unencrypted_ephemeral[NOISE_PUBLIC_KEY_LEN];
+	u8 encrypted_nothing[noise_encrypted_len(0)];
+	struct message_macs macs;
+};
+
+struct message_handshake_cookie {
+	struct message_header header;
+	__le32 receiver_index;
+	u8 nonce[COOKIE_NONCE_LEN];
+	u8 encrypted_cookie[noise_encrypted_len(COOKIE_LEN)];
+};
+
+struct message_data {
+	struct message_header header;
+	__le32 key_idx;
+	__le64 counter;
+	u8 encrypted_data[];
+};
+
+#define message_data_len(plain_len)                                            \
+	(noise_encrypted_len(plain_len) + sizeof(struct message_data))
+
+enum message_alignments {
+	MESSAGE_PADDING_MULTIPLE = 16,
+	MESSAGE_MINIMUM_LENGTH = message_data_len(0)
+};
+
+#define SKB_HEADER_LEN                                                         \
+	(max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) +                   \
+	 sizeof(struct udphdr) + NET_SKB_PAD)
+#define DATA_PACKET_HEAD_ROOM                                                  \
+	ALIGN(sizeof(struct message_data) + SKB_HEADER_LEN, 4)
+
+enum { HANDSHAKE_DSCP = 0x88 /* AF41, plus 00 ECN */ };
+
+#endif /* _WG_MESSAGES_H */
diff --git a/drivers/net/wireguard/netlink.c b/drivers/net/wireguard/netlink.c
new file mode 100644
index 000000000000..c17049958426
--- /dev/null
+++ b/drivers/net/wireguard/netlink.c
@@ -0,0 +1,605 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "netlink.h"
+#include "device.h"
+#include "peer.h"
+#include "socket.h"
+#include "queueing.h"
+#include "messages.h"
+
+#include <uapi/linux/wireguard.h>
+
+#include <linux/if.h>
+#include <net/genetlink.h>
+#include <net/sock.h>
+
+static struct genl_family genl_family;
+
+static const struct nla_policy device_policy[WGDEVICE_A_MAX + 1] = {
+	[WGDEVICE_A_IFINDEX]		= { .type = NLA_U32 },
+	[WGDEVICE_A_IFNAME]		= { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 },
+	[WGDEVICE_A_PRIVATE_KEY]	= { .len = NOISE_PUBLIC_KEY_LEN },
+	[WGDEVICE_A_PUBLIC_KEY]		= { .len = NOISE_PUBLIC_KEY_LEN },
+	[WGDEVICE_A_FLAGS]		= { .type = NLA_U32 },
+	[WGDEVICE_A_LISTEN_PORT]	= { .type = NLA_U16 },
+	[WGDEVICE_A_FWMARK]		= { .type = NLA_U32 },
+	[WGDEVICE_A_PEERS]		= { .type = NLA_NESTED }
+};
+
+static const struct nla_policy peer_policy[WGPEER_A_MAX + 1] = {
+	[WGPEER_A_PUBLIC_KEY]				= { .len = NOISE_PUBLIC_KEY_LEN },
+	[WGPEER_A_PRESHARED_KEY]			= { .len = NOISE_SYMMETRIC_KEY_LEN },
+	[WGPEER_A_FLAGS]				= { .type = NLA_U32 },
+	[WGPEER_A_ENDPOINT]				= { .len = sizeof(struct sockaddr) },
+	[WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL]	= { .type = NLA_U16 },
+	[WGPEER_A_LAST_HANDSHAKE_TIME]			= { .len = sizeof(struct timespec) },
+	[WGPEER_A_RX_BYTES]				= { .type = NLA_U64 },
+	[WGPEER_A_TX_BYTES]				= { .type = NLA_U64 },
+	[WGPEER_A_ALLOWEDIPS]				= { .type = NLA_NESTED },
+	[WGPEER_A_PROTOCOL_VERSION]			= { .type = NLA_U32 }
+};
+
+static const struct nla_policy allowedip_policy[WGALLOWEDIP_A_MAX + 1] = {
+	[WGALLOWEDIP_A_FAMILY]		= { .type = NLA_U16 },
+	[WGALLOWEDIP_A_IPADDR]		= { .len = sizeof(struct in_addr) },
+	[WGALLOWEDIP_A_CIDR_MASK]	= { .type = NLA_U8 }
+};
+
+static struct wireguard_device *lookup_interface(struct nlattr **attrs,
+						 struct sk_buff *skb)
+{
+	struct net_device *dev = NULL;
+
+	if (!attrs[WGDEVICE_A_IFINDEX] == !attrs[WGDEVICE_A_IFNAME])
+		return ERR_PTR(-EBADR);
+	if (attrs[WGDEVICE_A_IFINDEX])
+		dev = dev_get_by_index(sock_net(skb->sk),
+				       nla_get_u32(attrs[WGDEVICE_A_IFINDEX]));
+	else if (attrs[WGDEVICE_A_IFNAME])
+		dev = dev_get_by_name(sock_net(skb->sk),
+				      nla_data(attrs[WGDEVICE_A_IFNAME]));
+	if (!dev)
+		return ERR_PTR(-ENODEV);
+	if (!dev->rtnl_link_ops || !dev->rtnl_link_ops->kind ||
+	    strcmp(dev->rtnl_link_ops->kind, KBUILD_MODNAME)) {
+		dev_put(dev);
+		return ERR_PTR(-EOPNOTSUPP);
+	}
+	return netdev_priv(dev);
+}
+
+struct allowedips_ctx {
+	struct sk_buff *skb;
+	unsigned int i;
+};
+
+static int get_allowedips(void *ctx, const u8 *ip, u8 cidr, int family)
+{
+	struct allowedips_ctx *actx = ctx;
+	struct nlattr *allowedip_nest;
+
+	allowedip_nest = nla_nest_start(actx->skb, actx->i++);
+	if (!allowedip_nest)
+		return -EMSGSIZE;
+
+	if (nla_put_u8(actx->skb, WGALLOWEDIP_A_CIDR_MASK, cidr) ||
+	    nla_put_u16(actx->skb, WGALLOWEDIP_A_FAMILY, family) ||
+	    nla_put(actx->skb, WGALLOWEDIP_A_IPADDR, family == AF_INET6 ?
+		    sizeof(struct in6_addr) : sizeof(struct in_addr), ip)) {
+		nla_nest_cancel(actx->skb, allowedip_nest);
+		return -EMSGSIZE;
+	}
+
+	nla_nest_end(actx->skb, allowedip_nest);
+	return 0;
+}
+
+static int get_peer(struct wireguard_peer *peer, unsigned int index,
+		    struct allowedips_cursor *rt_cursor, struct sk_buff *skb)
+{
+	struct nlattr *allowedips_nest, *peer_nest = nla_nest_start(skb, index);
+	struct allowedips_ctx ctx = { .skb = skb };
+	bool fail;
+
+	if (!peer_nest)
+		return -EMSGSIZE;
+
+	down_read(&peer->handshake.lock);
+	fail = nla_put(skb, WGPEER_A_PUBLIC_KEY, NOISE_PUBLIC_KEY_LEN,
+		       peer->handshake.remote_static);
+	up_read(&peer->handshake.lock);
+	if (fail)
+		goto err;
+
+	if (!rt_cursor->seq) {
+		down_read(&peer->handshake.lock);
+		fail = nla_put(skb, WGPEER_A_PRESHARED_KEY,
+			       NOISE_SYMMETRIC_KEY_LEN,
+			       peer->handshake.preshared_key);
+		up_read(&peer->handshake.lock);
+		if (fail)
+			goto err;
+
+		if (nla_put(skb, WGPEER_A_LAST_HANDSHAKE_TIME,
+			    sizeof(peer->walltime_last_handshake),
+			    &peer->walltime_last_handshake) ||
+		    nla_put_u16(skb, WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL,
+				peer->persistent_keepalive_interval) ||
+		    nla_put_u64_64bit(skb, WGPEER_A_TX_BYTES, peer->tx_bytes,
+				      WGPEER_A_UNSPEC) ||
+		    nla_put_u64_64bit(skb, WGPEER_A_RX_BYTES, peer->rx_bytes,
+				      WGPEER_A_UNSPEC) ||
+		    nla_put_u32(skb, WGPEER_A_PROTOCOL_VERSION, 1))
+			goto err;
+
+		read_lock_bh(&peer->endpoint_lock);
+		if (peer->endpoint.addr.sa_family == AF_INET)
+			fail = nla_put(skb, WGPEER_A_ENDPOINT,
+				       sizeof(peer->endpoint.addr4),
+				       &peer->endpoint.addr4);
+		else if (peer->endpoint.addr.sa_family == AF_INET6)
+			fail = nla_put(skb, WGPEER_A_ENDPOINT,
+				       sizeof(peer->endpoint.addr6),
+				       &peer->endpoint.addr6);
+		read_unlock_bh(&peer->endpoint_lock);
+		if (fail)
+			goto err;
+	}
+
+	allowedips_nest = nla_nest_start(skb, WGPEER_A_ALLOWEDIPS);
+	if (!allowedips_nest)
+		goto err;
+	if (allowedips_walk_by_peer(&peer->device->peer_allowedips, rt_cursor,
+				    peer, get_allowedips, &ctx,
+				    &peer->device->device_update_lock)) {
+		nla_nest_end(skb, allowedips_nest);
+		nla_nest_end(skb, peer_nest);
+		return -EMSGSIZE;
+	}
+	memset(rt_cursor, 0, sizeof(*rt_cursor));
+	nla_nest_end(skb, allowedips_nest);
+	nla_nest_end(skb, peer_nest);
+	return 0;
+err:
+	nla_nest_cancel(skb, peer_nest);
+	return -EMSGSIZE;
+}
+
+static int get_device_start(struct netlink_callback *cb)
+{
+	struct nlattr **attrs = genl_family_attrbuf(&genl_family);
+	int ret = nlmsg_parse(cb->nlh, GENL_HDRLEN + genl_family.hdrsize, attrs,
+			      genl_family.maxattr, device_policy, NULL);
+	struct wireguard_device *wg;
+
+	if (ret < 0)
+		return ret;
+	cb->args[2] = (long)kzalloc(sizeof(struct allowedips_cursor),
+				    GFP_KERNEL);
+	if (unlikely(!cb->args[2]))
+		return -ENOMEM;
+	wg = lookup_interface(attrs, cb->skb);
+	if (IS_ERR(wg)) {
+		kfree((void *)cb->args[2]);
+		cb->args[2] = 0;
+		return PTR_ERR(wg);
+	}
+	cb->args[0] = (long)wg;
+	return 0;
+}
+
+static int get_device_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	struct wireguard_peer *peer, *next_peer_cursor, *last_peer_cursor;
+	struct allowedips_cursor *rt_cursor;
+	struct wireguard_device *wg;
+	unsigned int peer_idx = 0;
+	struct nlattr *peers_nest;
+	bool done = true;
+	void *hdr;
+	int ret = -EMSGSIZE;
+
+	wg = (struct wireguard_device *)cb->args[0];
+	next_peer_cursor = (struct wireguard_peer *)cb->args[1];
+	last_peer_cursor = (struct wireguard_peer *)cb->args[1];
+	rt_cursor = (struct allowedips_cursor *)cb->args[2];
+
+	rtnl_lock();
+	mutex_lock(&wg->device_update_lock);
+	cb->seq = wg->device_update_gen;
+
+	hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
+			  &genl_family, NLM_F_MULTI, WG_CMD_GET_DEVICE);
+	if (!hdr)
+		goto out;
+	genl_dump_check_consistent(cb, hdr);
+
+	if (!last_peer_cursor) {
+		if (nla_put_u16(skb, WGDEVICE_A_LISTEN_PORT,
+				wg->incoming_port) ||
+		    nla_put_u32(skb, WGDEVICE_A_FWMARK, wg->fwmark) ||
+		    nla_put_u32(skb, WGDEVICE_A_IFINDEX, wg->dev->ifindex) ||
+		    nla_put_string(skb, WGDEVICE_A_IFNAME, wg->dev->name))
+			goto out;
+
+		down_read(&wg->static_identity.lock);
+		if (wg->static_identity.has_identity) {
+			if (nla_put(skb, WGDEVICE_A_PRIVATE_KEY,
+				    NOISE_PUBLIC_KEY_LEN,
+				    wg->static_identity.static_private) ||
+			    nla_put(skb, WGDEVICE_A_PUBLIC_KEY,
+				    NOISE_PUBLIC_KEY_LEN,
+				    wg->static_identity.static_public)) {
+				up_read(&wg->static_identity.lock);
+				goto out;
+			}
+		}
+		up_read(&wg->static_identity.lock);
+	}
+
+	peers_nest = nla_nest_start(skb, WGDEVICE_A_PEERS);
+	if (!peers_nest)
+		goto out;
+	ret = 0;
+	/* If the last cursor was removed via list_del_init in peer_remove, then
+	 * we just treat this the same as there being no more peers left. The
+	 * reason is that seq_nr should indicate to userspace that this isn't a
+	 * coherent dump anyway, so they'll try again.
+	 */
+	if (list_empty(&wg->peer_list) ||
+	    (last_peer_cursor && list_empty(&last_peer_cursor->peer_list))) {
+		nla_nest_cancel(skb, peers_nest);
+		goto out;
+	}
+	lockdep_assert_held(&wg->device_update_lock);
+	peer = list_prepare_entry(last_peer_cursor, &wg->peer_list, peer_list);
+	list_for_each_entry_continue (peer, &wg->peer_list, peer_list) {
+		if (get_peer(peer, peer_idx++, rt_cursor, skb)) {
+			done = false;
+			break;
+		}
+		next_peer_cursor = peer;
+	}
+	nla_nest_end(skb, peers_nest);
+
+out:
+	if (!ret && !done && next_peer_cursor)
+		peer_get(next_peer_cursor);
+	peer_put(last_peer_cursor);
+	mutex_unlock(&wg->device_update_lock);
+	rtnl_unlock();
+
+	if (ret) {
+		genlmsg_cancel(skb, hdr);
+		return ret;
+	}
+	genlmsg_end(skb, hdr);
+	if (done) {
+		cb->args[1] = 0;
+		return 0;
+	}
+	cb->args[1] = (long)next_peer_cursor;
+	return skb->len;
+
+	/* At this point, we can't really deal ourselves with safely zeroing out
+	 * the private key material after usage. This will need an additional API
+	 * in the kernel for marking skbs as zero_on_free.
+	 */
+}
+
+static int get_device_done(struct netlink_callback *cb)
+{
+	struct wireguard_device *wg = (struct wireguard_device *)cb->args[0];
+	struct wireguard_peer *peer = (struct wireguard_peer *)cb->args[1];
+	struct allowedips_cursor *rt_cursor =
+		(struct allowedips_cursor *)cb->args[2];
+
+	if (wg)
+		dev_put(wg->dev);
+	kfree(rt_cursor);
+	peer_put(peer);
+	return 0;
+}
+
+static int set_port(struct wireguard_device *wg, u16 port)
+{
+	struct wireguard_peer *peer;
+
+	if (wg->incoming_port == port)
+		return 0;
+	list_for_each_entry (peer, &wg->peer_list, peer_list)
+		socket_clear_peer_endpoint_src(peer);
+	if (!netif_running(wg->dev)) {
+		wg->incoming_port = port;
+		return 0;
+	}
+	return socket_init(wg, port);
+}
+
+static int set_allowedip(struct wireguard_peer *peer, struct nlattr **attrs)
+{
+	int ret = -EINVAL;
+	u16 family;
+	u8 cidr;
+
+	if (!attrs[WGALLOWEDIP_A_FAMILY] || !attrs[WGALLOWEDIP_A_IPADDR] ||
+	    !attrs[WGALLOWEDIP_A_CIDR_MASK])
+		return ret;
+	family = nla_get_u16(attrs[WGALLOWEDIP_A_FAMILY]);
+	cidr = nla_get_u8(attrs[WGALLOWEDIP_A_CIDR_MASK]);
+
+	if (family == AF_INET && cidr <= 32 &&
+	    nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr))
+		ret = allowedips_insert_v4(
+			&peer->device->peer_allowedips,
+			nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer,
+			&peer->device->device_update_lock);
+	else if (family == AF_INET6 && cidr <= 128 &&
+		 nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in6_addr))
+		ret = allowedips_insert_v6(
+			&peer->device->peer_allowedips,
+			nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer,
+			&peer->device->device_update_lock);
+
+	return ret;
+}
+
+static int set_peer(struct wireguard_device *wg, struct nlattr **attrs)
+{
+	int ret;
+	u32 flags = 0;
+	struct wireguard_peer *peer = NULL;
+	u8 *public_key = NULL, *preshared_key = NULL;
+
+	ret = -EINVAL;
+	if (attrs[WGPEER_A_PUBLIC_KEY] &&
+	    nla_len(attrs[WGPEER_A_PUBLIC_KEY]) == NOISE_PUBLIC_KEY_LEN)
+		public_key = nla_data(attrs[WGPEER_A_PUBLIC_KEY]);
+	else
+		goto out;
+	if (attrs[WGPEER_A_PRESHARED_KEY] &&
+	    nla_len(attrs[WGPEER_A_PRESHARED_KEY]) == NOISE_SYMMETRIC_KEY_LEN)
+		preshared_key = nla_data(attrs[WGPEER_A_PRESHARED_KEY]);
+	if (attrs[WGPEER_A_FLAGS])
+		flags = nla_get_u32(attrs[WGPEER_A_FLAGS]);
+
+	ret = -EPFNOSUPPORT;
+	if (attrs[WGPEER_A_PROTOCOL_VERSION]) {
+		if (nla_get_u32(attrs[WGPEER_A_PROTOCOL_VERSION]) != 1)
+			goto out;
+	}
+
+	peer = pubkey_hashtable_lookup(&wg->peer_hashtable,
+				       nla_data(attrs[WGPEER_A_PUBLIC_KEY]));
+	if (!peer) { /* Peer doesn't exist yet. Add a new one. */
+		ret = -ENODEV;
+		if (flags & WGPEER_F_REMOVE_ME)
+			goto out; /* Tried to remove a non-existing peer. */
+
+		down_read(&wg->static_identity.lock);
+		if (wg->static_identity.has_identity &&
+		    !memcmp(nla_data(attrs[WGPEER_A_PUBLIC_KEY]),
+			    wg->static_identity.static_public,
+			    NOISE_PUBLIC_KEY_LEN)) {
+			/* We silently ignore peers that have the same public
+			 * key as the device. The reason we do it silently is
+			 * that we'd like for people to be able to reuse the
+			 * same set of API calls across peers.
+			 */
+			up_read(&wg->static_identity.lock);
+			ret = 0;
+			goto out;
+		}
+		up_read(&wg->static_identity.lock);
+
+		ret = -ENOMEM;
+		peer = peer_create(wg, public_key, preshared_key);
+		if (!peer)
+			goto out;
+		/* Take additional reference, as though we've just been
+		 * looked up.
+		 */
+		peer_get(peer);
+	}
+
+	ret = 0;
+	if (flags & WGPEER_F_REMOVE_ME) {
+		peer_remove(peer);
+		goto out;
+	}
+
+	if (preshared_key) {
+		down_write(&peer->handshake.lock);
+		memcpy(&peer->handshake.preshared_key, preshared_key,
+		       NOISE_SYMMETRIC_KEY_LEN);
+		up_write(&peer->handshake.lock);
+	}
+
+	if (attrs[WGPEER_A_ENDPOINT]) {
+		struct sockaddr *addr = nla_data(attrs[WGPEER_A_ENDPOINT]);
+		size_t len = nla_len(attrs[WGPEER_A_ENDPOINT]);
+
+		if ((len == sizeof(struct sockaddr_in) &&
+		     addr->sa_family == AF_INET) ||
+		    (len == sizeof(struct sockaddr_in6) &&
+		     addr->sa_family == AF_INET6)) {
+			struct endpoint endpoint = { { { 0 } } };
+
+			memcpy(&endpoint.addr, addr, len);
+			socket_set_peer_endpoint(peer, &endpoint);
+		}
+	}
+
+	if (flags & WGPEER_F_REPLACE_ALLOWEDIPS)
+		allowedips_remove_by_peer(&wg->peer_allowedips, peer,
+					  &wg->device_update_lock);
+
+	if (attrs[WGPEER_A_ALLOWEDIPS]) {
+		struct nlattr *attr, *allowedip[WGALLOWEDIP_A_MAX + 1];
+		int rem;
+
+		nla_for_each_nested (attr, attrs[WGPEER_A_ALLOWEDIPS], rem) {
+			ret = nla_parse_nested(allowedip, WGALLOWEDIP_A_MAX,
+					       attr, allowedip_policy, NULL);
+			if (ret < 0)
+				goto out;
+			ret = set_allowedip(peer, allowedip);
+			if (ret < 0)
+				goto out;
+		}
+	}
+
+	if (attrs[WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL]) {
+		const u16 persistent_keepalive_interval = nla_get_u16(
+				attrs[WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL]);
+		const bool send_keepalive =
+			!peer->persistent_keepalive_interval &&
+			persistent_keepalive_interval &&
+			netif_running(wg->dev);
+
+		peer->persistent_keepalive_interval = persistent_keepalive_interval;
+		if (send_keepalive)
+			packet_send_keepalive(peer);
+	}
+
+	if (netif_running(wg->dev))
+		packet_send_staged_packets(peer);
+
+out:
+	peer_put(peer);
+	if (attrs[WGPEER_A_PRESHARED_KEY])
+		memzero_explicit(nla_data(attrs[WGPEER_A_PRESHARED_KEY]),
+				 nla_len(attrs[WGPEER_A_PRESHARED_KEY]));
+	return ret;
+}
+
+static int set_device(struct sk_buff *skb, struct genl_info *info)
+{
+	int ret;
+	struct wireguard_device *wg = lookup_interface(info->attrs, skb);
+
+	if (IS_ERR(wg)) {
+		ret = PTR_ERR(wg);
+		goto out_nodev;
+	}
+
+	rtnl_lock();
+	mutex_lock(&wg->device_update_lock);
+	++wg->device_update_gen;
+
+	if (info->attrs[WGDEVICE_A_FWMARK]) {
+		struct wireguard_peer *peer;
+
+		wg->fwmark = nla_get_u32(info->attrs[WGDEVICE_A_FWMARK]);
+		list_for_each_entry (peer, &wg->peer_list, peer_list)
+			socket_clear_peer_endpoint_src(peer);
+	}
+
+	if (info->attrs[WGDEVICE_A_LISTEN_PORT]) {
+		ret = set_port(
+			wg, nla_get_u16(info->attrs[WGDEVICE_A_LISTEN_PORT]));
+		if (ret)
+			goto out;
+	}
+
+	if (info->attrs[WGDEVICE_A_FLAGS] &&
+	    nla_get_u32(info->attrs[WGDEVICE_A_FLAGS]) &
+		    WGDEVICE_F_REPLACE_PEERS)
+		peer_remove_all(wg);
+
+	if (info->attrs[WGDEVICE_A_PRIVATE_KEY] &&
+	    nla_len(info->attrs[WGDEVICE_A_PRIVATE_KEY]) ==
+		    NOISE_PUBLIC_KEY_LEN) {
+		u8 *private_key = nla_data(info->attrs[WGDEVICE_A_PRIVATE_KEY]);
+		u8 public_key[NOISE_PUBLIC_KEY_LEN];
+		struct wireguard_peer *peer, *temp;
+
+		/* We remove before setting, to prevent race, which means doing
+		 * two 25519-genpub ops.
+		 */
+		if (curve25519_generate_public(public_key, private_key)) {
+			peer = pubkey_hashtable_lookup(&wg->peer_hashtable,
+						       public_key);
+			if (peer) {
+				peer_put(peer);
+				peer_remove(peer);
+			}
+		}
+
+		down_write(&wg->static_identity.lock);
+		noise_set_static_identity_private_key(&wg->static_identity,
+						      private_key);
+		list_for_each_entry_safe (peer, temp, &wg->peer_list,
+					  peer_list) {
+			if (!noise_precompute_static_static(peer))
+				peer_remove(peer);
+		}
+		cookie_checker_precompute_device_keys(&wg->cookie_checker);
+		up_write(&wg->static_identity.lock);
+	}
+
+	if (info->attrs[WGDEVICE_A_PEERS]) {
+		int rem;
+		struct nlattr *attr, *peer[WGPEER_A_MAX + 1];
+
+		nla_for_each_nested (attr, info->attrs[WGDEVICE_A_PEERS], rem) {
+			ret = nla_parse_nested(peer, WGPEER_A_MAX, attr,
+					       peer_policy, NULL);
+			if (ret < 0)
+				goto out;
+			ret = set_peer(wg, peer);
+			if (ret < 0)
+				goto out;
+		}
+	}
+	ret = 0;
+
+out:
+	mutex_unlock(&wg->device_update_lock);
+	rtnl_unlock();
+	dev_put(wg->dev);
+out_nodev:
+	if (info->attrs[WGDEVICE_A_PRIVATE_KEY])
+		memzero_explicit(nla_data(info->attrs[WGDEVICE_A_PRIVATE_KEY]),
+				 nla_len(info->attrs[WGDEVICE_A_PRIVATE_KEY]));
+	return ret;
+}
+
+static const struct genl_ops genl_ops[] = {
+	{
+		.cmd = WG_CMD_GET_DEVICE,
+		.start = get_device_start,
+		.dumpit = get_device_dump,
+		.done = get_device_done,
+		.policy = device_policy,
+		.flags = GENL_UNS_ADMIN_PERM
+	}, {
+		.cmd = WG_CMD_SET_DEVICE,
+		.doit = set_device,
+		.policy = device_policy,
+		.flags = GENL_UNS_ADMIN_PERM
+	}
+};
+
+static struct genl_family genl_family __ro_after_init = {
+	.ops = genl_ops,
+	.n_ops = ARRAY_SIZE(genl_ops),
+	.name = WG_GENL_NAME,
+	.version = WG_GENL_VERSION,
+	.maxattr = WGDEVICE_A_MAX,
+	.module = THIS_MODULE,
+	.netnsok = true
+};
+
+int __init genetlink_init(void)
+{
+	return genl_register_family(&genl_family);
+}
+
+void __exit genetlink_uninit(void)
+{
+	genl_unregister_family(&genl_family);
+}
diff --git a/drivers/net/wireguard/netlink.h b/drivers/net/wireguard/netlink.h
new file mode 100644
index 000000000000..c1cd9b019bd1
--- /dev/null
+++ b/drivers/net/wireguard/netlink.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_NETLINK_H
+#define _WG_NETLINK_H
+
+int genetlink_init(void);
+void genetlink_uninit(void);
+
+#endif /* _WG_NETLINK_H */
diff --git a/drivers/net/wireguard/noise.c b/drivers/net/wireguard/noise.c
new file mode 100644
index 000000000000..9bd2d7ef869a
--- /dev/null
+++ b/drivers/net/wireguard/noise.c
@@ -0,0 +1,784 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "noise.h"
+#include "device.h"
+#include "peer.h"
+#include "messages.h"
+#include "queueing.h"
+#include "hashtables.h"
+
+#include <linux/rcupdate.h>
+#include <linux/slab.h>
+#include <linux/bitmap.h>
+#include <linux/scatterlist.h>
+#include <linux/highmem.h>
+#include <crypto/algapi.h>
+
+/* This implements Noise_IKpsk2:
+ *
+ * <- s
+ * ******
+ * -> e, es, s, ss, {t}
+ * <- e, ee, se, psk, {}
+ */
+
+static const u8 handshake_name[37] = "Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s";
+static const u8 identifier_name[34] = "WireGuard v1 zx2c4 Jason@zx2c4.com";
+static u8 handshake_init_hash[NOISE_HASH_LEN] __ro_after_init;
+static u8 handshake_init_chaining_key[NOISE_HASH_LEN] __ro_after_init;
+static atomic64_t keypair_counter = ATOMIC64_INIT(0);
+
+void __init noise_init(void)
+{
+	struct blake2s_state blake;
+
+	blake2s(handshake_init_chaining_key, handshake_name, NULL,
+		NOISE_HASH_LEN, sizeof(handshake_name), 0);
+	blake2s_init(&blake, NOISE_HASH_LEN);
+	blake2s_update(&blake, handshake_init_chaining_key, NOISE_HASH_LEN);
+	blake2s_update(&blake, identifier_name, sizeof(identifier_name));
+	blake2s_final(&blake, handshake_init_hash, NOISE_HASH_LEN);
+}
+
+/* Must hold peer->handshake.static_identity->lock */
+bool noise_precompute_static_static(struct wireguard_peer *peer)
+{
+	bool ret = true;
+
+	down_write(&peer->handshake.lock);
+	if (peer->handshake.static_identity->has_identity)
+		ret = curve25519(
+			peer->handshake.precomputed_static_static,
+			peer->handshake.static_identity->static_private,
+			peer->handshake.remote_static);
+	else
+		memset(peer->handshake.precomputed_static_static, 0,
+		       NOISE_PUBLIC_KEY_LEN);
+	up_write(&peer->handshake.lock);
+	return ret;
+}
+
+bool noise_handshake_init(struct noise_handshake *handshake,
+			  struct noise_static_identity *static_identity,
+			  const u8 peer_public_key[NOISE_PUBLIC_KEY_LEN],
+			  const u8 peer_preshared_key[NOISE_SYMMETRIC_KEY_LEN],
+			  struct wireguard_peer *peer)
+{
+	memset(handshake, 0, sizeof(*handshake));
+	init_rwsem(&handshake->lock);
+	handshake->entry.type = INDEX_HASHTABLE_HANDSHAKE;
+	handshake->entry.peer = peer;
+	memcpy(handshake->remote_static, peer_public_key, NOISE_PUBLIC_KEY_LEN);
+	if (peer_preshared_key)
+		memcpy(handshake->preshared_key, peer_preshared_key,
+		       NOISE_SYMMETRIC_KEY_LEN);
+	handshake->static_identity = static_identity;
+	handshake->state = HANDSHAKE_ZEROED;
+	return noise_precompute_static_static(peer);
+}
+
+static void handshake_zero(struct noise_handshake *handshake)
+{
+	memset(&handshake->ephemeral_private, 0, NOISE_PUBLIC_KEY_LEN);
+	memset(&handshake->remote_ephemeral, 0, NOISE_PUBLIC_KEY_LEN);
+	memset(&handshake->hash, 0, NOISE_HASH_LEN);
+	memset(&handshake->chaining_key, 0, NOISE_HASH_LEN);
+	handshake->remote_index = 0;
+	handshake->state = HANDSHAKE_ZEROED;
+}
+
+void noise_handshake_clear(struct noise_handshake *handshake)
+{
+	index_hashtable_remove(&handshake->entry.peer->device->index_hashtable,
+			       &handshake->entry);
+	down_write(&handshake->lock);
+	handshake_zero(handshake);
+	up_write(&handshake->lock);
+	index_hashtable_remove(&handshake->entry.peer->device->index_hashtable,
+			       &handshake->entry);
+}
+
+static struct noise_keypair *keypair_create(struct wireguard_peer *peer)
+{
+	struct noise_keypair *keypair = kzalloc(sizeof(*keypair), GFP_KERNEL);
+
+	if (unlikely(!keypair))
+		return NULL;
+	keypair->internal_id = atomic64_inc_return(&keypair_counter);
+	keypair->entry.type = INDEX_HASHTABLE_KEYPAIR;
+	keypair->entry.peer = peer;
+	kref_init(&keypair->refcount);
+	return keypair;
+}
+
+static void keypair_free_rcu(struct rcu_head *rcu)
+{
+	kzfree(container_of(rcu, struct noise_keypair, rcu));
+}
+
+static void keypair_free_kref(struct kref *kref)
+{
+	struct noise_keypair *keypair =
+		container_of(kref, struct noise_keypair, refcount);
+	net_dbg_ratelimited("%s: Keypair %llu destroyed for peer %llu\n",
+			    keypair->entry.peer->device->dev->name,
+			    keypair->internal_id,
+			    keypair->entry.peer->internal_id);
+	index_hashtable_remove(&keypair->entry.peer->device->index_hashtable,
+			       &keypair->entry);
+	call_rcu_bh(&keypair->rcu, keypair_free_rcu);
+}
+
+void noise_keypair_put(struct noise_keypair *keypair, bool unreference_now)
+{
+	if (unlikely(!keypair))
+		return;
+	if (unlikely(unreference_now))
+		index_hashtable_remove(
+			&keypair->entry.peer->device->index_hashtable,
+			&keypair->entry);
+	kref_put(&keypair->refcount, keypair_free_kref);
+}
+
+struct noise_keypair *noise_keypair_get(struct noise_keypair *keypair)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_bh_held(),
+		"Taking noise keypair reference without holding the RCU BH read lock");
+	if (unlikely(!keypair || !kref_get_unless_zero(&keypair->refcount)))
+		return NULL;
+	return keypair;
+}
+
+void noise_keypairs_clear(struct noise_keypairs *keypairs)
+{
+	struct noise_keypair *old;
+
+	spin_lock_bh(&keypairs->keypair_update_lock);
+	old = rcu_dereference_protected(keypairs->previous_keypair,
+		lockdep_is_held(&keypairs->keypair_update_lock));
+	RCU_INIT_POINTER(keypairs->previous_keypair, NULL);
+	noise_keypair_put(old, true);
+	old = rcu_dereference_protected(keypairs->next_keypair,
+		lockdep_is_held(&keypairs->keypair_update_lock));
+	RCU_INIT_POINTER(keypairs->next_keypair, NULL);
+	noise_keypair_put(old, true);
+	old = rcu_dereference_protected(keypairs->current_keypair,
+		lockdep_is_held(&keypairs->keypair_update_lock));
+	RCU_INIT_POINTER(keypairs->current_keypair, NULL);
+	noise_keypair_put(old, true);
+	spin_unlock_bh(&keypairs->keypair_update_lock);
+}
+
+static void add_new_keypair(struct noise_keypairs *keypairs,
+			    struct noise_keypair *new_keypair)
+{
+	struct noise_keypair *previous_keypair, *next_keypair, *current_keypair;
+
+	spin_lock_bh(&keypairs->keypair_update_lock);
+	previous_keypair = rcu_dereference_protected(keypairs->previous_keypair,
+		lockdep_is_held(&keypairs->keypair_update_lock));
+	next_keypair = rcu_dereference_protected(keypairs->next_keypair,
+		lockdep_is_held(&keypairs->keypair_update_lock));
+	current_keypair = rcu_dereference_protected(keypairs->current_keypair,
+		lockdep_is_held(&keypairs->keypair_update_lock));
+	if (new_keypair->i_am_the_initiator) {
+		/* If we're the initiator, it means we've sent a handshake, and
+		 * received a confirmation response, which means this new
+		 * keypair can now be used.
+		 */
+		if (next_keypair) {
+			/* If there already was a next keypair pending, we
+			 * demote it to be the previous keypair, and free the
+			 * existing current. Note that this means KCI can result
+			 * in this transition. It would perhaps be more sound to
+			 * always just get rid of the unused next keypair
+			 * instead of putting it in the previous slot, but this
+			 * might be a bit less robust. Something to think about
+			 * for the future.
+			 */
+			RCU_INIT_POINTER(keypairs->next_keypair, NULL);
+			rcu_assign_pointer(keypairs->previous_keypair,
+					   next_keypair);
+			noise_keypair_put(current_keypair, true);
+		} else /* If there wasn't an existing next keypair, we replace
+			 * the previous with the current one.
+			 */
+			rcu_assign_pointer(keypairs->previous_keypair,
+					   current_keypair);
+		/* At this point we can get rid of the old previous keypair, and
+		 * set up the new keypair.
+		 */
+		noise_keypair_put(previous_keypair, true);
+		rcu_assign_pointer(keypairs->current_keypair, new_keypair);
+	} else {
+		/* If we're the responder, it means we can't use the new keypair
+		 * until we receive confirmation via the first data packet, so
+		 * we get rid of the existing previous one, the possibly
+		 * existing next one, and slide in the new next one.
+		 */
+		rcu_assign_pointer(keypairs->next_keypair, new_keypair);
+		noise_keypair_put(next_keypair, true);
+		RCU_INIT_POINTER(keypairs->previous_keypair, NULL);
+		noise_keypair_put(previous_keypair, true);
+	}
+	spin_unlock_bh(&keypairs->keypair_update_lock);
+}
+
+bool noise_received_with_keypair(struct noise_keypairs *keypairs,
+				 struct noise_keypair *received_keypair)
+{
+	struct noise_keypair *old_keypair;
+	bool key_is_new;
+
+	/* We first check without taking the spinlock. */
+	key_is_new = received_keypair ==
+		     rcu_access_pointer(keypairs->next_keypair);
+	if (likely(!key_is_new))
+		return false;
+
+	spin_lock_bh(&keypairs->keypair_update_lock);
+	/* After locking, we double check that things didn't change from
+	 * beneath us.
+	 */
+	if (unlikely(received_keypair !=
+		    rcu_dereference_protected(keypairs->next_keypair,
+			    lockdep_is_held(&keypairs->keypair_update_lock)))) {
+		spin_unlock_bh(&keypairs->keypair_update_lock);
+		return false;
+	}
+
+	/* When we've finally received the confirmation, we slide the next
+	 * into the current, the current into the previous, and get rid of
+	 * the old previous.
+	 */
+	old_keypair = rcu_dereference_protected(keypairs->previous_keypair,
+		lockdep_is_held(&keypairs->keypair_update_lock));
+	rcu_assign_pointer(keypairs->previous_keypair,
+		rcu_dereference_protected(keypairs->current_keypair,
+			lockdep_is_held(&keypairs->keypair_update_lock)));
+	noise_keypair_put(old_keypair, true);
+	rcu_assign_pointer(keypairs->current_keypair, received_keypair);
+	RCU_INIT_POINTER(keypairs->next_keypair, NULL);
+
+	spin_unlock_bh(&keypairs->keypair_update_lock);
+	return true;
+}
+
+/* Must hold static_identity->lock */
+void noise_set_static_identity_private_key(
+	struct noise_static_identity *static_identity,
+	const u8 private_key[NOISE_PUBLIC_KEY_LEN])
+{
+	memcpy(static_identity->static_private, private_key,
+	       NOISE_PUBLIC_KEY_LEN);
+	static_identity->has_identity = curve25519_generate_public(
+		static_identity->static_public, private_key);
+}
+
+/* This is Hugo Krawczyk's HKDF:
+ *  - https://eprint.iacr.org/2010/264.pdf
+ *  - https://tools.ietf.org/html/rfc5869
+ */
+static void kdf(u8 *first_dst, u8 *second_dst, u8 *third_dst, const u8 *data,
+		size_t first_len, size_t second_len, size_t third_len,
+		size_t data_len, const u8 chaining_key[NOISE_HASH_LEN])
+{
+	u8 output[BLAKE2S_OUTBYTES + 1];
+	u8 secret[BLAKE2S_OUTBYTES];
+
+#ifdef DEBUG
+	BUG_ON(first_len > BLAKE2S_OUTBYTES || second_len > BLAKE2S_OUTBYTES ||
+	       third_len > BLAKE2S_OUTBYTES ||
+	       ((second_len || second_dst || third_len || third_dst) &&
+		(!first_len || !first_dst)) ||
+	       ((third_len || third_dst) && (!second_len || !second_dst)));
+#endif
+
+	/* Extract entropy from data into secret */
+	blake2s_hmac(secret, data, chaining_key, BLAKE2S_OUTBYTES, data_len,
+		     NOISE_HASH_LEN);
+
+	if (!first_dst || !first_len)
+		goto out;
+
+	/* Expand first key: key = secret, data = 0x1 */
+	output[0] = 1;
+	blake2s_hmac(output, output, secret, BLAKE2S_OUTBYTES, 1,
+		     BLAKE2S_OUTBYTES);
+	memcpy(first_dst, output, first_len);
+
+	if (!second_dst || !second_len)
+		goto out;
+
+	/* Expand second key: key = secret, data = first-key || 0x2 */
+	output[BLAKE2S_OUTBYTES] = 2;
+	blake2s_hmac(output, output, secret, BLAKE2S_OUTBYTES,
+		     BLAKE2S_OUTBYTES + 1, BLAKE2S_OUTBYTES);
+	memcpy(second_dst, output, second_len);
+
+	if (!third_dst || !third_len)
+		goto out;
+
+	/* Expand third key: key = secret, data = second-key || 0x3 */
+	output[BLAKE2S_OUTBYTES] = 3;
+	blake2s_hmac(output, output, secret, BLAKE2S_OUTBYTES,
+		     BLAKE2S_OUTBYTES + 1, BLAKE2S_OUTBYTES);
+	memcpy(third_dst, output, third_len);
+
+out:
+	/* Clear sensitive data from stack */
+	memzero_explicit(secret, BLAKE2S_OUTBYTES);
+	memzero_explicit(output, BLAKE2S_OUTBYTES + 1);
+}
+
+static void symmetric_key_init(struct noise_symmetric_key *key)
+{
+	spin_lock_init(&key->counter.receive.lock);
+	atomic64_set(&key->counter.counter, 0);
+	memset(key->counter.receive.backtrack, 0,
+	       sizeof(key->counter.receive.backtrack));
+	key->birthdate = ktime_get_boot_fast_ns();
+	key->is_valid = true;
+}
+
+static void derive_keys(struct noise_symmetric_key *first_dst,
+			struct noise_symmetric_key *second_dst,
+			const u8 chaining_key[NOISE_HASH_LEN])
+{
+	kdf(first_dst->key, second_dst->key, NULL, NULL,
+	    NOISE_SYMMETRIC_KEY_LEN, NOISE_SYMMETRIC_KEY_LEN, 0, 0,
+	    chaining_key);
+	symmetric_key_init(first_dst);
+	symmetric_key_init(second_dst);
+}
+
+static bool __must_check mix_dh(u8 chaining_key[NOISE_HASH_LEN],
+				u8 key[NOISE_SYMMETRIC_KEY_LEN],
+				const u8 private[NOISE_PUBLIC_KEY_LEN],
+				const u8 public[NOISE_PUBLIC_KEY_LEN])
+{
+	u8 dh_calculation[NOISE_PUBLIC_KEY_LEN];
+
+	if (unlikely(!curve25519(dh_calculation, private, public)))
+		return false;
+	kdf(chaining_key, key, NULL, dh_calculation, NOISE_HASH_LEN,
+	    NOISE_SYMMETRIC_KEY_LEN, 0, NOISE_PUBLIC_KEY_LEN, chaining_key);
+	memzero_explicit(dh_calculation, NOISE_PUBLIC_KEY_LEN);
+	return true;
+}
+
+static void mix_hash(u8 hash[NOISE_HASH_LEN], const u8 *src, size_t src_len)
+{
+	struct blake2s_state blake;
+
+	blake2s_init(&blake, NOISE_HASH_LEN);
+	blake2s_update(&blake, hash, NOISE_HASH_LEN);
+	blake2s_update(&blake, src, src_len);
+	blake2s_final(&blake, hash, NOISE_HASH_LEN);
+}
+
+static void mix_psk(u8 chaining_key[NOISE_HASH_LEN], u8 hash[NOISE_HASH_LEN],
+		    u8 key[NOISE_SYMMETRIC_KEY_LEN],
+		    const u8 psk[NOISE_SYMMETRIC_KEY_LEN])
+{
+	u8 temp_hash[NOISE_HASH_LEN];
+
+	kdf(chaining_key, temp_hash, key, psk, NOISE_HASH_LEN, NOISE_HASH_LEN,
+	    NOISE_SYMMETRIC_KEY_LEN, NOISE_SYMMETRIC_KEY_LEN, chaining_key);
+	mix_hash(hash, temp_hash, NOISE_HASH_LEN);
+	memzero_explicit(temp_hash, NOISE_HASH_LEN);
+}
+
+static void handshake_init(u8 chaining_key[NOISE_HASH_LEN],
+			   u8 hash[NOISE_HASH_LEN],
+			   const u8 remote_static[NOISE_PUBLIC_KEY_LEN])
+{
+	memcpy(hash, handshake_init_hash, NOISE_HASH_LEN);
+	memcpy(chaining_key, handshake_init_chaining_key, NOISE_HASH_LEN);
+	mix_hash(hash, remote_static, NOISE_PUBLIC_KEY_LEN);
+}
+
+static void message_encrypt(u8 *dst_ciphertext, const u8 *src_plaintext,
+			    size_t src_len, u8 key[NOISE_SYMMETRIC_KEY_LEN],
+			    u8 hash[NOISE_HASH_LEN])
+{
+	chacha20poly1305_encrypt(dst_ciphertext, src_plaintext, src_len, hash,
+				 NOISE_HASH_LEN,
+				 0 /* Always zero for Noise_IK */, key);
+	mix_hash(hash, dst_ciphertext, noise_encrypted_len(src_len));
+}
+
+static bool message_decrypt(u8 *dst_plaintext, const u8 *src_ciphertext,
+			    size_t src_len, u8 key[NOISE_SYMMETRIC_KEY_LEN],
+			    u8 hash[NOISE_HASH_LEN])
+{
+	if (!chacha20poly1305_decrypt(dst_plaintext, src_ciphertext, src_len,
+				      hash, NOISE_HASH_LEN,
+				      0 /* Always zero for Noise_IK */, key))
+		return false;
+	mix_hash(hash, src_ciphertext, src_len);
+	return true;
+}
+
+static void message_ephemeral(u8 ephemeral_dst[NOISE_PUBLIC_KEY_LEN],
+			      const u8 ephemeral_src[NOISE_PUBLIC_KEY_LEN],
+			      u8 chaining_key[NOISE_HASH_LEN],
+			      u8 hash[NOISE_HASH_LEN])
+{
+	if (ephemeral_dst != ephemeral_src)
+		memcpy(ephemeral_dst, ephemeral_src, NOISE_PUBLIC_KEY_LEN);
+	mix_hash(hash, ephemeral_src, NOISE_PUBLIC_KEY_LEN);
+	kdf(chaining_key, NULL, NULL, ephemeral_src, NOISE_HASH_LEN, 0, 0,
+	    NOISE_PUBLIC_KEY_LEN, chaining_key);
+}
+
+static void tai64n_now(u8 output[NOISE_TIMESTAMP_LEN])
+{
+	struct timespec64 now;
+
+	getnstimeofday64(&now);
+	/* https://cr.yp.to/libtai/tai64.html */
+	*(__be64 *)output = cpu_to_be64(0x400000000000000aULL + now.tv_sec);
+	*(__be32 *)(output + sizeof(__be64)) = cpu_to_be32(now.tv_nsec);
+}
+
+bool noise_handshake_create_initiation(struct message_handshake_initiation *dst,
+				       struct noise_handshake *handshake)
+{
+	u8 timestamp[NOISE_TIMESTAMP_LEN];
+	u8 key[NOISE_SYMMETRIC_KEY_LEN];
+	bool ret = false;
+
+	/* We need to wait for crng _before_ taking any locks, since
+	 * curve25519_generate_secret uses get_random_bytes_wait.
+	 */
+	wait_for_random_bytes();
+
+	down_read(&handshake->static_identity->lock);
+	down_write(&handshake->lock);
+
+	if (unlikely(!handshake->static_identity->has_identity))
+		goto out;
+
+	dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION);
+
+	handshake_init(handshake->chaining_key, handshake->hash,
+		       handshake->remote_static);
+
+	/* e */
+	curve25519_generate_secret(handshake->ephemeral_private);
+	if (!curve25519_generate_public(dst->unencrypted_ephemeral,
+					handshake->ephemeral_private))
+		goto out;
+	message_ephemeral(dst->unencrypted_ephemeral,
+			  dst->unencrypted_ephemeral, handshake->chaining_key,
+			  handshake->hash);
+
+	/* es */
+	if (!mix_dh(handshake->chaining_key, key, handshake->ephemeral_private,
+		    handshake->remote_static))
+		goto out;
+
+	/* s */
+	message_encrypt(dst->encrypted_static,
+			handshake->static_identity->static_public,
+			NOISE_PUBLIC_KEY_LEN, key, handshake->hash);
+
+	/* ss */
+	kdf(handshake->chaining_key, key, NULL,
+	    handshake->precomputed_static_static, NOISE_HASH_LEN,
+	    NOISE_SYMMETRIC_KEY_LEN, 0, NOISE_PUBLIC_KEY_LEN,
+	    handshake->chaining_key);
+
+	/* {t} */
+	tai64n_now(timestamp);
+	message_encrypt(dst->encrypted_timestamp, timestamp,
+			NOISE_TIMESTAMP_LEN, key, handshake->hash);
+
+	dst->sender_index = index_hashtable_insert(
+		&handshake->entry.peer->device->index_hashtable,
+		&handshake->entry);
+
+	handshake->state = HANDSHAKE_CREATED_INITIATION;
+	ret = true;
+
+out:
+	up_write(&handshake->lock);
+	up_read(&handshake->static_identity->lock);
+	memzero_explicit(key, NOISE_SYMMETRIC_KEY_LEN);
+	return ret;
+}
+
+struct wireguard_peer *
+noise_handshake_consume_initiation(struct message_handshake_initiation *src,
+				   struct wireguard_device *wg)
+{
+	struct wireguard_peer *peer = NULL, *ret_peer = NULL;
+	struct noise_handshake *handshake;
+	bool replay_attack, flood_attack;
+	u8 key[NOISE_SYMMETRIC_KEY_LEN];
+	u8 chaining_key[NOISE_HASH_LEN];
+	u8 hash[NOISE_HASH_LEN];
+	u8 s[NOISE_PUBLIC_KEY_LEN];
+	u8 e[NOISE_PUBLIC_KEY_LEN];
+	u8 t[NOISE_TIMESTAMP_LEN];
+
+	down_read(&wg->static_identity.lock);
+	if (unlikely(!wg->static_identity.has_identity))
+		goto out;
+
+	handshake_init(chaining_key, hash, wg->static_identity.static_public);
+
+	/* e */
+	message_ephemeral(e, src->unencrypted_ephemeral, chaining_key, hash);
+
+	/* es */
+	if (!mix_dh(chaining_key, key, wg->static_identity.static_private, e))
+		goto out;
+
+	/* s */
+	if (!message_decrypt(s, src->encrypted_static,
+			     sizeof(src->encrypted_static), key, hash))
+		goto out;
+
+	/* Lookup which peer we're actually talking to */
+	peer = pubkey_hashtable_lookup(&wg->peer_hashtable, s);
+	if (!peer)
+		goto out;
+	handshake = &peer->handshake;
+
+	/* ss */
+	kdf(chaining_key, key, NULL, handshake->precomputed_static_static,
+	    NOISE_HASH_LEN, NOISE_SYMMETRIC_KEY_LEN, 0, NOISE_PUBLIC_KEY_LEN,
+	    chaining_key);
+
+	/* {t} */
+	if (!message_decrypt(t, src->encrypted_timestamp,
+			     sizeof(src->encrypted_timestamp), key, hash))
+		goto out;
+
+	down_read(&handshake->lock);
+	replay_attack = memcmp(t, handshake->latest_timestamp,
+			       NOISE_TIMESTAMP_LEN) <= 0;
+	flood_attack = handshake->last_initiation_consumption +
+			       NSEC_PER_SEC / INITIATIONS_PER_SECOND >
+		       ktime_get_boot_fast_ns();
+	up_read(&handshake->lock);
+	if (replay_attack || flood_attack)
+		goto out;
+
+	/* Success! Copy everything to peer */
+	down_write(&handshake->lock);
+	memcpy(handshake->remote_ephemeral, e, NOISE_PUBLIC_KEY_LEN);
+	memcpy(handshake->latest_timestamp, t, NOISE_TIMESTAMP_LEN);
+	memcpy(handshake->hash, hash, NOISE_HASH_LEN);
+	memcpy(handshake->chaining_key, chaining_key, NOISE_HASH_LEN);
+	handshake->remote_index = src->sender_index;
+	handshake->last_initiation_consumption = ktime_get_boot_fast_ns();
+	handshake->state = HANDSHAKE_CONSUMED_INITIATION;
+	up_write(&handshake->lock);
+	ret_peer = peer;
+
+out:
+	memzero_explicit(key, NOISE_SYMMETRIC_KEY_LEN);
+	memzero_explicit(hash, NOISE_HASH_LEN);
+	memzero_explicit(chaining_key, NOISE_HASH_LEN);
+	up_read(&wg->static_identity.lock);
+	if (!ret_peer)
+		peer_put(peer);
+	return ret_peer;
+}
+
+bool noise_handshake_create_response(struct message_handshake_response *dst,
+				     struct noise_handshake *handshake)
+{
+	bool ret = false;
+	u8 key[NOISE_SYMMETRIC_KEY_LEN];
+
+	/* We need to wait for crng _before_ taking any locks, since
+	 * curve25519_generate_secret uses get_random_bytes_wait.
+	 */
+	wait_for_random_bytes();
+
+	down_read(&handshake->static_identity->lock);
+	down_write(&handshake->lock);
+
+	if (handshake->state != HANDSHAKE_CONSUMED_INITIATION)
+		goto out;
+
+	dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE);
+	dst->receiver_index = handshake->remote_index;
+
+	/* e */
+	curve25519_generate_secret(handshake->ephemeral_private);
+	if (!curve25519_generate_public(dst->unencrypted_ephemeral,
+					handshake->ephemeral_private))
+		goto out;
+	message_ephemeral(dst->unencrypted_ephemeral,
+			  dst->unencrypted_ephemeral, handshake->chaining_key,
+			  handshake->hash);
+
+	/* ee */
+	if (!mix_dh(handshake->chaining_key, NULL, handshake->ephemeral_private,
+		    handshake->remote_ephemeral))
+		goto out;
+
+	/* se */
+	if (!mix_dh(handshake->chaining_key, NULL, handshake->ephemeral_private,
+		    handshake->remote_static))
+		goto out;
+
+	/* psk */
+	mix_psk(handshake->chaining_key, handshake->hash, key,
+		handshake->preshared_key);
+
+	/* {} */
+	message_encrypt(dst->encrypted_nothing, NULL, 0, key, handshake->hash);
+
+	dst->sender_index = index_hashtable_insert(
+		&handshake->entry.peer->device->index_hashtable,
+		&handshake->entry);
+
+	handshake->state = HANDSHAKE_CREATED_RESPONSE;
+	ret = true;
+
+out:
+	up_write(&handshake->lock);
+	up_read(&handshake->static_identity->lock);
+	memzero_explicit(key, NOISE_SYMMETRIC_KEY_LEN);
+	return ret;
+}
+
+struct wireguard_peer *
+noise_handshake_consume_response(struct message_handshake_response *src,
+				 struct wireguard_device *wg)
+{
+	struct noise_handshake *handshake;
+	struct wireguard_peer *peer = NULL, *ret_peer = NULL;
+	u8 key[NOISE_SYMMETRIC_KEY_LEN];
+	u8 hash[NOISE_HASH_LEN];
+	u8 chaining_key[NOISE_HASH_LEN];
+	u8 e[NOISE_PUBLIC_KEY_LEN];
+	u8 ephemeral_private[NOISE_PUBLIC_KEY_LEN];
+	u8 static_private[NOISE_PUBLIC_KEY_LEN];
+	enum noise_handshake_state state = HANDSHAKE_ZEROED;
+
+	down_read(&wg->static_identity.lock);
+
+	if (unlikely(!wg->static_identity.has_identity))
+		goto out;
+
+	handshake = (struct noise_handshake *)index_hashtable_lookup(
+		&wg->index_hashtable, INDEX_HASHTABLE_HANDSHAKE,
+		src->receiver_index, &peer);
+	if (unlikely(!handshake))
+		goto out;
+
+	down_read(&handshake->lock);
+	state = handshake->state;
+	memcpy(hash, handshake->hash, NOISE_HASH_LEN);
+	memcpy(chaining_key, handshake->chaining_key, NOISE_HASH_LEN);
+	memcpy(ephemeral_private, handshake->ephemeral_private,
+	       NOISE_PUBLIC_KEY_LEN);
+	up_read(&handshake->lock);
+
+	if (state != HANDSHAKE_CREATED_INITIATION)
+		goto fail;
+
+	/* e */
+	message_ephemeral(e, src->unencrypted_ephemeral, chaining_key, hash);
+
+	/* ee */
+	if (!mix_dh(chaining_key, NULL, ephemeral_private, e))
+		goto fail;
+
+	/* se */
+	if (!mix_dh(chaining_key, NULL, wg->static_identity.static_private, e))
+		goto fail;
+
+	/* psk */
+	mix_psk(chaining_key, hash, key, handshake->preshared_key);
+
+	/* {} */
+	if (!message_decrypt(NULL, src->encrypted_nothing,
+			     sizeof(src->encrypted_nothing), key, hash))
+		goto fail;
+
+	/* Success! Copy everything to peer */
+	down_write(&handshake->lock);
+	/* It's important to check that the state is still the same, while we
+	 * have an exclusive lock.
+	 */
+	if (handshake->state != state) {
+		up_write(&handshake->lock);
+		goto fail;
+	}
+	memcpy(handshake->remote_ephemeral, e, NOISE_PUBLIC_KEY_LEN);
+	memcpy(handshake->hash, hash, NOISE_HASH_LEN);
+	memcpy(handshake->chaining_key, chaining_key, NOISE_HASH_LEN);
+	handshake->remote_index = src->sender_index;
+	handshake->state = HANDSHAKE_CONSUMED_RESPONSE;
+	up_write(&handshake->lock);
+	ret_peer = peer;
+	goto out;
+
+fail:
+	peer_put(peer);
+out:
+	memzero_explicit(key, NOISE_SYMMETRIC_KEY_LEN);
+	memzero_explicit(hash, NOISE_HASH_LEN);
+	memzero_explicit(chaining_key, NOISE_HASH_LEN);
+	memzero_explicit(ephemeral_private, NOISE_PUBLIC_KEY_LEN);
+	memzero_explicit(static_private, NOISE_PUBLIC_KEY_LEN);
+	up_read(&wg->static_identity.lock);
+	return ret_peer;
+}
+
+bool noise_handshake_begin_session(struct noise_handshake *handshake,
+				   struct noise_keypairs *keypairs)
+{
+	struct noise_keypair *new_keypair;
+	bool ret = false;
+
+	down_write(&handshake->lock);
+	if (handshake->state != HANDSHAKE_CREATED_RESPONSE &&
+	    handshake->state != HANDSHAKE_CONSUMED_RESPONSE)
+		goto out;
+
+	new_keypair = keypair_create(handshake->entry.peer);
+	if (!new_keypair)
+		goto out;
+	new_keypair->i_am_the_initiator = handshake->state ==
+					  HANDSHAKE_CONSUMED_RESPONSE;
+	new_keypair->remote_index = handshake->remote_index;
+
+	if (new_keypair->i_am_the_initiator)
+		derive_keys(&new_keypair->sending, &new_keypair->receiving,
+			    handshake->chaining_key);
+	else
+		derive_keys(&new_keypair->receiving, &new_keypair->sending,
+			    handshake->chaining_key);
+
+	handshake_zero(handshake);
+	rcu_read_lock_bh();
+	if (likely(!container_of(handshake, struct wireguard_peer,
+				 handshake)->is_dead)) {
+		add_new_keypair(keypairs, new_keypair);
+		net_dbg_ratelimited("%s: Keypair %llu created for peer %llu\n",
+				    handshake->entry.peer->device->dev->name,
+				    new_keypair->internal_id,
+				    handshake->entry.peer->internal_id);
+		ret = index_hashtable_replace(
+			&handshake->entry.peer->device->index_hashtable,
+			&handshake->entry, &new_keypair->entry);
+	} else
+		kzfree(new_keypair);
+	rcu_read_unlock_bh();
+
+out:
+	up_write(&handshake->lock);
+	return ret;
+}
diff --git a/drivers/net/wireguard/noise.h b/drivers/net/wireguard/noise.h
new file mode 100644
index 000000000000..6a563ce41750
--- /dev/null
+++ b/drivers/net/wireguard/noise.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+#ifndef _WG_NOISE_H
+#define _WG_NOISE_H
+
+#include "messages.h"
+#include "hashtables.h"
+
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+#include <linux/rwsem.h>
+#include <linux/mutex.h>
+#include <linux/ktime.h>
+#include <linux/kref.h>
+
+union noise_counter {
+	struct {
+		u64 counter;
+		unsigned long backtrack[COUNTER_BITS_TOTAL / BITS_PER_LONG];
+		spinlock_t lock;
+	} receive;
+	atomic64_t counter;
+};
+
+struct noise_symmetric_key {
+	u8 key[NOISE_SYMMETRIC_KEY_LEN];
+	union noise_counter counter;
+	u64 birthdate;
+	bool is_valid;
+};
+
+struct noise_keypair {
+	struct index_hashtable_entry entry;
+	struct noise_symmetric_key sending;
+	struct noise_symmetric_key receiving;
+	__le32 remote_index;
+	bool i_am_the_initiator;
+	struct kref refcount;
+	struct rcu_head rcu;
+	u64 internal_id;
+};
+
+struct noise_keypairs {
+	struct noise_keypair __rcu *current_keypair;
+	struct noise_keypair __rcu *previous_keypair;
+	struct noise_keypair __rcu *next_keypair;
+	spinlock_t keypair_update_lock;
+};
+
+struct noise_static_identity {
+	u8 static_public[NOISE_PUBLIC_KEY_LEN];
+	u8 static_private[NOISE_PUBLIC_KEY_LEN];
+	struct rw_semaphore lock;
+	bool has_identity;
+};
+
+enum noise_handshake_state {
+	HANDSHAKE_ZEROED,
+	HANDSHAKE_CREATED_INITIATION,
+	HANDSHAKE_CONSUMED_INITIATION,
+	HANDSHAKE_CREATED_RESPONSE,
+	HANDSHAKE_CONSUMED_RESPONSE
+};
+
+struct noise_handshake {
+	struct index_hashtable_entry entry;
+
+	enum noise_handshake_state state;
+	u64 last_initiation_consumption;
+
+	struct noise_static_identity *static_identity;
+
+	u8 ephemeral_private[NOISE_PUBLIC_KEY_LEN];
+	u8 remote_static[NOISE_PUBLIC_KEY_LEN];
+	u8 remote_ephemeral[NOISE_PUBLIC_KEY_LEN];
+	u8 precomputed_static_static[NOISE_PUBLIC_KEY_LEN];
+
+	u8 preshared_key[NOISE_SYMMETRIC_KEY_LEN];
+
+	u8 hash[NOISE_HASH_LEN];
+	u8 chaining_key[NOISE_HASH_LEN];
+
+	u8 latest_timestamp[NOISE_TIMESTAMP_LEN];
+	__le32 remote_index;
+
+	/* Protects all members except the immutable (after noise_handshake_
+	 * init): remote_static, precomputed_static_static, static_identity. */
+	struct rw_semaphore lock;
+};
+
+struct wireguard_device;
+
+void noise_init(void);
+bool noise_handshake_init(struct noise_handshake *handshake,
+			  struct noise_static_identity *static_identity,
+			  const u8 peer_public_key[NOISE_PUBLIC_KEY_LEN],
+			  const u8 peer_preshared_key[NOISE_SYMMETRIC_KEY_LEN],
+			  struct wireguard_peer *peer);
+void noise_handshake_clear(struct noise_handshake *handshake);
+void noise_keypair_put(struct noise_keypair *keypair, bool unreference_now);
+struct noise_keypair *noise_keypair_get(struct noise_keypair *keypair);
+void noise_keypairs_clear(struct noise_keypairs *keypairs);
+bool noise_received_with_keypair(struct noise_keypairs *keypairs,
+				 struct noise_keypair *received_keypair);
+
+void noise_set_static_identity_private_key(
+	struct noise_static_identity *static_identity,
+	const u8 private_key[NOISE_PUBLIC_KEY_LEN]);
+bool noise_precompute_static_static(struct wireguard_peer *peer);
+
+bool noise_handshake_create_initiation(struct message_handshake_initiation *dst,
+				       struct noise_handshake *handshake);
+struct wireguard_peer *
+noise_handshake_consume_initiation(struct message_handshake_initiation *src,
+				   struct wireguard_device *wg);
+
+bool noise_handshake_create_response(struct message_handshake_response *dst,
+				     struct noise_handshake *handshake);
+struct wireguard_peer *
+noise_handshake_consume_response(struct message_handshake_response *src,
+				 struct wireguard_device *wg);
+
+bool noise_handshake_begin_session(struct noise_handshake *handshake,
+				   struct noise_keypairs *keypairs);
+
+#endif /* _WG_NOISE_H */
diff --git a/drivers/net/wireguard/peer.c b/drivers/net/wireguard/peer.c
new file mode 100644
index 000000000000..ca7981cf77c1
--- /dev/null
+++ b/drivers/net/wireguard/peer.c
@@ -0,0 +1,191 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "peer.h"
+#include "device.h"
+#include "queueing.h"
+#include "timers.h"
+#include "hashtables.h"
+#include "noise.h"
+
+#include <linux/kref.h>
+#include <linux/lockdep.h>
+#include <linux/rcupdate.h>
+#include <linux/list.h>
+
+static atomic64_t peer_counter = ATOMIC64_INIT(0);
+
+struct wireguard_peer *
+peer_create(struct wireguard_device *wg,
+	    const u8 public_key[NOISE_PUBLIC_KEY_LEN],
+	    const u8 preshared_key[NOISE_SYMMETRIC_KEY_LEN])
+{
+	struct wireguard_peer *peer;
+
+	lockdep_assert_held(&wg->device_update_lock);
+
+	if (wg->num_peers >= MAX_PEERS_PER_DEVICE)
+		return NULL;
+
+	peer = kzalloc(sizeof(*peer), GFP_KERNEL);
+	if (unlikely(!peer))
+		return NULL;
+	peer->device = wg;
+
+	if (!noise_handshake_init(&peer->handshake, &wg->static_identity,
+				  public_key, preshared_key, peer))
+		goto err_1;
+	if (dst_cache_init(&peer->endpoint_cache, GFP_KERNEL))
+		goto err_1;
+	if (packet_queue_init(&peer->tx_queue, packet_tx_worker, false,
+			      MAX_QUEUED_PACKETS))
+		goto err_2;
+	if (packet_queue_init(&peer->rx_queue, NULL, false, MAX_QUEUED_PACKETS))
+		goto err_3;
+
+	peer->internal_id = atomic64_inc_return(&peer_counter);
+	peer->serial_work_cpu = nr_cpumask_bits;
+	cookie_init(&peer->latest_cookie);
+	timers_init(peer);
+	cookie_checker_precompute_peer_keys(peer);
+	spin_lock_init(&peer->keypairs.keypair_update_lock);
+	INIT_WORK(&peer->transmit_handshake_work, packet_handshake_send_worker);
+	rwlock_init(&peer->endpoint_lock);
+	kref_init(&peer->refcount);
+	skb_queue_head_init(&peer->staged_packet_queue);
+	atomic64_set(&peer->last_sent_handshake,
+		     ktime_get_boot_fast_ns() -
+			     (u64)(REKEY_TIMEOUT + 1) * NSEC_PER_SEC);
+	set_bit(NAPI_STATE_NO_BUSY_POLL, &peer->napi.state);
+	netif_napi_add(wg->dev, &peer->napi, packet_rx_poll, NAPI_POLL_WEIGHT);
+	napi_enable(&peer->napi);
+	list_add_tail(&peer->peer_list, &wg->peer_list);
+	pubkey_hashtable_add(&wg->peer_hashtable, peer);
+	++wg->num_peers;
+	pr_debug("%s: Peer %llu created\n", wg->dev->name, peer->internal_id);
+	return peer;
+
+err_3:
+	packet_queue_free(&peer->tx_queue, false);
+err_2:
+	dst_cache_destroy(&peer->endpoint_cache);
+err_1:
+	kfree(peer);
+	return NULL;
+}
+
+struct wireguard_peer *peer_get_maybe_zero(struct wireguard_peer *peer)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_bh_held(),
+			 "Taking peer reference without holding the RCU read lock");
+	if (unlikely(!peer || !kref_get_unless_zero(&peer->refcount)))
+		return NULL;
+	return peer;
+}
+
+/* We have a separate "remove" function to get rid of the final reference
+ * because peer_list, clearing handshakes, and flushing all require mutexes
+ * which requires sleeping, which must only be done from certain contexts.
+ */
+void peer_remove(struct wireguard_peer *peer)
+{
+	if (unlikely(!peer))
+		return;
+	lockdep_assert_held(&peer->device->device_update_lock);
+
+	/* Remove from configuration-time lookup structures so new packets
+	 * can't enter.
+	 */
+	list_del_init(&peer->peer_list);
+	allowedips_remove_by_peer(&peer->device->peer_allowedips, peer,
+				  &peer->device->device_update_lock);
+	pubkey_hashtable_remove(&peer->device->peer_hashtable, peer);
+
+	/* Mark as dead, so that we don't allow jumping contexts after. */
+	WRITE_ONCE(peer->is_dead, true);
+	synchronize_rcu_bh();
+
+	/* Now that no more keypairs can be created for this peer, we destroy
+	 * existing ones.
+	 */
+	noise_keypairs_clear(&peer->keypairs);
+
+	/* Destroy all ongoing timers that were in-flight at the beginning of
+	 * this function.
+	 */
+	timers_stop(peer);
+
+	/* The transition between packet encryption/decryption queues isn't
+	 * guarded by is_dead, but each reference's life is strictly bounded by
+	 * two generations: once for parallel crypto and once for serial
+	 * ingestion, so we can simply flush twice, and be sure that we no
+	 * longer have references inside these queues.
+	 */
+
+	/* a) For encrypt/decrypt. */
+	flush_workqueue(peer->device->packet_crypt_wq);
+	/* b.1) For send (but not receive, since that's napi). */
+	flush_workqueue(peer->device->packet_crypt_wq);
+	/* b.2.1) For receive (but not send, since that's wq). */
+	napi_disable(&peer->napi);
+	/* b.2.1) It's now safe to remove the napi struct, which must be done
+	 * here from process context.
+	 */
+	netif_napi_del(&peer->napi);
+
+	/* Ensure any workstructs we own (like transmit_handshake_work or
+	 * clear_peer_work) no longer are in use.
+	 */
+	flush_workqueue(peer->device->handshake_send_wq);
+
+	--peer->device->num_peers;
+	peer_put(peer);
+}
+
+static void rcu_release(struct rcu_head *rcu)
+{
+	struct wireguard_peer *peer =
+		container_of(rcu, struct wireguard_peer, rcu);
+	dst_cache_destroy(&peer->endpoint_cache);
+	packet_queue_free(&peer->rx_queue, false);
+	packet_queue_free(&peer->tx_queue, false);
+	kzfree(peer);
+}
+
+static void kref_release(struct kref *refcount)
+{
+	struct wireguard_peer *peer =
+		container_of(refcount, struct wireguard_peer, refcount);
+	pr_debug("%s: Peer %llu (%pISpfsc) destroyed\n",
+		 peer->device->dev->name, peer->internal_id,
+		 &peer->endpoint.addr);
+	/* Remove ourself from dynamic runtime lookup structures, now that the
+	 * last reference is gone.
+	 */
+	index_hashtable_remove(&peer->device->index_hashtable,
+			       &peer->handshake.entry);
+	/* Remove any lingering packets that didn't have a chance to be
+	 * transmitted.
+	 */
+	skb_queue_purge(&peer->staged_packet_queue);
+	/* Free the memory used. */
+	call_rcu_bh(&peer->rcu, rcu_release);
+}
+
+void peer_put(struct wireguard_peer *peer)
+{
+	if (unlikely(!peer))
+		return;
+	kref_put(&peer->refcount, kref_release);
+}
+
+void peer_remove_all(struct wireguard_device *wg)
+{
+	struct wireguard_peer *peer, *temp;
+
+	lockdep_assert_held(&wg->device_update_lock);
+	list_for_each_entry_safe (peer, temp, &wg->peer_list, peer_list)
+		peer_remove(peer);
+}
diff --git a/drivers/net/wireguard/peer.h b/drivers/net/wireguard/peer.h
new file mode 100644
index 000000000000..5613ccc2e9c2
--- /dev/null
+++ b/drivers/net/wireguard/peer.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_PEER_H
+#define _WG_PEER_H
+
+#include "device.h"
+#include "noise.h"
+#include "cookie.h"
+
+#include <linux/types.h>
+#include <linux/netfilter.h>
+#include <linux/spinlock.h>
+#include <linux/kref.h>
+#include <net/dst_cache.h>
+
+struct wireguard_device;
+
+struct endpoint {
+	union {
+		struct sockaddr addr;
+		struct sockaddr_in addr4;
+		struct sockaddr_in6 addr6;
+	};
+	union {
+		struct {
+			struct in_addr src4;
+			/* Essentially the same as addr6->scope_id */
+			int src_if4;
+		};
+		struct in6_addr src6;
+	};
+};
+
+struct wireguard_peer {
+	struct wireguard_device *device;
+	struct crypt_queue tx_queue, rx_queue;
+	struct sk_buff_head staged_packet_queue;
+	int serial_work_cpu;
+	struct noise_keypairs keypairs;
+	struct endpoint endpoint;
+	struct dst_cache endpoint_cache;
+	rwlock_t endpoint_lock;
+	struct noise_handshake handshake;
+	atomic64_t last_sent_handshake;
+	struct work_struct transmit_handshake_work, clear_peer_work;
+	struct cookie latest_cookie;
+	struct hlist_node pubkey_hash;
+	u64 rx_bytes, tx_bytes;
+	struct timer_list timer_retransmit_handshake, timer_send_keepalive;
+	struct timer_list timer_new_handshake, timer_zero_key_material;
+	struct timer_list timer_persistent_keepalive;
+	unsigned int timer_handshake_attempts;
+	u16 persistent_keepalive_interval;
+	bool timers_enabled, timer_need_another_keepalive;
+	bool sent_lastminute_handshake;
+	struct timespec walltime_last_handshake;
+	struct kref refcount;
+	struct rcu_head rcu;
+	struct list_head peer_list;
+	u64 internal_id;
+	struct napi_struct napi;
+	bool is_dead;
+};
+
+struct wireguard_peer *
+peer_create(struct wireguard_device *wg,
+	    const u8 public_key[NOISE_PUBLIC_KEY_LEN],
+	    const u8 preshared_key[NOISE_SYMMETRIC_KEY_LEN]);
+
+struct wireguard_peer *__must_check
+peer_get_maybe_zero(struct wireguard_peer *peer);
+static inline struct wireguard_peer *peer_get(struct wireguard_peer *peer)
+{
+	kref_get(&peer->refcount);
+	return peer;
+}
+void peer_put(struct wireguard_peer *peer);
+void peer_remove(struct wireguard_peer *peer);
+void peer_remove_all(struct wireguard_device *wg);
+
+struct wireguard_peer *peer_lookup_by_index(struct wireguard_device *wg,
+					    u32 index);
+
+#endif /* _WG_PEER_H */
diff --git a/drivers/net/wireguard/queueing.c b/drivers/net/wireguard/queueing.c
new file mode 100644
index 000000000000..9ec6588e3bf1
--- /dev/null
+++ b/drivers/net/wireguard/queueing.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "queueing.h"
+
+struct multicore_worker __percpu *
+packet_alloc_percpu_multicore_worker(work_func_t function, void *ptr)
+{
+	int cpu;
+	struct multicore_worker __percpu *worker =
+		alloc_percpu(struct multicore_worker);
+
+	if (!worker)
+		return NULL;
+
+	for_each_possible_cpu (cpu) {
+		per_cpu_ptr(worker, cpu)->ptr = ptr;
+		INIT_WORK(&per_cpu_ptr(worker, cpu)->work, function);
+	}
+	return worker;
+}
+
+int packet_queue_init(struct crypt_queue *queue, work_func_t function,
+		      bool multicore, unsigned int len)
+{
+	int ret;
+
+	memset(queue, 0, sizeof(*queue));
+	ret = ptr_ring_init(&queue->ring, len, GFP_KERNEL);
+	if (ret)
+		return ret;
+	if (function) {
+		if (multicore) {
+			queue->worker = packet_alloc_percpu_multicore_worker(
+				function, queue);
+			if (!queue->worker)
+				return -ENOMEM;
+		} else
+			INIT_WORK(&queue->work, function);
+	}
+	return 0;
+}
+
+void packet_queue_free(struct crypt_queue *queue, bool multicore)
+{
+	if (multicore)
+		free_percpu(queue->worker);
+	WARN_ON(!__ptr_ring_empty(&queue->ring));
+	ptr_ring_cleanup(&queue->ring, NULL);
+}
diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h
new file mode 100644
index 000000000000..66b7134a968a
--- /dev/null
+++ b/drivers/net/wireguard/queueing.h
@@ -0,0 +1,193 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_QUEUEING_H
+#define _WG_QUEUEING_H
+
+#include "peer.h"
+#include <linux/types.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+struct wireguard_device;
+struct wireguard_peer;
+struct multicore_worker;
+struct crypt_queue;
+struct sk_buff;
+
+/* queueing.c APIs: */
+int packet_queue_init(struct crypt_queue *queue, work_func_t function,
+		      bool multicore, unsigned int len);
+void packet_queue_free(struct crypt_queue *queue, bool multicore);
+struct multicore_worker __percpu *
+packet_alloc_percpu_multicore_worker(work_func_t function, void *ptr);
+
+/* receive.c APIs: */
+void packet_receive(struct wireguard_device *wg, struct sk_buff *skb);
+void packet_handshake_receive_worker(struct work_struct *work);
+/* NAPI poll function: */
+int packet_rx_poll(struct napi_struct *napi, int budget);
+/* Workqueue worker: */
+void packet_decrypt_worker(struct work_struct *work);
+
+/* send.c APIs: */
+void packet_send_queued_handshake_initiation(struct wireguard_peer *peer,
+					     bool is_retry);
+void packet_send_handshake_response(struct wireguard_peer *peer);
+void packet_send_handshake_cookie(struct wireguard_device *wg,
+				  struct sk_buff *initiating_skb,
+				  __le32 sender_index);
+void packet_send_keepalive(struct wireguard_peer *peer);
+void packet_send_staged_packets(struct wireguard_peer *peer);
+/* Workqueue workers: */
+void packet_handshake_send_worker(struct work_struct *work);
+void packet_tx_worker(struct work_struct *work);
+void packet_encrypt_worker(struct work_struct *work);
+
+enum packet_state {
+	PACKET_STATE_UNCRYPTED,
+	PACKET_STATE_CRYPTED,
+	PACKET_STATE_DEAD
+};
+
+struct packet_cb {
+	u64 nonce;
+	struct noise_keypair *keypair;
+	atomic_t state;
+	u32 mtu;
+	u8 ds;
+};
+
+#define PACKET_PEER(skb) (((struct packet_cb *)skb->cb)->keypair->entry.peer)
+#define PACKET_CB(skb) ((struct packet_cb *)skb->cb)
+
+/* Returns either the correct skb->protocol value, or 0 if invalid. */
+static inline __be16 skb_examine_untrusted_ip_hdr(struct sk_buff *skb)
+{
+	if (skb_network_header(skb) >= skb->head &&
+	    (skb_network_header(skb) + sizeof(struct iphdr)) <=
+		    skb_tail_pointer(skb) &&
+	    ip_hdr(skb)->version == 4)
+		return htons(ETH_P_IP);
+	if (skb_network_header(skb) >= skb->head &&
+	    (skb_network_header(skb) + sizeof(struct ipv6hdr)) <=
+		    skb_tail_pointer(skb) &&
+	    ipv6_hdr(skb)->version == 6)
+		return htons(ETH_P_IPV6);
+	return 0;
+}
+
+static inline void skb_reset(struct sk_buff *skb)
+{
+	const int pfmemalloc = skb->pfmemalloc;
+	skb_scrub_packet(skb, true);
+	memset(&skb->headers_start, 0,
+	       offsetof(struct sk_buff, headers_end) -
+		       offsetof(struct sk_buff, headers_start));
+	skb->pfmemalloc = pfmemalloc;
+	skb->queue_mapping = 0;
+	skb->nohdr = 0;
+	skb->peeked = 0;
+	skb->mac_len = 0;
+	skb->dev = NULL;
+#ifdef CONFIG_NET_SCHED
+	skb->tc_index = 0;
+	skb_reset_tc(skb);
+#endif
+	skb->hdr_len = skb_headroom(skb);
+	skb_reset_mac_header(skb);
+	skb_reset_network_header(skb);
+	skb_probe_transport_header(skb, 0);
+	skb_reset_inner_headers(skb);
+}
+
+static inline int cpumask_choose_online(int *stored_cpu, unsigned int id)
+{
+	unsigned int cpu = *stored_cpu, cpu_index, i;
+
+	if (unlikely(cpu == nr_cpumask_bits ||
+		     !cpumask_test_cpu(cpu, cpu_online_mask))) {
+		cpu_index = id % cpumask_weight(cpu_online_mask);
+		cpu = cpumask_first(cpu_online_mask);
+		for (i = 0; i < cpu_index; ++i)
+			cpu = cpumask_next(cpu, cpu_online_mask);
+		*stored_cpu = cpu;
+	}
+	return cpu;
+}
+
+/* This function is racy, in the sense that next is unlocked, so it could return
+ * the same CPU twice. A race-free version of this would be to instead store an
+ * atomic sequence number, do an increment-and-return, and then iterate through
+ * every possible CPU until we get to that index -- choose_cpu. However that's
+ * a bit slower, and it doesn't seem like this potential race actually
+ * introduces any performance loss, so we live with it.
+ */
+static inline int cpumask_next_online(int *next)
+{
+	int cpu = *next;
+
+	while (unlikely(!cpumask_test_cpu(cpu, cpu_online_mask)))
+		cpu = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits;
+	*next = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits;
+	return cpu;
+}
+
+static inline int queue_enqueue_per_device_and_peer(
+	struct crypt_queue *device_queue, struct crypt_queue *peer_queue,
+	struct sk_buff *skb, struct workqueue_struct *wq, int *next_cpu)
+{
+	int cpu;
+
+	atomic_set_release(&PACKET_CB(skb)->state, PACKET_STATE_UNCRYPTED);
+	/* We first queue this up for the peer ingestion, but the consumer
+	 * will wait for the state to change to CRYPTED or DEAD before.
+	 */
+	if (unlikely(ptr_ring_produce_bh(&peer_queue->ring, skb)))
+		return -ENOSPC;
+	/* Then we queue it up in the device queue, which consumes the
+	 * packet as soon as it can.
+	 */
+	cpu = cpumask_next_online(next_cpu);
+	if (unlikely(ptr_ring_produce_bh(&device_queue->ring, skb)))
+		return -EPIPE;
+	queue_work_on(cpu, wq, &per_cpu_ptr(device_queue->worker, cpu)->work);
+	return 0;
+}
+
+static inline void queue_enqueue_per_peer(struct crypt_queue *queue,
+					  struct sk_buff *skb,
+					  enum packet_state state)
+{
+	/* We take a reference, because as soon as we call atomic_set, the
+	 * peer can be freed from below us.
+	 */
+	struct wireguard_peer *peer = peer_get(PACKET_PEER(skb));
+	atomic_set_release(&PACKET_CB(skb)->state, state);
+	queue_work_on(cpumask_choose_online(&peer->serial_work_cpu,
+					    peer->internal_id),
+		      peer->device->packet_crypt_wq, &queue->work);
+	peer_put(peer);
+}
+
+static inline void queue_enqueue_per_peer_napi(struct crypt_queue *queue,
+					       struct sk_buff *skb,
+					       enum packet_state state)
+{
+	/* We take a reference, because as soon as we call atomic_set, the
+	 * peer can be freed from below us.
+	 */
+	struct wireguard_peer *peer = peer_get(PACKET_PEER(skb));
+	atomic_set_release(&PACKET_CB(skb)->state, state);
+	napi_schedule(&peer->napi);
+	peer_put(peer);
+}
+
+#ifdef DEBUG
+bool packet_counter_selftest(void);
+#endif
+
+#endif /* _WG_QUEUEING_H */
diff --git a/drivers/net/wireguard/ratelimiter.c b/drivers/net/wireguard/ratelimiter.c
new file mode 100644
index 000000000000..52381ee80663
--- /dev/null
+++ b/drivers/net/wireguard/ratelimiter.c
@@ -0,0 +1,220 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "ratelimiter.h"
+#include <linux/siphash.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <net/ip.h>
+
+static struct kmem_cache *entry_cache;
+static hsiphash_key_t key;
+static spinlock_t table_lock = __SPIN_LOCK_UNLOCKED("ratelimiter_table_lock");
+static DEFINE_MUTEX(init_lock);
+static atomic64_t refcnt = ATOMIC64_INIT(0);
+static atomic_t total_entries = ATOMIC_INIT(0);
+static unsigned int max_entries, table_size;
+static void gc_entries(struct work_struct *);
+static DECLARE_DEFERRABLE_WORK(gc_work, gc_entries);
+static struct hlist_head *table_v4;
+#if IS_ENABLED(CONFIG_IPV6)
+static struct hlist_head *table_v6;
+#endif
+
+struct ratelimiter_entry {
+	u64 last_time_ns, tokens;
+	__be64 ip;
+	void *net;
+	spinlock_t lock;
+	struct hlist_node hash;
+	struct rcu_head rcu;
+};
+
+enum {
+	PACKETS_PER_SECOND = 20,
+	PACKETS_BURSTABLE = 5,
+	PACKET_COST = NSEC_PER_SEC / PACKETS_PER_SECOND,
+	TOKEN_MAX = PACKET_COST * PACKETS_BURSTABLE
+};
+
+static void entry_free(struct rcu_head *rcu)
+{
+	kmem_cache_free(entry_cache,
+			container_of(rcu, struct ratelimiter_entry, rcu));
+	atomic_dec(&total_entries);
+}
+
+static void entry_uninit(struct ratelimiter_entry *entry)
+{
+	hlist_del_rcu(&entry->hash);
+	call_rcu(&entry->rcu, entry_free);
+}
+
+/* Calling this function with a NULL work uninits all entries. */
+static void gc_entries(struct work_struct *work)
+{
+	const u64 now = ktime_get_boot_fast_ns();
+	struct ratelimiter_entry *entry;
+	struct hlist_node *temp;
+	unsigned int i;
+
+	for (i = 0; i < table_size; ++i) {
+		spin_lock(&table_lock);
+		hlist_for_each_entry_safe (entry, temp, &table_v4[i], hash) {
+			if (unlikely(!work) ||
+			    now - entry->last_time_ns > NSEC_PER_SEC)
+				entry_uninit(entry);
+		}
+#if IS_ENABLED(CONFIG_IPV6)
+		hlist_for_each_entry_safe (entry, temp, &table_v6[i], hash) {
+			if (unlikely(!work) ||
+			    now - entry->last_time_ns > NSEC_PER_SEC)
+				entry_uninit(entry);
+		}
+#endif
+		spin_unlock(&table_lock);
+		if (likely(work))
+			cond_resched();
+	}
+	if (likely(work))
+		queue_delayed_work(system_power_efficient_wq, &gc_work, HZ);
+}
+
+bool ratelimiter_allow(struct sk_buff *skb, struct net *net)
+{
+	struct { __be64 ip; u32 net; } data = {
+		.net = (unsigned long)net & 0xffffffff };
+	struct ratelimiter_entry *entry;
+	struct hlist_head *bucket;
+
+	if (skb->protocol == htons(ETH_P_IP)) {
+		data.ip = (__force __be64)ip_hdr(skb)->saddr;
+		bucket = &table_v4[hsiphash(&data, sizeof(u32) * 3, &key) &
+				   (table_size - 1)];
+	}
+#if IS_ENABLED(CONFIG_IPV6)
+	else if (skb->protocol == htons(ETH_P_IPV6)) {
+		memcpy(&data.ip, &ipv6_hdr(skb)->saddr,
+		       sizeof(__be64)); /* Only 64 bits */
+		bucket = &table_v6[hsiphash(&data, sizeof(u32) * 3, &key) &
+				   (table_size - 1)];
+	}
+#endif
+	else
+		return false;
+	rcu_read_lock();
+	hlist_for_each_entry_rcu (entry, bucket, hash) {
+		if (entry->net == net && entry->ip == data.ip) {
+			u64 now, tokens;
+			bool ret;
+			/* Quasi-inspired by nft_limit.c, but this is actually a
+			 * slightly different algorithm. Namely, we incorporate
+			 * the burst as part of the maximum tokens, rather than
+			 * as part of the rate.
+			 */
+			spin_lock(&entry->lock);
+			now = ktime_get_boot_fast_ns();
+			tokens = min_t(u64, TOKEN_MAX,
+				       entry->tokens + now -
+					       entry->last_time_ns);
+			entry->last_time_ns = now;
+			ret = tokens >= PACKET_COST;
+			entry->tokens = ret ? tokens - PACKET_COST : tokens;
+			spin_unlock(&entry->lock);
+			rcu_read_unlock();
+			return ret;
+		}
+	}
+	rcu_read_unlock();
+
+	if (atomic_inc_return(&total_entries) > max_entries)
+		goto err_oom;
+
+	entry = kmem_cache_alloc(entry_cache, GFP_KERNEL);
+	if (unlikely(!entry))
+		goto err_oom;
+
+	entry->net = net;
+	entry->ip = data.ip;
+	INIT_HLIST_NODE(&entry->hash);
+	spin_lock_init(&entry->lock);
+	entry->last_time_ns = ktime_get_boot_fast_ns();
+	entry->tokens = TOKEN_MAX - PACKET_COST;
+	spin_lock(&table_lock);
+	hlist_add_head_rcu(&entry->hash, bucket);
+	spin_unlock(&table_lock);
+	return true;
+
+err_oom:
+	atomic_dec(&total_entries);
+	return false;
+}
+
+int ratelimiter_init(void)
+{
+	mutex_lock(&init_lock);
+	if (atomic64_inc_return(&refcnt) != 1)
+		goto out;
+
+	entry_cache = KMEM_CACHE(ratelimiter_entry, 0);
+	if (!entry_cache)
+		goto err;
+
+	/* xt_hashlimit.c uses a slightly different algorithm for ratelimiting,
+	 * but what it shares in common is that it uses a massive hashtable. So,
+	 * we borrow their wisdom about good table sizes on different systems
+	 * dependent on RAM. This calculation here comes from there.
+	 */
+	table_size = (totalram_pages > (1U << 30) / PAGE_SIZE) ? 8192 :
+		max_t(unsigned long, 16, roundup_pow_of_two(
+			(totalram_pages << PAGE_SHIFT) /
+			(1U << 14) / sizeof(struct hlist_head)));
+	max_entries = table_size * 8;
+
+	table_v4 = kvzalloc(table_size * sizeof(*table_v4), GFP_KERNEL);
+	if (unlikely(!table_v4))
+		goto err_kmemcache;
+
+#if IS_ENABLED(CONFIG_IPV6)
+	table_v6 = kvzalloc(table_size * sizeof(*table_v6), GFP_KERNEL);
+	if (unlikely(!table_v6)) {
+		kvfree(table_v4);
+		goto err_kmemcache;
+	}
+#endif
+
+	queue_delayed_work(system_power_efficient_wq, &gc_work, HZ);
+	get_random_bytes(&key, sizeof(key));
+out:
+	mutex_unlock(&init_lock);
+	return 0;
+
+err_kmemcache:
+	kmem_cache_destroy(entry_cache);
+err:
+	atomic64_dec(&refcnt);
+	mutex_unlock(&init_lock);
+	return -ENOMEM;
+}
+
+void ratelimiter_uninit(void)
+{
+	mutex_lock(&init_lock);
+	if (atomic64_dec_if_positive(&refcnt))
+		goto out;
+
+	cancel_delayed_work_sync(&gc_work);
+	gc_entries(NULL);
+	rcu_barrier();
+	kvfree(table_v4);
+#if IS_ENABLED(CONFIG_IPV6)
+	kvfree(table_v6);
+#endif
+	kmem_cache_destroy(entry_cache);
+out:
+	mutex_unlock(&init_lock);
+}
+
+#include "selftest/ratelimiter.h"
diff --git a/drivers/net/wireguard/ratelimiter.h b/drivers/net/wireguard/ratelimiter.h
new file mode 100644
index 000000000000..8931c0615374
--- /dev/null
+++ b/drivers/net/wireguard/ratelimiter.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_RATELIMITER_H
+#define _WG_RATELIMITER_H
+
+#include <linux/skbuff.h>
+
+int ratelimiter_init(void);
+void ratelimiter_uninit(void);
+bool ratelimiter_allow(struct sk_buff *skb, struct net *net);
+
+#ifdef DEBUG
+bool ratelimiter_selftest(void);
+#endif
+
+#endif /* _WG_RATELIMITER_H */
diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c
new file mode 100644
index 000000000000..e5ce21703512
--- /dev/null
+++ b/drivers/net/wireguard/receive.c
@@ -0,0 +1,597 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "queueing.h"
+#include "device.h"
+#include "peer.h"
+#include "timers.h"
+#include "messages.h"
+#include "cookie.h"
+#include "socket.h"
+
+#include <linux/simd.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/udp.h>
+#include <net/ip_tunnels.h>
+
+/* Must be called with bh disabled. */
+static inline void rx_stats(struct wireguard_peer *peer, size_t len)
+{
+	struct pcpu_sw_netstats *tstats =
+		get_cpu_ptr(peer->device->dev->tstats);
+
+	u64_stats_update_begin(&tstats->syncp);
+	++tstats->rx_packets;
+	tstats->rx_bytes += len;
+	peer->rx_bytes += len;
+	u64_stats_update_end(&tstats->syncp);
+	put_cpu_ptr(tstats);
+}
+
+#define SKB_TYPE_LE32(skb) (((struct message_header *)(skb)->data)->type)
+
+static inline size_t validate_header_len(struct sk_buff *skb)
+{
+	if (unlikely(skb->len < sizeof(struct message_header)))
+		return 0;
+	if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_DATA) &&
+	    skb->len >= MESSAGE_MINIMUM_LENGTH)
+		return sizeof(struct message_data);
+	if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION) &&
+	    skb->len == sizeof(struct message_handshake_initiation))
+		return sizeof(struct message_handshake_initiation);
+	if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE) &&
+	    skb->len == sizeof(struct message_handshake_response))
+		return sizeof(struct message_handshake_response);
+	if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE) &&
+	    skb->len == sizeof(struct message_handshake_cookie))
+		return sizeof(struct message_handshake_cookie);
+	return 0;
+}
+
+static inline int skb_prepare_header(struct sk_buff *skb,
+				     struct wireguard_device *wg)
+{
+	size_t data_offset, data_len, header_len;
+	struct udphdr *udp;
+
+	if (unlikely(skb_examine_untrusted_ip_hdr(skb) != skb->protocol ||
+		     skb_transport_header(skb) < skb->head ||
+		     (skb_transport_header(skb) + sizeof(struct udphdr)) >
+			     skb_tail_pointer(skb)))
+		return -EINVAL; /* Bogus IP header */
+	udp = udp_hdr(skb);
+	data_offset = (u8 *)udp - skb->data;
+	if (unlikely(data_offset > U16_MAX ||
+		     data_offset + sizeof(struct udphdr) > skb->len))
+		/* Packet has offset at impossible location or isn't big enough
+		 * to have UDP fields.
+		 */
+		return -EINVAL;
+	data_len = ntohs(udp->len);
+	if (unlikely(data_len < sizeof(struct udphdr) ||
+		     data_len > skb->len - data_offset))
+		/* UDP packet is reporting too small of a size or lying about
+		 * its size.
+		 */
+		return -EINVAL;
+	data_len -= sizeof(struct udphdr);
+	data_offset = (u8 *)udp + sizeof(struct udphdr) - skb->data;
+	if (unlikely(!pskb_may_pull(skb,
+				data_offset + sizeof(struct message_header)) ||
+		     pskb_trim(skb, data_len + data_offset) < 0))
+		return -EINVAL;
+	skb_pull(skb, data_offset);
+	if (unlikely(skb->len != data_len))
+		 /* Final len does not agree with calculated len */
+		return -EINVAL;
+	header_len = validate_header_len(skb);
+	if (unlikely(!header_len))
+		return -EINVAL;
+	__skb_push(skb, data_offset);
+	if (unlikely(!pskb_may_pull(skb, data_offset + header_len)))
+		return -EINVAL;
+	__skb_pull(skb, data_offset);
+	return 0;
+}
+
+static void receive_handshake_packet(struct wireguard_device *wg,
+				     struct sk_buff *skb)
+{
+	struct wireguard_peer *peer = NULL;
+	enum cookie_mac_state mac_state;
+	/* This is global, so that our load calculation applies to
+	 * the whole system.
+	 */
+	static u64 last_under_load;
+	bool packet_needs_cookie;
+	bool under_load;
+
+	if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE)) {
+		net_dbg_skb_ratelimited("%s: Receiving cookie response from %pISpfsc\n",
+					wg->dev->name, skb);
+		cookie_message_consume(
+			(struct message_handshake_cookie *)skb->data, wg);
+		return;
+	}
+
+	under_load = skb_queue_len(&wg->incoming_handshakes) >=
+		     MAX_QUEUED_INCOMING_HANDSHAKES / 8;
+	if (under_load)
+		last_under_load = ktime_get_boot_fast_ns();
+	else if (last_under_load)
+		under_load = !has_expired(last_under_load, 1);
+	mac_state = cookie_validate_packet(&wg->cookie_checker, skb,
+					   under_load);
+	if ((under_load && mac_state == VALID_MAC_WITH_COOKIE) ||
+	    (!under_load && mac_state == VALID_MAC_BUT_NO_COOKIE))
+		packet_needs_cookie = false;
+	else if (under_load && mac_state == VALID_MAC_BUT_NO_COOKIE)
+		packet_needs_cookie = true;
+	else {
+		net_dbg_skb_ratelimited("%s: Invalid MAC of handshake, dropping packet from %pISpfsc\n",
+					wg->dev->name, skb);
+		return;
+	}
+
+	switch (SKB_TYPE_LE32(skb)) {
+	case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION): {
+		struct message_handshake_initiation *message =
+			(struct message_handshake_initiation *)skb->data;
+
+		if (packet_needs_cookie) {
+			packet_send_handshake_cookie(wg, skb,
+						     message->sender_index);
+			return;
+		}
+		peer = noise_handshake_consume_initiation(message, wg);
+		if (unlikely(!peer)) {
+			net_dbg_skb_ratelimited("%s: Invalid handshake initiation from %pISpfsc\n",
+						wg->dev->name, skb);
+			return;
+		}
+		socket_set_peer_endpoint_from_skb(peer, skb);
+		net_dbg_ratelimited("%s: Receiving handshake initiation from peer %llu (%pISpfsc)\n",
+				    wg->dev->name, peer->internal_id,
+				    &peer->endpoint.addr);
+		packet_send_handshake_response(peer);
+		break;
+	}
+	case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE): {
+		struct message_handshake_response *message =
+			(struct message_handshake_response *)skb->data;
+
+		if (packet_needs_cookie) {
+			packet_send_handshake_cookie(wg, skb,
+						     message->sender_index);
+			return;
+		}
+		peer = noise_handshake_consume_response(message, wg);
+		if (unlikely(!peer)) {
+			net_dbg_skb_ratelimited("%s: Invalid handshake response from %pISpfsc\n",
+						wg->dev->name, skb);
+			return;
+		}
+		socket_set_peer_endpoint_from_skb(peer, skb);
+		net_dbg_ratelimited("%s: Receiving handshake response from peer %llu (%pISpfsc)\n",
+				    wg->dev->name, peer->internal_id,
+				    &peer->endpoint.addr);
+		if (noise_handshake_begin_session(&peer->handshake,
+						  &peer->keypairs)) {
+			timers_session_derived(peer);
+			timers_handshake_complete(peer);
+			/* Calling this function will either send any existing
+			 * packets in the queue and not send a keepalive, which
+			 * is the best case, Or, if there's nothing in the
+			 * queue, it will send a keepalive, in order to give
+			 * immediate confirmation of the session.
+			 */
+			packet_send_keepalive(peer);
+		}
+		break;
+	}
+	}
+
+	if (unlikely(!peer)) {
+		WARN(1, "Somehow a wrong type of packet wound up in the handshake queue!\n");
+		return;
+	}
+
+	local_bh_disable();
+	rx_stats(peer, skb->len);
+	local_bh_enable();
+
+	timers_any_authenticated_packet_received(peer);
+	timers_any_authenticated_packet_traversal(peer);
+	peer_put(peer);
+}
+
+void packet_handshake_receive_worker(struct work_struct *work)
+{
+	struct wireguard_device *wg =
+		container_of(work, struct multicore_worker, work)->ptr;
+	struct sk_buff *skb;
+
+	while ((skb = skb_dequeue(&wg->incoming_handshakes)) != NULL) {
+		receive_handshake_packet(wg, skb);
+		dev_kfree_skb(skb);
+		cond_resched();
+	}
+}
+
+static inline void keep_key_fresh(struct wireguard_peer *peer)
+{
+	struct noise_keypair *keypair;
+	bool send = false;
+
+	if (peer->sent_lastminute_handshake)
+		return;
+
+	rcu_read_lock_bh();
+	keypair = rcu_dereference_bh(peer->keypairs.current_keypair);
+	if (likely(keypair && keypair->sending.is_valid) &&
+	    keypair->i_am_the_initiator &&
+	    unlikely(has_expired(keypair->sending.birthdate,
+			REJECT_AFTER_TIME - KEEPALIVE_TIMEOUT - REKEY_TIMEOUT)))
+		send = true;
+	rcu_read_unlock_bh();
+
+	if (send) {
+		peer->sent_lastminute_handshake = true;
+		packet_send_queued_handshake_initiation(peer, false);
+	}
+}
+
+static inline bool skb_decrypt(struct sk_buff *skb,
+			       struct noise_symmetric_key *key,
+			       simd_context_t simd_context)
+{
+	struct scatterlist sg[MAX_SKB_FRAGS * 2 + 1];
+	struct sk_buff *trailer;
+	unsigned int offset;
+	int num_frags;
+
+	if (unlikely(!key))
+		return false;
+
+	if (unlikely(!key->is_valid ||
+		     has_expired(key->birthdate, REJECT_AFTER_TIME) ||
+		     key->counter.receive.counter >= REJECT_AFTER_MESSAGES)) {
+		key->is_valid = false;
+		return false;
+	}
+
+	PACKET_CB(skb)->nonce =
+		le64_to_cpu(((struct message_data *)skb->data)->counter);
+
+	/* We ensure that the network header is part of the packet before we
+	 * call skb_cow_data, so that there's no chance that data is removed
+	 * from the skb, so that later we can extract the original endpoint.
+	 */
+	offset = skb->data - skb_network_header(skb);
+	skb_push(skb, offset);
+	num_frags = skb_cow_data(skb, 0, &trailer);
+	offset += sizeof(struct message_data);
+	skb_pull(skb, offset);
+	if (unlikely(num_frags < 0 || num_frags > ARRAY_SIZE(sg)))
+		return false;
+
+	sg_init_table(sg, num_frags);
+	if (skb_to_sgvec(skb, sg, 0, skb->len) <= 0)
+		return false;
+
+	if (!chacha20poly1305_decrypt_sg(sg, sg, skb->len, NULL, 0,
+					 PACKET_CB(skb)->nonce, key->key,
+					 simd_context))
+		return false;
+
+	/* Another ugly situation of pushing and pulling the header so as to
+	 * keep endpoint information intact.
+	 */
+	skb_push(skb, offset);
+	if (pskb_trim(skb, skb->len - noise_encrypted_len(0)))
+		return false;
+	skb_pull(skb, offset);
+
+	return true;
+}
+
+/* This is RFC6479, a replay detection bitmap algorithm that avoids bitshifts */
+static inline bool counter_validate(union noise_counter *counter,
+				    u64 their_counter)
+{
+	unsigned long index, index_current, top, i;
+	bool ret = false;
+
+	spin_lock_bh(&counter->receive.lock);
+
+	if (unlikely(counter->receive.counter >= REJECT_AFTER_MESSAGES + 1 ||
+		     their_counter >= REJECT_AFTER_MESSAGES))
+		goto out;
+
+	++their_counter;
+
+	if (unlikely((COUNTER_WINDOW_SIZE + their_counter) <
+		     counter->receive.counter))
+		goto out;
+
+	index = their_counter >> ilog2(BITS_PER_LONG);
+
+	if (likely(their_counter > counter->receive.counter)) {
+		index_current = counter->receive.counter >> ilog2(BITS_PER_LONG);
+		top = min_t(unsigned long, index - index_current,
+			    COUNTER_BITS_TOTAL / BITS_PER_LONG);
+		for (i = 1; i <= top; ++i)
+			counter->receive.backtrack[(i + index_current) &
+				((COUNTER_BITS_TOTAL / BITS_PER_LONG) - 1)] = 0;
+		counter->receive.counter = their_counter;
+	}
+
+	index &= (COUNTER_BITS_TOTAL / BITS_PER_LONG) - 1;
+	ret = !test_and_set_bit(their_counter & (BITS_PER_LONG - 1),
+				&counter->receive.backtrack[index]);
+
+out:
+	spin_unlock_bh(&counter->receive.lock);
+	return ret;
+}
+#include "selftest/counter.h"
+
+static void packet_consume_data_done(struct wireguard_peer *peer,
+				     struct sk_buff *skb,
+				     struct endpoint *endpoint)
+{
+	struct net_device *dev = peer->device->dev;
+	struct wireguard_peer *routed_peer;
+	unsigned int len, len_before_trim;
+
+	socket_set_peer_endpoint(peer, endpoint);
+
+	if (unlikely(noise_received_with_keypair(&peer->keypairs,
+						 PACKET_CB(skb)->keypair))) {
+		timers_handshake_complete(peer);
+		packet_send_staged_packets(peer);
+	}
+
+	keep_key_fresh(peer);
+
+	timers_any_authenticated_packet_received(peer);
+	timers_any_authenticated_packet_traversal(peer);
+
+	/* A packet with length 0 is a keepalive packet */
+	if (unlikely(!skb->len)) {
+		rx_stats(peer, message_data_len(0));
+		net_dbg_ratelimited("%s: Receiving keepalive packet from peer %llu (%pISpfsc)\n",
+				    dev->name, peer->internal_id,
+				    &peer->endpoint.addr);
+		goto packet_processed;
+	}
+
+	timers_data_received(peer);
+
+	if (unlikely(skb_network_header(skb) < skb->head))
+		goto dishonest_packet_size;
+	if (unlikely(!(pskb_network_may_pull(skb, sizeof(struct iphdr)) &&
+		       (ip_hdr(skb)->version == 4 ||
+			(ip_hdr(skb)->version == 6 &&
+			 pskb_network_may_pull(skb, sizeof(struct ipv6hdr)))))))
+		goto dishonest_packet_type;
+
+	skb->dev = dev;
+	skb->ip_summed = CHECKSUM_UNNECESSARY;
+	skb->protocol = skb_examine_untrusted_ip_hdr(skb);
+	if (skb->protocol == htons(ETH_P_IP)) {
+		len = ntohs(ip_hdr(skb)->tot_len);
+		if (unlikely(len < sizeof(struct iphdr)))
+			goto dishonest_packet_size;
+		if (INET_ECN_is_ce(PACKET_CB(skb)->ds))
+			IP_ECN_set_ce(ip_hdr(skb));
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		len = ntohs(ipv6_hdr(skb)->payload_len) +
+		      sizeof(struct ipv6hdr);
+		if (INET_ECN_is_ce(PACKET_CB(skb)->ds))
+			IP6_ECN_set_ce(skb, ipv6_hdr(skb));
+	} else
+		goto dishonest_packet_type;
+
+	if (unlikely(len > skb->len))
+		goto dishonest_packet_size;
+	len_before_trim = skb->len;
+	if (unlikely(pskb_trim(skb, len)))
+		goto packet_processed;
+
+	routed_peer = allowedips_lookup_src(&peer->device->peer_allowedips, skb);
+	peer_put(routed_peer); /* We don't need the extra reference. */
+
+	if (unlikely(routed_peer != peer))
+		goto dishonest_packet_peer;
+
+	if (unlikely(napi_gro_receive(&peer->napi, skb) == GRO_DROP)) {
+		++dev->stats.rx_dropped;
+		net_dbg_ratelimited("%s: Failed to give packet to userspace from peer %llu (%pISpfsc)\n",
+				    dev->name, peer->internal_id,
+				    &peer->endpoint.addr);
+	} else
+		rx_stats(peer, message_data_len(len_before_trim));
+	return;
+
+dishonest_packet_peer:
+	net_dbg_skb_ratelimited("%s: Packet has unallowed src IP (%pISc) from peer %llu (%pISpfsc)\n",
+				dev->name, skb, peer->internal_id,
+				&peer->endpoint.addr);
+	++dev->stats.rx_errors;
+	++dev->stats.rx_frame_errors;
+	goto packet_processed;
+dishonest_packet_type:
+	net_dbg_ratelimited("%s: Packet is neither ipv4 nor ipv6 from peer %llu (%pISpfsc)\n",
+			    dev->name, peer->internal_id, &peer->endpoint.addr);
+	++dev->stats.rx_errors;
+	++dev->stats.rx_frame_errors;
+	goto packet_processed;
+dishonest_packet_size:
+	net_dbg_ratelimited("%s: Packet has incorrect size from peer %llu (%pISpfsc)\n",
+			    dev->name, peer->internal_id, &peer->endpoint.addr);
+	++dev->stats.rx_errors;
+	++dev->stats.rx_length_errors;
+	goto packet_processed;
+packet_processed:
+	dev_kfree_skb(skb);
+}
+
+int packet_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct wireguard_peer *peer =
+		container_of(napi, struct wireguard_peer, napi);
+	struct crypt_queue *queue = &peer->rx_queue;
+	struct noise_keypair *keypair;
+	struct endpoint endpoint;
+	enum packet_state state;
+	struct sk_buff *skb;
+	int work_done = 0;
+	bool free;
+
+	if (unlikely(budget <= 0))
+		return 0;
+
+	while ((skb = __ptr_ring_peek(&queue->ring)) != NULL &&
+	       (state = atomic_read_acquire(&PACKET_CB(skb)->state)) !=
+		       PACKET_STATE_UNCRYPTED) {
+		__ptr_ring_discard_one(&queue->ring);
+		peer = PACKET_PEER(skb);
+		keypair = PACKET_CB(skb)->keypair;
+		free = true;
+
+		if (unlikely(state != PACKET_STATE_CRYPTED))
+			goto next;
+
+		if (unlikely(!counter_validate(&keypair->receiving.counter,
+					       PACKET_CB(skb)->nonce))) {
+			net_dbg_ratelimited("%s: Packet has invalid nonce %llu (max %llu)\n",
+					    peer->device->dev->name,
+					    PACKET_CB(skb)->nonce,
+					    keypair->receiving.counter.receive.counter);
+			goto next;
+		}
+
+		if (unlikely(socket_endpoint_from_skb(&endpoint, skb)))
+			goto next;
+
+		skb_reset(skb);
+		packet_consume_data_done(peer, skb, &endpoint);
+		free = false;
+
+	next:
+		noise_keypair_put(keypair, false);
+		peer_put(peer);
+		if (unlikely(free))
+			dev_kfree_skb(skb);
+
+		if (++work_done >= budget)
+			break;
+	}
+
+	if (work_done < budget)
+		napi_complete_done(napi, work_done);
+
+	return work_done;
+}
+
+void packet_decrypt_worker(struct work_struct *work)
+{
+	struct crypt_queue *queue =
+		container_of(work, struct multicore_worker, work)->ptr;
+	simd_context_t simd_context = simd_get();
+	struct sk_buff *skb;
+
+	while ((skb = ptr_ring_consume_bh(&queue->ring)) != NULL) {
+		enum packet_state state = likely(skb_decrypt(skb,
+					   &PACKET_CB(skb)->keypair->receiving,
+					   simd_context)) ?
+				PACKET_STATE_CRYPTED : PACKET_STATE_DEAD;
+		queue_enqueue_per_peer_napi(&PACKET_PEER(skb)->rx_queue, skb,
+					    state);
+		simd_context = simd_relax(simd_context);
+	}
+
+	simd_put(simd_context);
+}
+
+static void packet_consume_data(struct wireguard_device *wg,
+				struct sk_buff *skb)
+{
+	__le32 idx = ((struct message_data *)skb->data)->key_idx;
+	struct wireguard_peer *peer = NULL;
+	int ret;
+
+	rcu_read_lock_bh();
+	PACKET_CB(skb)->keypair =
+		(struct noise_keypair *)index_hashtable_lookup(
+			&wg->index_hashtable, INDEX_HASHTABLE_KEYPAIR, idx,
+			&peer);
+	if (unlikely(!noise_keypair_get(PACKET_CB(skb)->keypair)))
+		goto err_keypair;
+
+	if (unlikely(peer->is_dead))
+		goto err;
+
+	ret = queue_enqueue_per_device_and_peer(&wg->decrypt_queue,
+						&peer->rx_queue, skb,
+						wg->packet_crypt_wq,
+						&wg->decrypt_queue.last_cpu);
+	if (unlikely(ret == -EPIPE))
+		queue_enqueue_per_peer(&peer->rx_queue, skb, PACKET_STATE_DEAD);
+	if (likely(!ret || ret == -EPIPE)) {
+		rcu_read_unlock_bh();
+		return;
+	}
+err:
+	noise_keypair_put(PACKET_CB(skb)->keypair, false);
+err_keypair:
+	rcu_read_unlock_bh();
+	peer_put(peer);
+	dev_kfree_skb(skb);
+}
+
+void packet_receive(struct wireguard_device *wg, struct sk_buff *skb)
+{
+	if (unlikely(skb_prepare_header(skb, wg) < 0))
+		goto err;
+	switch (SKB_TYPE_LE32(skb)) {
+	case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION):
+	case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE):
+	case cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE): {
+		int cpu;
+
+		if (skb_queue_len(&wg->incoming_handshakes) >
+			    MAX_QUEUED_INCOMING_HANDSHAKES ||
+		    unlikely(!rng_is_initialized())) {
+			net_dbg_skb_ratelimited("%s: Dropping handshake packet from %pISpfsc\n",
+						wg->dev->name, skb);
+			goto err;
+		}
+		skb_queue_tail(&wg->incoming_handshakes, skb);
+		/* Queues up a call to packet_process_queued_handshake_
+		 * packets(skb):
+		 */
+		cpu = cpumask_next_online(&wg->incoming_handshake_cpu);
+		queue_work_on(cpu, wg->handshake_receive_wq,
+			&per_cpu_ptr(wg->incoming_handshakes_worker, cpu)->work);
+		break;
+	}
+	case cpu_to_le32(MESSAGE_DATA):
+		PACKET_CB(skb)->ds = ip_tunnel_get_dsfield(ip_hdr(skb), skb);
+		packet_consume_data(wg, skb);
+		break;
+	default:
+		net_dbg_skb_ratelimited("%s: Invalid packet from %pISpfsc\n",
+					wg->dev->name, skb);
+		goto err;
+	}
+	return;
+
+err:
+	dev_kfree_skb(skb);
+}
diff --git a/drivers/net/wireguard/selftest/allowedips.h b/drivers/net/wireguard/selftest/allowedips.h
new file mode 100644
index 000000000000..83cfb34ecb46
--- /dev/null
+++ b/drivers/net/wireguard/selftest/allowedips.h
@@ -0,0 +1,656 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+
+#ifdef DEBUG_PRINT_TRIE_GRAPHVIZ
+#include <linux/siphash.h>
+
+static __init void swap_endian_and_apply_cidr(u8 *dst, const u8 *src, u8 bits,
+					      u8 cidr)
+{
+	swap_endian(dst, src, bits);
+	memset(dst + (cidr + 7) / 8, 0, bits / 8 - (cidr + 7) / 8);
+	if (cidr)
+		dst[(cidr + 7) / 8 - 1] &= ~0U << ((8 - (cidr % 8)) % 8);
+}
+
+static __init void print_node(struct allowedips_node *node, u8 bits)
+{
+	char *fmt_connection = KERN_DEBUG "\t\"%p/%d\" -> \"%p/%d\";\n";
+	char *fmt_declaration = KERN_DEBUG
+		"\t\"%p/%d\"[style=%s, color=\"#%06x\"];\n";
+	char *style = "dotted";
+	u8 ip1[16], ip2[16];
+	u32 color = 0;
+
+	if (bits == 32) {
+		fmt_connection = KERN_DEBUG "\t\"%pI4/%d\" -> \"%pI4/%d\";\n";
+		fmt_declaration = KERN_DEBUG
+			"\t\"%pI4/%d\"[style=%s, color=\"#%06x\"];\n";
+	} else if (bits == 128) {
+		fmt_connection = KERN_DEBUG "\t\"%pI6/%d\" -> \"%pI6/%d\";\n";
+		fmt_declaration = KERN_DEBUG
+			"\t\"%pI6/%d\"[style=%s, color=\"#%06x\"];\n";
+	}
+	if (node->peer) {
+		hsiphash_key_t key = { 0 };
+		memcpy(&key, &node->peer, sizeof(node->peer));
+		color = hsiphash_1u32(0xdeadbeef, &key) % 200 << 16 |
+			hsiphash_1u32(0xbabecafe, &key) % 200 << 8 |
+			hsiphash_1u32(0xabad1dea, &key) % 200;
+		style = "bold";
+	}
+	swap_endian_and_apply_cidr(ip1, node->bits, bits, node->cidr);
+	printk(fmt_declaration, ip1, node->cidr, style, color);
+	if (node->bit[0]) {
+		swap_endian_and_apply_cidr(ip2, node->bit[0]->bits, bits,
+					   node->cidr);
+		printk(fmt_connection, ip1, node->cidr, ip2,
+		       node->bit[0]->cidr);
+		print_node(node->bit[0], bits);
+	}
+	if (node->bit[1]) {
+		swap_endian_and_apply_cidr(ip2, node->bit[1]->bits, bits,
+					   node->cidr);
+		printk(fmt_connection, ip1, node->cidr, ip2,
+		       node->bit[1]->cidr);
+		print_node(node->bit[1], bits);
+	}
+}
+static __init void print_tree(struct allowedips_node *top, u8 bits)
+{
+	printk(KERN_DEBUG "digraph trie {\n");
+	print_node(top, bits);
+	printk(KERN_DEBUG "}\n");
+}
+#endif
+
+#ifdef DEBUG_RANDOM_TRIE
+#define NUM_PEERS 2000
+#define NUM_RAND_ROUTES 400
+#define NUM_MUTATED_ROUTES 100
+#define NUM_QUERIES (NUM_RAND_ROUTES * NUM_MUTATED_ROUTES * 30)
+#include <linux/random.h>
+struct horrible_allowedips {
+	struct hlist_head head;
+};
+struct horrible_allowedips_node {
+	struct hlist_node table;
+	union nf_inet_addr ip;
+	union nf_inet_addr mask;
+	uint8_t ip_version;
+	void *value;
+};
+static __init void horrible_allowedips_init(struct horrible_allowedips *table)
+{
+	INIT_HLIST_HEAD(&table->head);
+}
+static __init void horrible_allowedips_free(struct horrible_allowedips *table)
+{
+	struct horrible_allowedips_node *node;
+	struct hlist_node *h;
+
+	hlist_for_each_entry_safe (node, h, &table->head, table) {
+		hlist_del(&node->table);
+		kfree(node);
+	}
+}
+static __init inline union nf_inet_addr horrible_cidr_to_mask(uint8_t cidr)
+{
+	union nf_inet_addr mask;
+
+	memset(&mask, 0x00, 128 / 8);
+	memset(&mask, 0xff, cidr / 8);
+	if (cidr % 32)
+		mask.all[cidr / 32] = htonl(
+			(0xFFFFFFFFUL << (32 - (cidr % 32))) & 0xFFFFFFFFUL);
+	return mask;
+}
+static __init inline uint8_t horrible_mask_to_cidr(union nf_inet_addr subnet)
+{
+	return hweight32(subnet.all[0]) + hweight32(subnet.all[1]) +
+	       hweight32(subnet.all[2]) + hweight32(subnet.all[3]);
+}
+static __init inline void
+horrible_mask_self(struct horrible_allowedips_node *node)
+{
+	if (node->ip_version == 4)
+		node->ip.ip &= node->mask.ip;
+	else if (node->ip_version == 6) {
+		node->ip.ip6[0] &= node->mask.ip6[0];
+		node->ip.ip6[1] &= node->mask.ip6[1];
+		node->ip.ip6[2] &= node->mask.ip6[2];
+		node->ip.ip6[3] &= node->mask.ip6[3];
+	}
+}
+static __init inline bool
+horrible_match_v4(const struct horrible_allowedips_node *node,
+		  struct in_addr *ip)
+{
+	return (ip->s_addr & node->mask.ip) == node->ip.ip;
+}
+static __init inline bool
+horrible_match_v6(const struct horrible_allowedips_node *node,
+		  struct in6_addr *ip)
+{
+	return (ip->in6_u.u6_addr32[0] & node->mask.ip6[0]) ==
+		       node->ip.ip6[0] &&
+	       (ip->in6_u.u6_addr32[1] & node->mask.ip6[1]) ==
+		       node->ip.ip6[1] &&
+	       (ip->in6_u.u6_addr32[2] & node->mask.ip6[2]) ==
+		       node->ip.ip6[2] &&
+	       (ip->in6_u.u6_addr32[3] & node->mask.ip6[3]) == node->ip.ip6[3];
+}
+static __init void
+horrible_insert_ordered(struct horrible_allowedips *table,
+			struct horrible_allowedips_node *node)
+{
+	struct horrible_allowedips_node *other = NULL, *where = NULL;
+	uint8_t my_cidr = horrible_mask_to_cidr(node->mask);
+
+	hlist_for_each_entry (other, &table->head, table) {
+		if (!memcmp(&other->mask, &node->mask,
+			    sizeof(union nf_inet_addr)) &&
+		    !memcmp(&other->ip, &node->ip,
+			    sizeof(union nf_inet_addr)) &&
+		    other->ip_version == node->ip_version) {
+			other->value = node->value;
+			kfree(node);
+			return;
+		}
+		where = other;
+		if (horrible_mask_to_cidr(other->mask) <= my_cidr)
+			break;
+	}
+	if (!other && !where)
+		hlist_add_head(&node->table, &table->head);
+	else if (!other)
+		hlist_add_behind(&node->table, &where->table);
+	else
+		hlist_add_before(&node->table, &where->table);
+}
+static __init int
+horrible_allowedips_insert_v4(struct horrible_allowedips *table,
+			      struct in_addr *ip, uint8_t cidr, void *value)
+{
+	struct horrible_allowedips_node *node = kzalloc(sizeof(*node), GFP_KERNEL);
+
+	if (unlikely(!node))
+		return -ENOMEM;
+	node->ip.in = *ip;
+	node->mask = horrible_cidr_to_mask(cidr);
+	node->ip_version = 4;
+	node->value = value;
+	horrible_mask_self(node);
+	horrible_insert_ordered(table, node);
+	return 0;
+}
+static __init int
+horrible_allowedips_insert_v6(struct horrible_allowedips *table,
+			      struct in6_addr *ip, uint8_t cidr, void *value)
+{
+	struct horrible_allowedips_node *node = kzalloc(sizeof(*node), GFP_KERNEL);
+
+	if (unlikely(!node))
+		return -ENOMEM;
+	node->ip.in6 = *ip;
+	node->mask = horrible_cidr_to_mask(cidr);
+	node->ip_version = 6;
+	node->value = value;
+	horrible_mask_self(node);
+	horrible_insert_ordered(table, node);
+	return 0;
+}
+static __init void *
+horrible_allowedips_lookup_v4(struct horrible_allowedips *table,
+			      struct in_addr *ip)
+{
+	struct horrible_allowedips_node *node;
+	void *ret = NULL;
+
+	hlist_for_each_entry (node, &table->head, table) {
+		if (node->ip_version != 4)
+			continue;
+		if (horrible_match_v4(node, ip)) {
+			ret = node->value;
+			break;
+		}
+	}
+	return ret;
+}
+static __init void *
+horrible_allowedips_lookup_v6(struct horrible_allowedips *table,
+			      struct in6_addr *ip)
+{
+	struct horrible_allowedips_node *node;
+	void *ret = NULL;
+
+	hlist_for_each_entry (node, &table->head, table) {
+		if (node->ip_version != 6)
+			continue;
+		if (horrible_match_v6(node, ip)) {
+			ret = node->value;
+			break;
+		}
+	}
+	return ret;
+}
+
+static __init bool randomized_test(void)
+{
+	unsigned int i, j, k, mutate_amount, cidr;
+	u8 ip[16], mutate_mask[16], mutated[16];
+	struct wireguard_peer **peers, *peer;
+	struct horrible_allowedips h;
+	DEFINE_MUTEX(mutex);
+	struct allowedips t;
+	bool ret = false;
+
+	mutex_init(&mutex);
+
+	allowedips_init(&t);
+	horrible_allowedips_init(&h);
+
+	peers = kcalloc(NUM_PEERS, sizeof(*peers), GFP_KERNEL);
+	if (unlikely(!peers)) {
+		pr_info("allowedips random self-test: out of memory\n");
+		goto free;
+	}
+	for (i = 0; i < NUM_PEERS; ++i) {
+		peers[i] = kzalloc(sizeof(*peers[i]), GFP_KERNEL);
+		if (unlikely(!peers[i])) {
+			pr_info("allowedips random self-test: out of memory\n");
+			goto free;
+		}
+		kref_init(&peers[i]->refcount);
+	}
+
+	mutex_lock(&mutex);
+
+	for (i = 0; i < NUM_RAND_ROUTES; ++i) {
+		prandom_bytes(ip, 4);
+		cidr = prandom_u32_max(32) + 1;
+		peer = peers[prandom_u32_max(NUM_PEERS)];
+		if (allowedips_insert_v4(&t, (struct in_addr *)ip, cidr, peer,
+					 &mutex) < 0) {
+			pr_info("allowedips random self-test: out of memory\n");
+			goto free;
+		}
+		if (horrible_allowedips_insert_v4(&h, (struct in_addr *)ip,
+						  cidr, peer) < 0) {
+			pr_info("allowedips random self-test: out of memory\n");
+			goto free;
+		}
+		for (j = 0; j < NUM_MUTATED_ROUTES; ++j) {
+			memcpy(mutated, ip, 4);
+			prandom_bytes(mutate_mask, 4);
+			mutate_amount = prandom_u32_max(32);
+			for (k = 0; k < mutate_amount / 8; ++k)
+				mutate_mask[k] = 0xff;
+			mutate_mask[k] = 0xff
+					 << ((8 - (mutate_amount % 8)) % 8);
+			for (; k < 4; ++k)
+				mutate_mask[k] = 0;
+			for (k = 0; k < 4; ++k)
+				mutated[k] = (mutated[k] & mutate_mask[k]) |
+					     (~mutate_mask[k] &
+					      prandom_u32_max(256));
+			cidr = prandom_u32_max(32) + 1;
+			peer = peers[prandom_u32_max(NUM_PEERS)];
+			if (allowedips_insert_v4(&t, (struct in_addr *)mutated,
+						 cidr, peer, &mutex) < 0) {
+				pr_info("allowedips random self-test: out of memory\n");
+				goto free;
+			}
+			if (horrible_allowedips_insert_v4(&h,
+				(struct in_addr *)mutated, cidr, peer)) {
+				pr_info("allowedips random self-test: out of memory\n");
+				goto free;
+			}
+		}
+	}
+
+	for (i = 0; i < NUM_RAND_ROUTES; ++i) {
+		prandom_bytes(ip, 16);
+		cidr = prandom_u32_max(128) + 1;
+		peer = peers[prandom_u32_max(NUM_PEERS)];
+		if (allowedips_insert_v6(&t, (struct in6_addr *)ip, cidr, peer,
+					 &mutex) < 0) {
+			pr_info("allowedips random self-test: out of memory\n");
+			goto free;
+		}
+		if (horrible_allowedips_insert_v6(&h, (struct in6_addr *)ip,
+						  cidr, peer) < 0) {
+			pr_info("allowedips random self-test: out of memory\n");
+			goto free;
+		}
+		for (j = 0; j < NUM_MUTATED_ROUTES; ++j) {
+			memcpy(mutated, ip, 16);
+			prandom_bytes(mutate_mask, 16);
+			mutate_amount = prandom_u32_max(128);
+			for (k = 0; k < mutate_amount / 8; ++k)
+				mutate_mask[k] = 0xff;
+			mutate_mask[k] = 0xff
+					 << ((8 - (mutate_amount % 8)) % 8);
+			for (; k < 4; ++k)
+				mutate_mask[k] = 0;
+			for (k = 0; k < 4; ++k)
+				mutated[k] = (mutated[k] & mutate_mask[k]) |
+					     (~mutate_mask[k] &
+					      prandom_u32_max(256));
+			cidr = prandom_u32_max(128) + 1;
+			peer = peers[prandom_u32_max(NUM_PEERS)];
+			if (allowedips_insert_v6(&t, (struct in6_addr *)mutated,
+						 cidr, peer, &mutex) < 0) {
+				pr_info("allowedips random self-test: out of memory\n");
+				goto free;
+			}
+			if (horrible_allowedips_insert_v6(
+				    &h, (struct in6_addr *)mutated, cidr,
+				    peer)) {
+				pr_info("allowedips random self-test: out of memory\n");
+				goto free;
+			}
+		}
+	}
+
+	mutex_unlock(&mutex);
+
+#ifdef DEBUG_PRINT_TRIE_GRAPHVIZ
+	print_tree(t.root4, 32);
+	print_tree(t.root6, 128);
+#endif
+
+	for (i = 0; i < NUM_QUERIES; ++i) {
+		prandom_bytes(ip, 4);
+		if (lookup(t.root4, 32, ip) !=
+		    horrible_allowedips_lookup_v4(&h, (struct in_addr *)ip)) {
+			pr_info("allowedips random self-test: FAIL\n");
+			goto free;
+		}
+	}
+
+	for (i = 0; i < NUM_QUERIES; ++i) {
+		prandom_bytes(ip, 16);
+		if (lookup(t.root6, 128, ip) !=
+		    horrible_allowedips_lookup_v6(&h, (struct in6_addr *)ip)) {
+			pr_info("allowedips random self-test: FAIL\n");
+			goto free;
+		}
+	}
+	ret = true;
+
+free:
+	mutex_lock(&mutex);
+	allowedips_free(&t, &mutex);
+	mutex_unlock(&mutex);
+	horrible_allowedips_free(&h);
+	if (peers) {
+		for (i = 0; i < NUM_PEERS; ++i)
+			kfree(peers[i]);
+	}
+	kfree(peers);
+	return ret;
+}
+#endif
+
+static __init inline struct in_addr *ip4(u8 a, u8 b, u8 c, u8 d)
+{
+	static struct in_addr ip;
+	u8 *split = (u8 *)&ip;
+	split[0] = a;
+	split[1] = b;
+	split[2] = c;
+	split[3] = d;
+	return &ip;
+}
+static __init inline struct in6_addr *ip6(u32 a, u32 b, u32 c, u32 d)
+{
+	static struct in6_addr ip;
+	__be32 *split = (__be32 *)&ip;
+	split[0] = cpu_to_be32(a);
+	split[1] = cpu_to_be32(b);
+	split[2] = cpu_to_be32(c);
+	split[3] = cpu_to_be32(d);
+	return &ip;
+}
+
+struct walk_ctx {
+	int count;
+	bool found_a, found_b, found_c, found_d, found_e;
+	bool found_other;
+};
+
+static __init int walk_callback(void *ctx, const u8 *ip, u8 cidr, int family)
+{
+	struct walk_ctx *wctx = ctx;
+
+	wctx->count++;
+
+	if (cidr == 27 &&
+	    !memcmp(ip, ip4(192, 95, 5, 64), sizeof(struct in_addr)))
+		wctx->found_a = true;
+	else if (cidr == 128 &&
+		 !memcmp(ip, ip6(0x26075300, 0x60006b00, 0, 0xc05f0543),
+			 sizeof(struct in6_addr)))
+		wctx->found_b = true;
+	else if (cidr == 29 &&
+		 !memcmp(ip, ip4(10, 1, 0, 16), sizeof(struct in_addr)))
+		wctx->found_c = true;
+	else if (cidr == 83 &&
+		 !memcmp(ip, ip6(0x26075300, 0x6d8a6bf8, 0xdab1e000, 0),
+			 sizeof(struct in6_addr)))
+		wctx->found_d = true;
+	else if (cidr == 21 &&
+		 !memcmp(ip, ip6(0x26075000, 0, 0, 0), sizeof(struct in6_addr)))
+		wctx->found_e = true;
+	else
+		wctx->found_other = true;
+
+	return 0;
+}
+
+#define init_peer(name) do {                                               \
+		name = kzalloc(sizeof(*name), GFP_KERNEL);                 \
+		if (unlikely(!name)) {                                     \
+			pr_info("allowedips self-test: out of memory\n");  \
+			goto free;                                         \
+		}                                                          \
+		kref_init(&name->refcount);                                \
+	} while (0)
+
+#define insert(version, mem, ipa, ipb, ipc, ipd, cidr)                    \
+	allowedips_insert_v##version(&t, ip##version(ipa, ipb, ipc, ipd), \
+				     cidr, mem, &mutex)
+
+#define maybe_fail() do {                                               \
+		++i;                                                    \
+		if (!_s) {                                              \
+			pr_info("allowedips self-test %zu: FAIL\n", i); \
+			success = false;                                \
+		}                                                       \
+	} while (0)
+
+#define test(version, mem, ipa, ipb, ipc, ipd) do {                        \
+		bool _s = lookup(t.root##version, version == 4 ? 32 : 128, \
+				 ip##version(ipa, ipb, ipc, ipd)) == mem;  \
+		maybe_fail();                                              \
+	} while (0)
+
+#define test_negative(version, mem, ipa, ipb, ipc, ipd) do {               \
+		bool _s = lookup(t.root##version, version == 4 ? 32 : 128, \
+				 ip##version(ipa, ipb, ipc, ipd)) != mem;  \
+		maybe_fail();                                              \
+	} while (0)
+
+#define test_boolean(cond) do {   \
+		bool _s = (cond); \
+		maybe_fail();     \
+	} while (0)
+
+bool __init allowedips_selftest(void)
+{
+	struct wireguard_peer *a = NULL, *b = NULL, *c = NULL, *d = NULL,
+			      *e = NULL, *f = NULL, *g = NULL, *h = NULL;
+	struct allowedips_cursor cursor = { 0 };
+	struct walk_ctx wctx = { 0 };
+	bool success = false;
+	struct allowedips t;
+	DEFINE_MUTEX(mutex);
+	struct in6_addr ip;
+	size_t i = 0;
+	__be64 part;
+
+	mutex_init(&mutex);
+	mutex_lock(&mutex);
+
+	allowedips_init(&t);
+	init_peer(a);
+	init_peer(b);
+	init_peer(c);
+	init_peer(d);
+	init_peer(e);
+	init_peer(f);
+	init_peer(g);
+	init_peer(h);
+
+	insert(4, a, 192, 168, 4, 0, 24);
+	insert(4, b, 192, 168, 4, 4, 32);
+	insert(4, c, 192, 168, 0, 0, 16);
+	insert(4, d, 192, 95, 5, 64, 27);
+	/* replaces previous entry, and maskself is required */
+	insert(4, c, 192, 95, 5, 65, 27);
+	insert(6, d, 0x26075300, 0x60006b00, 0, 0xc05f0543, 128);
+	insert(6, c, 0x26075300, 0x60006b00, 0, 0, 64);
+	insert(4, e, 0, 0, 0, 0, 0);
+	insert(6, e, 0, 0, 0, 0, 0);
+	/* replaces previous entry */
+	insert(6, f, 0, 0, 0, 0, 0);
+	insert(6, g, 0x24046800, 0, 0, 0, 32);
+	/* maskself is required */
+	insert(6, h, 0x24046800, 0x40040800, 0xdeadbeef, 0xdeadbeef, 64);
+	insert(6, a, 0x24046800, 0x40040800, 0xdeadbeef, 0xdeadbeef, 128);
+	insert(6, c, 0x24446800, 0x40e40800, 0xdeaebeef, 0xdefbeef, 128);
+	insert(6, b, 0x24446800, 0xf0e40800, 0xeeaebeef, 0, 98);
+	insert(4, g, 64, 15, 112, 0, 20);
+	/* maskself is required */
+	insert(4, h, 64, 15, 123, 211, 25);
+	insert(4, a, 10, 0, 0, 0, 25);
+	insert(4, b, 10, 0, 0, 128, 25);
+	insert(4, a, 10, 1, 0, 0, 30);
+	insert(4, b, 10, 1, 0, 4, 30);
+	insert(4, c, 10, 1, 0, 8, 29);
+	insert(4, d, 10, 1, 0, 16, 29);
+
+#ifdef DEBUG_PRINT_TRIE_GRAPHVIZ
+	print_tree(t.root4, 32);
+	print_tree(t.root6, 128);
+#endif
+
+	success = true;
+
+	test(4, a, 192, 168, 4, 20);
+	test(4, a, 192, 168, 4, 0);
+	test(4, b, 192, 168, 4, 4);
+	test(4, c, 192, 168, 200, 182);
+	test(4, c, 192, 95, 5, 68);
+	test(4, e, 192, 95, 5, 96);
+	test(6, d, 0x26075300, 0x60006b00, 0, 0xc05f0543);
+	test(6, c, 0x26075300, 0x60006b00, 0, 0xc02e01ee);
+	test(6, f, 0x26075300, 0x60006b01, 0, 0);
+	test(6, g, 0x24046800, 0x40040806, 0, 0x1006);
+	test(6, g, 0x24046800, 0x40040806, 0x1234, 0x5678);
+	test(6, f, 0x240467ff, 0x40040806, 0x1234, 0x5678);
+	test(6, f, 0x24046801, 0x40040806, 0x1234, 0x5678);
+	test(6, h, 0x24046800, 0x40040800, 0x1234, 0x5678);
+	test(6, h, 0x24046800, 0x40040800, 0, 0);
+	test(6, h, 0x24046800, 0x40040800, 0x10101010, 0x10101010);
+	test(6, a, 0x24046800, 0x40040800, 0xdeadbeef, 0xdeadbeef);
+	test(4, g, 64, 15, 116, 26);
+	test(4, g, 64, 15, 127, 3);
+	test(4, g, 64, 15, 123, 1);
+	test(4, h, 64, 15, 123, 128);
+	test(4, h, 64, 15, 123, 129);
+	test(4, a, 10, 0, 0, 52);
+	test(4, b, 10, 0, 0, 220);
+	test(4, a, 10, 1, 0, 2);
+	test(4, b, 10, 1, 0, 6);
+	test(4, c, 10, 1, 0, 10);
+	test(4, d, 10, 1, 0, 20);
+
+	insert(4, a, 1, 0, 0, 0, 32);
+	insert(4, a, 64, 0, 0, 0, 32);
+	insert(4, a, 128, 0, 0, 0, 32);
+	insert(4, a, 192, 0, 0, 0, 32);
+	insert(4, a, 255, 0, 0, 0, 32);
+	allowedips_remove_by_peer(&t, a, &mutex);
+	test_negative(4, a, 1, 0, 0, 0);
+	test_negative(4, a, 64, 0, 0, 0);
+	test_negative(4, a, 128, 0, 0, 0);
+	test_negative(4, a, 192, 0, 0, 0);
+	test_negative(4, a, 255, 0, 0, 0);
+
+	allowedips_free(&t, &mutex);
+	allowedips_init(&t);
+	insert(4, a, 192, 168, 0, 0, 16);
+	insert(4, a, 192, 168, 0, 0, 24);
+	allowedips_remove_by_peer(&t, a, &mutex);
+	test_negative(4, a, 192, 168, 0, 1);
+
+	/* These will hit the BUG_ON(len >= 128) in free_node if something goes wrong. */
+	for (i = 0; i < 128; ++i) {
+		part = cpu_to_be64(~(1LLU << (i % 64)));
+		memset(&ip, 0xff, 16);
+		memcpy((u8 *)&ip + (i < 64) * 8, &part, 8);
+		allowedips_insert_v6(&t, &ip, 128, a, &mutex);
+	}
+
+	allowedips_free(&t, &mutex);
+
+	allowedips_init(&t);
+	insert(4, a, 192, 95, 5, 93, 27);
+	insert(6, a, 0x26075300, 0x60006b00, 0, 0xc05f0543, 128);
+	insert(4, a, 10, 1, 0, 20, 29);
+	insert(6, a, 0x26075300, 0x6d8a6bf8, 0xdab1f1df, 0xc05f1523, 83);
+	insert(6, a, 0x26075300, 0x6d8a6bf8, 0xdab1f1df, 0xc05f1523, 21);
+	allowedips_walk_by_peer(&t, &cursor, a, walk_callback, &wctx, &mutex);
+	test_boolean(wctx.count == 5);
+	test_boolean(wctx.found_a);
+	test_boolean(wctx.found_b);
+	test_boolean(wctx.found_c);
+	test_boolean(wctx.found_d);
+	test_boolean(wctx.found_e);
+	test_boolean(!wctx.found_other);
+
+#ifdef DEBUG_RANDOM_TRIE
+	if (success)
+		success = randomized_test();
+#endif
+
+	if (success)
+		pr_info("allowedips self-tests: pass\n");
+
+free:
+	allowedips_free(&t, &mutex);
+	kfree(a);
+	kfree(b);
+	kfree(c);
+	kfree(d);
+	kfree(e);
+	kfree(f);
+	kfree(g);
+	kfree(h);
+	mutex_unlock(&mutex);
+
+	return success;
+}
+#undef test_negative
+#undef test
+#undef remove
+#undef insert
+#undef init_peer
+
+#endif
diff --git a/drivers/net/wireguard/selftest/counter.h b/drivers/net/wireguard/selftest/counter.h
new file mode 100644
index 000000000000..1c2a3b4e1fdc
--- /dev/null
+++ b/drivers/net/wireguard/selftest/counter.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+bool __init packet_counter_selftest(void)
+{
+	unsigned int test_num = 0, i;
+	union noise_counter counter;
+	bool success = true;
+
+#define T_INIT do {                                               \
+		memset(&counter, 0, sizeof(union noise_counter)); \
+		spin_lock_init(&counter.receive.lock);            \
+	} while (0)
+#define T_LIM (COUNTER_WINDOW_SIZE + 1)
+#define T(n, v) do {                                                  \
+		++test_num;                                           \
+		if (counter_validate(&counter, n) != v) {             \
+			pr_info("nonce counter self-test %u: FAIL\n", \
+				test_num);                            \
+			success = false;                              \
+		}                                                     \
+	} while (0)
+
+	T_INIT;
+	/*  1 */ T(0, true);
+	/*  2 */ T(1, true);
+	/*  3 */ T(1, false);
+	/*  4 */ T(9, true);
+	/*  5 */ T(8, true);
+	/*  6 */ T(7, true);
+	/*  7 */ T(7, false);
+	/*  8 */ T(T_LIM, true);
+	/*  9 */ T(T_LIM - 1, true);
+	/* 10 */ T(T_LIM - 1, false);
+	/* 11 */ T(T_LIM - 2, true);
+	/* 12 */ T(2, true);
+	/* 13 */ T(2, false);
+	/* 14 */ T(T_LIM + 16, true);
+	/* 15 */ T(3, false);
+	/* 16 */ T(T_LIM + 16, false);
+	/* 17 */ T(T_LIM * 4, true);
+	/* 18 */ T(T_LIM * 4 - (T_LIM - 1), true);
+	/* 19 */ T(10, false);
+	/* 20 */ T(T_LIM * 4 - T_LIM, false);
+	/* 21 */ T(T_LIM * 4 - (T_LIM + 1), false);
+	/* 22 */ T(T_LIM * 4 - (T_LIM - 2), true);
+	/* 23 */ T(T_LIM * 4 + 1 - T_LIM, false);
+	/* 24 */ T(0, false);
+	/* 25 */ T(REJECT_AFTER_MESSAGES, false);
+	/* 26 */ T(REJECT_AFTER_MESSAGES - 1, true);
+	/* 27 */ T(REJECT_AFTER_MESSAGES, false);
+	/* 28 */ T(REJECT_AFTER_MESSAGES - 1, false);
+	/* 29 */ T(REJECT_AFTER_MESSAGES - 2, true);
+	/* 30 */ T(REJECT_AFTER_MESSAGES + 1, false);
+	/* 31 */ T(REJECT_AFTER_MESSAGES + 2, false);
+	/* 32 */ T(REJECT_AFTER_MESSAGES - 2, false);
+	/* 33 */ T(REJECT_AFTER_MESSAGES - 3, true);
+	/* 34 */ T(0, false);
+
+	T_INIT;
+	for (i = 1; i <= COUNTER_WINDOW_SIZE; ++i)
+		T(i, true);
+	T(0, true);
+	T(0, false);
+
+	T_INIT;
+	for (i = 2; i <= COUNTER_WINDOW_SIZE + 1; ++i)
+		T(i, true);
+	T(1, true);
+	T(0, false);
+
+	T_INIT;
+	for (i = COUNTER_WINDOW_SIZE + 1; i-- > 0;)
+		T(i, true);
+
+	T_INIT;
+	for (i = COUNTER_WINDOW_SIZE + 2; i-- > 1;)
+		T(i, true);
+	T(0, false);
+
+	T_INIT;
+	for (i = COUNTER_WINDOW_SIZE + 1; i-- > 1;)
+		T(i, true);
+	T(COUNTER_WINDOW_SIZE + 1, true);
+	T(0, false);
+
+	T_INIT;
+	for (i = COUNTER_WINDOW_SIZE + 1; i-- > 1;)
+		T(i, true);
+	T(0, true);
+	T(COUNTER_WINDOW_SIZE + 1, true);
+#undef T
+#undef T_LIM
+#undef T_INIT
+
+	if (success)
+		pr_info("nonce counter self-tests: pass\n");
+	return success;
+}
+#endif
diff --git a/drivers/net/wireguard/selftest/ratelimiter.h b/drivers/net/wireguard/selftest/ratelimiter.h
new file mode 100644
index 000000000000..f9f99961fc7c
--- /dev/null
+++ b/drivers/net/wireguard/selftest/ratelimiter.h
@@ -0,0 +1,174 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifdef DEBUG
+
+#include <linux/jiffies.h>
+
+static const struct {
+	bool result;
+	unsigned int msec_to_sleep_before;
+} expected_results[] __initconst = {
+	[0 ... PACKETS_BURSTABLE - 1] = { true, 0 },
+	[PACKETS_BURSTABLE] = { false, 0 },
+	[PACKETS_BURSTABLE + 1] = { true, MSEC_PER_SEC / PACKETS_PER_SECOND },
+	[PACKETS_BURSTABLE + 2] = { false, 0 },
+	[PACKETS_BURSTABLE + 3] = { true, (MSEC_PER_SEC / PACKETS_PER_SECOND) * 2 },
+	[PACKETS_BURSTABLE + 4] = { true, 0 },
+	[PACKETS_BURSTABLE + 5] = { false, 0 }
+};
+
+static __init unsigned int maximum_jiffies_at_index(int index)
+{
+	unsigned int total_msecs = 2 * MSEC_PER_SEC / PACKETS_PER_SECOND / 3;
+	int i;
+
+	for (i = 0; i <= index; ++i)
+		total_msecs += expected_results[i].msec_to_sleep_before;
+	return msecs_to_jiffies(total_msecs);
+}
+
+bool __init ratelimiter_selftest(void)
+{
+	int i, test = 0, tries = 0, ret = false;
+	unsigned long loop_start_time;
+#if IS_ENABLED(CONFIG_IPV6)
+	struct sk_buff *skb6;
+	struct ipv6hdr *hdr6;
+#endif
+	struct sk_buff *skb4;
+	struct iphdr *hdr4;
+
+	BUILD_BUG_ON(MSEC_PER_SEC % PACKETS_PER_SECOND != 0);
+
+	if (ratelimiter_init())
+		goto out;
+	++test;
+	if (ratelimiter_init()) {
+		ratelimiter_uninit();
+		goto out;
+	}
+	++test;
+	if (ratelimiter_init()) {
+		ratelimiter_uninit();
+		ratelimiter_uninit();
+		goto out;
+	}
+	++test;
+
+	skb4 = alloc_skb(sizeof(struct iphdr), GFP_KERNEL);
+	if (unlikely(!skb4))
+		goto err_nofree;
+	skb4->protocol = htons(ETH_P_IP);
+	hdr4 = (struct iphdr *)skb_put(skb4, sizeof(*hdr4));
+	hdr4->saddr = htonl(8182);
+	skb_reset_network_header(skb4);
+	++test;
+
+#if IS_ENABLED(CONFIG_IPV6)
+	skb6 = alloc_skb(sizeof(struct ipv6hdr), GFP_KERNEL);
+	if (unlikely(!skb6)) {
+		kfree_skb(skb4);
+		goto err_nofree;
+	}
+	skb6->protocol = htons(ETH_P_IPV6);
+	hdr6 = (struct ipv6hdr *)skb_put(skb6, sizeof(*hdr6));
+	hdr6->saddr.in6_u.u6_addr32[0] = htonl(1212);
+	hdr6->saddr.in6_u.u6_addr32[1] = htonl(289188);
+	skb_reset_network_header(skb6);
+	++test;
+#endif
+
+restart:
+	loop_start_time = jiffies;
+	for (i = 0; i < ARRAY_SIZE(expected_results); ++i) {
+#define ensure_time do {                                                   \
+		if (time_is_before_jiffies(loop_start_time +               \
+					   maximum_jiffies_at_index(i))) { \
+			if (++tries >= 5000)                               \
+				goto err;                                  \
+			gc_entries(NULL);                                  \
+			rcu_barrier();                                     \
+			msleep(500);                                       \
+			goto restart;                                      \
+		}                                                          \
+	} while (0)
+
+		if (expected_results[i].msec_to_sleep_before)
+			msleep(expected_results[i].msec_to_sleep_before);
+
+		ensure_time;
+		if (ratelimiter_allow(skb4, &init_net) !=
+		    expected_results[i].result)
+			goto err;
+		++test;
+		hdr4->saddr = htonl(ntohl(hdr4->saddr) + i + 1);
+		ensure_time;
+		if (!ratelimiter_allow(skb4, &init_net))
+			goto err;
+		++test;
+		hdr4->saddr = htonl(ntohl(hdr4->saddr) - i - 1);
+
+#if IS_ENABLED(CONFIG_IPV6)
+		hdr6->saddr.in6_u.u6_addr32[2] =
+			hdr6->saddr.in6_u.u6_addr32[3] = htonl(i);
+		ensure_time;
+		if (ratelimiter_allow(skb6, &init_net) !=
+		    expected_results[i].result)
+			goto err;
+		++test;
+		hdr6->saddr.in6_u.u6_addr32[0] =
+			htonl(ntohl(hdr6->saddr.in6_u.u6_addr32[0]) + i + 1);
+		ensure_time;
+		if (!ratelimiter_allow(skb6, &init_net))
+			goto err;
+		++test;
+		hdr6->saddr.in6_u.u6_addr32[0] =
+			htonl(ntohl(hdr6->saddr.in6_u.u6_addr32[0]) - i - 1);
+		ensure_time;
+#endif
+	}
+
+	tries = 0;
+restart2:
+	gc_entries(NULL);
+	rcu_barrier();
+
+	if (atomic_read(&total_entries))
+		goto err;
+	++test;
+
+	for (i = 0; i <= max_entries; ++i) {
+		hdr4->saddr = htonl(i);
+		if (ratelimiter_allow(skb4, &init_net) != (i != max_entries)) {
+			if (++tries < 5000)
+				goto restart2;
+			goto err;
+		}
+		++test;
+	}
+
+	ret = true;
+
+err:
+	kfree_skb(skb4);
+#if IS_ENABLED(CONFIG_IPV6)
+	kfree_skb(skb6);
+#endif
+err_nofree:
+	ratelimiter_uninit();
+	ratelimiter_uninit();
+	ratelimiter_uninit();
+	/* Uninit one extra time to check underflow detection. */
+	ratelimiter_uninit();
+out:
+	if (ret)
+		pr_info("ratelimiter self-tests: pass\n");
+	else
+		pr_info("ratelimiter self-test %d: fail\n", test);
+
+	return ret;
+}
+#endif
diff --git a/drivers/net/wireguard/send.c b/drivers/net/wireguard/send.c
new file mode 100644
index 000000000000..5b6d0fe733c4
--- /dev/null
+++ b/drivers/net/wireguard/send.c
@@ -0,0 +1,420 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "queueing.h"
+#include "timers.h"
+#include "device.h"
+#include "peer.h"
+#include "socket.h"
+#include "messages.h"
+#include "cookie.h"
+
+#include <linux/simd.h>
+#include <linux/uio.h>
+#include <linux/inetdevice.h>
+#include <linux/socket.h>
+#include <net/ip_tunnels.h>
+#include <net/udp.h>
+#include <net/sock.h>
+
+static void packet_send_handshake_initiation(struct wireguard_peer *peer)
+{
+	struct message_handshake_initiation packet;
+
+	if (!has_expired(atomic64_read(&peer->last_sent_handshake),
+			 REKEY_TIMEOUT))
+		return; /* This function is rate limited. */
+
+	atomic64_set(&peer->last_sent_handshake, ktime_get_boot_fast_ns());
+	net_dbg_ratelimited("%s: Sending handshake initiation to peer %llu (%pISpfsc)\n",
+			    peer->device->dev->name, peer->internal_id,
+			    &peer->endpoint.addr);
+
+	if (noise_handshake_create_initiation(&packet, &peer->handshake)) {
+		cookie_add_mac_to_packet(&packet, sizeof(packet), peer);
+		timers_any_authenticated_packet_traversal(peer);
+		timers_any_authenticated_packet_sent(peer);
+		atomic64_set(&peer->last_sent_handshake,
+			     ktime_get_boot_fast_ns());
+		socket_send_buffer_to_peer(peer, &packet, sizeof(packet),
+					   HANDSHAKE_DSCP);
+		timers_handshake_initiated(peer);
+	}
+}
+
+void packet_handshake_send_worker(struct work_struct *work)
+{
+	struct wireguard_peer *peer = container_of(work, struct wireguard_peer,
+						   transmit_handshake_work);
+
+	packet_send_handshake_initiation(peer);
+	peer_put(peer);
+}
+
+void packet_send_queued_handshake_initiation(struct wireguard_peer *peer,
+					     bool is_retry)
+{
+	if (!is_retry)
+		peer->timer_handshake_attempts = 0;
+
+	rcu_read_lock_bh();
+	/* We check last_sent_handshake here in addition to the actual function
+	 * we're queueing up, so that we don't queue things if not strictly
+	 * necessary:
+	 */
+	if (!has_expired(atomic64_read(&peer->last_sent_handshake),
+			 REKEY_TIMEOUT) || unlikely(peer->is_dead))
+		goto out;
+
+	peer_get(peer);
+	/* Queues up calling packet_send_queued_handshakes(peer), where we do a
+	 * peer_put(peer) after:
+	 */
+	if (!queue_work(peer->device->handshake_send_wq,
+			&peer->transmit_handshake_work))
+		/* If the work was already queued, we want to drop the
+		 * extra reference:
+		 */
+		peer_put(peer);
+out:
+	rcu_read_unlock_bh();
+}
+
+void packet_send_handshake_response(struct wireguard_peer *peer)
+{
+	struct message_handshake_response packet;
+
+	atomic64_set(&peer->last_sent_handshake, ktime_get_boot_fast_ns());
+	net_dbg_ratelimited("%s: Sending handshake response to peer %llu (%pISpfsc)\n",
+			    peer->device->dev->name, peer->internal_id,
+			    &peer->endpoint.addr);
+
+	if (noise_handshake_create_response(&packet, &peer->handshake)) {
+		cookie_add_mac_to_packet(&packet, sizeof(packet), peer);
+		if (noise_handshake_begin_session(&peer->handshake,
+						  &peer->keypairs)) {
+			timers_session_derived(peer);
+			timers_any_authenticated_packet_traversal(peer);
+			timers_any_authenticated_packet_sent(peer);
+			atomic64_set(&peer->last_sent_handshake,
+				     ktime_get_boot_fast_ns());
+			socket_send_buffer_to_peer(peer, &packet,
+						   sizeof(packet),
+						   HANDSHAKE_DSCP);
+		}
+	}
+}
+
+void packet_send_handshake_cookie(struct wireguard_device *wg,
+				  struct sk_buff *initiating_skb,
+				  __le32 sender_index)
+{
+	struct message_handshake_cookie packet;
+
+	net_dbg_skb_ratelimited("%s: Sending cookie response for denied handshake message for %pISpfsc\n",
+				wg->dev->name, initiating_skb);
+	cookie_message_create(&packet, initiating_skb, sender_index,
+			      &wg->cookie_checker);
+	socket_send_buffer_as_reply_to_skb(wg, initiating_skb, &packet,
+					   sizeof(packet));
+}
+
+static inline void keep_key_fresh(struct wireguard_peer *peer)
+{
+	struct noise_keypair *keypair;
+	bool send = false;
+
+	rcu_read_lock_bh();
+	keypair = rcu_dereference_bh(peer->keypairs.current_keypair);
+	if (likely(keypair && keypair->sending.is_valid) &&
+	    (unlikely(atomic64_read(&keypair->sending.counter.counter) >
+		      REKEY_AFTER_MESSAGES) ||
+	     (keypair->i_am_the_initiator &&
+	      unlikely(has_expired(keypair->sending.birthdate,
+				   REKEY_AFTER_TIME)))))
+		send = true;
+	rcu_read_unlock_bh();
+
+	if (send)
+		packet_send_queued_handshake_initiation(peer, false);
+}
+
+static inline unsigned int skb_padding(struct sk_buff *skb)
+{
+	/* We do this modulo business with the MTU, just in case the networking
+	 * layer gives us a packet that's bigger than the MTU. In that case, we
+	 * wouldn't want the final subtraction to overflow in the case of the
+	 * padded_size being clamped.
+	 */
+	unsigned int last_unit = skb->len % PACKET_CB(skb)->mtu;
+	unsigned int padded_size = ALIGN(last_unit, MESSAGE_PADDING_MULTIPLE);
+
+	if (padded_size > PACKET_CB(skb)->mtu)
+		padded_size = PACKET_CB(skb)->mtu;
+	return padded_size - last_unit;
+}
+
+static inline bool skb_encrypt(struct sk_buff *skb,
+			       struct noise_keypair *keypair,
+			       simd_context_t simd_context)
+{
+	unsigned int padding_len, plaintext_len, trailer_len;
+	struct scatterlist sg[MAX_SKB_FRAGS * 2 + 1];
+	struct message_data *header;
+	struct sk_buff *trailer;
+	int num_frags;
+
+	/* Calculate lengths. */
+	padding_len = skb_padding(skb);
+	trailer_len = padding_len + noise_encrypted_len(0);
+	plaintext_len = skb->len + padding_len;
+
+	/* Expand data section to have room for padding and auth tag. */
+	num_frags = skb_cow_data(skb, trailer_len, &trailer);
+	if (unlikely(num_frags < 0 || num_frags > ARRAY_SIZE(sg)))
+		return false;
+
+	/* Set the padding to zeros, and make sure it and the auth tag are part
+	 * of the skb.
+	 */
+	memset(skb_tail_pointer(trailer), 0, padding_len);
+
+	/* Expand head section to have room for our header and the network
+	 * stack's headers.
+	 */
+	if (unlikely(skb_cow_head(skb, DATA_PACKET_HEAD_ROOM) < 0))
+		return false;
+
+	/* We have to remember to add the checksum to the innerpacket, in case
+	 * the receiver forwards it.
+	 */
+	if (likely(!skb_checksum_setup(skb, true)))
+		skb_checksum_help(skb);
+
+	/* Only after checksumming can we safely add on the padding at the end
+	 * and the header.
+	 */
+	skb_set_inner_network_header(skb, 0);
+	header = (struct message_data *)skb_push(skb, sizeof(*header));
+	header->header.type = cpu_to_le32(MESSAGE_DATA);
+	header->key_idx = keypair->remote_index;
+	header->counter = cpu_to_le64(PACKET_CB(skb)->nonce);
+	pskb_put(skb, trailer, trailer_len);
+
+	/* Now we can encrypt the scattergather segments */
+	sg_init_table(sg, num_frags);
+	if (skb_to_sgvec(skb, sg, sizeof(struct message_data),
+			 noise_encrypted_len(plaintext_len)) <= 0)
+		return false;
+	return chacha20poly1305_encrypt_sg(sg, sg, plaintext_len, NULL, 0,
+					   PACKET_CB(skb)->nonce,
+					   keypair->sending.key, simd_context);
+}
+
+void packet_send_keepalive(struct wireguard_peer *peer)
+{
+	struct sk_buff *skb;
+
+	if (skb_queue_empty(&peer->staged_packet_queue)) {
+		skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH,
+				GFP_ATOMIC);
+		if (unlikely(!skb))
+			return;
+		skb_reserve(skb, DATA_PACKET_HEAD_ROOM);
+		skb->dev = peer->device->dev;
+		PACKET_CB(skb)->mtu = skb->dev->mtu;
+		skb_queue_tail(&peer->staged_packet_queue, skb);
+		net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n",
+				    peer->device->dev->name, peer->internal_id,
+				    &peer->endpoint.addr);
+	}
+
+	packet_send_staged_packets(peer);
+}
+
+#define skb_walk_null_queue_safe(first, skb, next)                             \
+	for (skb = first, next = skb->next; skb;                               \
+	     skb = next, next = skb ? skb->next : NULL)
+static inline void skb_free_null_queue(struct sk_buff *first)
+{
+	struct sk_buff *skb, *next;
+
+	skb_walk_null_queue_safe (first, skb, next)
+		dev_kfree_skb(skb);
+}
+
+static void packet_create_data_done(struct sk_buff *first,
+				    struct wireguard_peer *peer)
+{
+	struct sk_buff *skb, *next;
+	bool is_keepalive, data_sent = false;
+
+	timers_any_authenticated_packet_traversal(peer);
+	timers_any_authenticated_packet_sent(peer);
+	skb_walk_null_queue_safe (first, skb, next) {
+		is_keepalive = skb->len == message_data_len(0);
+		if (likely(!socket_send_skb_to_peer(peer, skb,
+				PACKET_CB(skb)->ds) && !is_keepalive))
+			data_sent = true;
+	}
+
+	if (likely(data_sent))
+		timers_data_sent(peer);
+
+	keep_key_fresh(peer);
+}
+
+void packet_tx_worker(struct work_struct *work)
+{
+	struct crypt_queue *queue =
+		container_of(work, struct crypt_queue, work);
+	struct wireguard_peer *peer;
+	struct noise_keypair *keypair;
+	struct sk_buff *first;
+	enum packet_state state;
+
+	while ((first = __ptr_ring_peek(&queue->ring)) != NULL &&
+	       (state = atomic_read_acquire(&PACKET_CB(first)->state)) !=
+		       PACKET_STATE_UNCRYPTED) {
+		__ptr_ring_discard_one(&queue->ring);
+		peer = PACKET_PEER(first);
+		keypair = PACKET_CB(first)->keypair;
+
+		if (likely(state == PACKET_STATE_CRYPTED))
+			packet_create_data_done(first, peer);
+		else
+			skb_free_null_queue(first);
+
+		noise_keypair_put(keypair, false);
+		peer_put(peer);
+	}
+}
+
+void packet_encrypt_worker(struct work_struct *work)
+{
+	struct crypt_queue *queue =
+		container_of(work, struct multicore_worker, work)->ptr;
+	struct sk_buff *first, *skb, *next;
+	simd_context_t simd_context = simd_get();
+
+	while ((first = ptr_ring_consume_bh(&queue->ring)) != NULL) {
+		enum packet_state state = PACKET_STATE_CRYPTED;
+
+		skb_walk_null_queue_safe (first, skb, next) {
+			if (likely(skb_encrypt(skb, PACKET_CB(first)->keypair,
+					       simd_context)))
+				skb_reset(skb);
+			else {
+				state = PACKET_STATE_DEAD;
+				break;
+			}
+		}
+		queue_enqueue_per_peer(&PACKET_PEER(first)->tx_queue, first,
+				       state);
+
+		simd_context = simd_relax(simd_context);
+	}
+	simd_put(simd_context);
+}
+
+static void packet_create_data(struct sk_buff *first)
+{
+	struct wireguard_peer *peer = PACKET_PEER(first);
+	struct wireguard_device *wg = peer->device;
+	int ret = -EINVAL;
+
+	rcu_read_lock_bh();
+	if (unlikely(peer->is_dead))
+		goto err;
+
+	ret = queue_enqueue_per_device_and_peer(&wg->encrypt_queue,
+						&peer->tx_queue, first,
+						wg->packet_crypt_wq,
+						&wg->encrypt_queue.last_cpu);
+	if (unlikely(ret == -EPIPE))
+		queue_enqueue_per_peer(&peer->tx_queue, first,
+				       PACKET_STATE_DEAD);
+err:
+	rcu_read_unlock_bh();
+	if (likely(!ret || ret == -EPIPE))
+		return;
+	noise_keypair_put(PACKET_CB(first)->keypair, false);
+	peer_put(peer);
+	skb_free_null_queue(first);
+}
+
+void packet_send_staged_packets(struct wireguard_peer *peer)
+{
+	struct noise_symmetric_key *key;
+	struct noise_keypair *keypair;
+	struct sk_buff_head packets;
+	struct sk_buff *skb;
+
+	/* Steal the current queue into our local one. */
+	__skb_queue_head_init(&packets);
+	spin_lock_bh(&peer->staged_packet_queue.lock);
+	skb_queue_splice_init(&peer->staged_packet_queue, &packets);
+	spin_unlock_bh(&peer->staged_packet_queue.lock);
+	if (unlikely(skb_queue_empty(&packets)))
+		return;
+
+	/* First we make sure we have a valid reference to a valid key. */
+	rcu_read_lock_bh();
+	keypair = noise_keypair_get(
+		rcu_dereference_bh(peer->keypairs.current_keypair));
+	rcu_read_unlock_bh();
+	if (unlikely(!keypair))
+		goto out_nokey;
+	key = &keypair->sending;
+	if (unlikely(!key->is_valid))
+		goto out_nokey;
+	if (unlikely(has_expired(key->birthdate, REJECT_AFTER_TIME)))
+		goto out_invalid;
+
+	/* After we know we have a somewhat valid key, we now try to assign
+	 * nonces to all of the packets in the queue. If we can't assign nonces
+	 * for all of them, we just consider it a failure and wait for the next
+	 * handshake.
+	 */
+	skb_queue_walk (&packets, skb) {
+		/* 0 for no outer TOS: no leak. TODO: should we use flowi->tos
+		 * as outer? */
+		PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb);
+		PACKET_CB(skb)->nonce =
+				atomic64_inc_return(&key->counter.counter) - 1;
+		if (unlikely(PACKET_CB(skb)->nonce >= REJECT_AFTER_MESSAGES))
+			goto out_invalid;
+	}
+
+	packets.prev->next = NULL;
+	peer_get(keypair->entry.peer);
+	PACKET_CB(packets.next)->keypair = keypair;
+	packet_create_data(packets.next);
+	return;
+
+out_invalid:
+	key->is_valid = false;
+out_nokey:
+	noise_keypair_put(keypair, false);
+
+	/* We orphan the packets if we're waiting on a handshake, so that they
+	 * don't block a socket's pool.
+	 */
+	skb_queue_walk (&packets, skb)
+		skb_orphan(skb);
+	/* Then we put them back on the top of the queue. We're not too
+	 * concerned about accidentally getting things a little out of order if
+	 * packets are being added really fast, because this queue is for before
+	 * packets can even be sent and it's small anyway.
+	 */
+	spin_lock_bh(&peer->staged_packet_queue.lock);
+	skb_queue_splice(&packets, &peer->staged_packet_queue);
+	spin_unlock_bh(&peer->staged_packet_queue.lock);
+
+	/* If we're exiting because there's something wrong with the key, it
+	 * means we should initiate a new handshake.
+	 */
+	packet_send_queued_handshake_initiation(peer, false);
+}
diff --git a/drivers/net/wireguard/socket.c b/drivers/net/wireguard/socket.c
new file mode 100644
index 000000000000..2e9e44f3b1d2
--- /dev/null
+++ b/drivers/net/wireguard/socket.c
@@ -0,0 +1,435 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "device.h"
+#include "peer.h"
+#include "socket.h"
+#include "queueing.h"
+#include "messages.h"
+
+#include <linux/ctype.h>
+#include <linux/net.h>
+#include <linux/if_vlan.h>
+#include <linux/if_ether.h>
+#include <linux/inetdevice.h>
+#include <net/udp_tunnel.h>
+#include <net/ipv6.h>
+
+static inline int send4(struct wireguard_device *wg, struct sk_buff *skb,
+			struct endpoint *endpoint, u8 ds,
+			struct dst_cache *cache)
+{
+	struct flowi4 fl = {
+		.saddr = endpoint->src4.s_addr,
+		.daddr = endpoint->addr4.sin_addr.s_addr,
+		.fl4_dport = endpoint->addr4.sin_port,
+		.flowi4_mark = wg->fwmark,
+		.flowi4_proto = IPPROTO_UDP
+	};
+	struct rtable *rt = NULL;
+	struct sock *sock;
+	int ret = 0;
+
+	skb->next = skb->prev = NULL;
+	skb->dev = wg->dev;
+	skb->mark = wg->fwmark;
+
+	rcu_read_lock_bh();
+	sock = rcu_dereference_bh(wg->sock4);
+
+	if (unlikely(!sock)) {
+		ret = -ENONET;
+		goto err;
+	}
+
+	fl.fl4_sport = inet_sk(sock)->inet_sport;
+
+	if (cache)
+		rt = dst_cache_get_ip4(cache, &fl.saddr);
+
+	if (!rt) {
+		security_sk_classify_flow(sock, flowi4_to_flowi(&fl));
+		if (unlikely(!inet_confirm_addr(sock_net(sock), NULL, 0,
+						fl.saddr, RT_SCOPE_HOST))) {
+			endpoint->src4.s_addr = 0;
+			*(__force __be32 *)&endpoint->src_if4 = 0;
+			fl.saddr = 0;
+			if (cache)
+				dst_cache_reset(cache);
+		}
+		rt = ip_route_output_flow(sock_net(sock), &fl, sock);
+		if (unlikely(endpoint->src_if4 && ((IS_ERR(rt) &&
+				PTR_ERR(rt) == -EINVAL) || (!IS_ERR(rt) &&
+				rt->dst.dev->ifindex != endpoint->src_if4)))) {
+			endpoint->src4.s_addr = 0;
+			*(__force __be32 *)&endpoint->src_if4 = 0;
+			fl.saddr = 0;
+			if (cache)
+				dst_cache_reset(cache);
+			if (!IS_ERR(rt))
+				ip_rt_put(rt);
+			rt = ip_route_output_flow(sock_net(sock), &fl, sock);
+		}
+		if (unlikely(IS_ERR(rt))) {
+			ret = PTR_ERR(rt);
+			net_dbg_ratelimited("%s: No route to %pISpfsc, error %d\n",
+					    wg->dev->name, &endpoint->addr, ret);
+			goto err;
+		} else if (unlikely(rt->dst.dev == skb->dev)) {
+			ip_rt_put(rt);
+			ret = -ELOOP;
+			net_dbg_ratelimited("%s: Avoiding routing loop to %pISpfsc\n",
+					    wg->dev->name, &endpoint->addr);
+			goto err;
+		}
+		if (cache)
+			dst_cache_set_ip4(cache, &rt->dst, fl.saddr);
+	}
+	udp_tunnel_xmit_skb(rt, sock, skb, fl.saddr, fl.daddr, ds,
+			    ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport,
+			    fl.fl4_dport, false, false);
+	goto out;
+
+err:
+	kfree_skb(skb);
+out:
+	rcu_read_unlock_bh();
+	return ret;
+}
+
+static inline int send6(struct wireguard_device *wg, struct sk_buff *skb,
+			struct endpoint *endpoint, u8 ds,
+			struct dst_cache *cache)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	struct flowi6 fl = {
+		.saddr = endpoint->src6,
+		.daddr = endpoint->addr6.sin6_addr,
+		.fl6_dport = endpoint->addr6.sin6_port,
+		.flowi6_mark = wg->fwmark,
+		.flowi6_oif = endpoint->addr6.sin6_scope_id,
+		.flowi6_proto = IPPROTO_UDP
+		/* TODO: addr->sin6_flowinfo */
+	};
+	struct dst_entry *dst = NULL;
+	struct sock *sock;
+	int ret = 0;
+
+	skb->next = skb->prev = NULL;
+	skb->dev = wg->dev;
+	skb->mark = wg->fwmark;
+
+	rcu_read_lock_bh();
+	sock = rcu_dereference_bh(wg->sock6);
+
+	if (unlikely(!sock)) {
+		ret = -ENONET;
+		goto err;
+	}
+
+	fl.fl6_sport = inet_sk(sock)->inet_sport;
+
+	if (cache)
+		dst = dst_cache_get_ip6(cache, &fl.saddr);
+
+	if (!dst) {
+		security_sk_classify_flow(sock, flowi6_to_flowi(&fl));
+		if (unlikely(!ipv6_addr_any(&fl.saddr) &&
+			     !ipv6_chk_addr(sock_net(sock), &fl.saddr, NULL, 0))) {
+			endpoint->src6 = fl.saddr = in6addr_any;
+			if (cache)
+				dst_cache_reset(cache);
+		}
+		ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst,
+						 &fl);
+		if (unlikely(ret)) {
+			net_dbg_ratelimited("%s: No route to %pISpfsc, error %d\n",
+					    wg->dev->name, &endpoint->addr, ret);
+			goto err;
+		} else if (unlikely(dst->dev == skb->dev)) {
+			dst_release(dst);
+			ret = -ELOOP;
+			net_dbg_ratelimited("%s: Avoiding routing loop to %pISpfsc\n",
+					    wg->dev->name, &endpoint->addr);
+			goto err;
+		}
+		if (cache)
+			dst_cache_set_ip6(cache, dst, &fl.saddr);
+	}
+
+	udp_tunnel6_xmit_skb(dst, sock, skb, skb->dev, &fl.saddr, &fl.daddr, ds,
+			     ip6_dst_hoplimit(dst), 0, fl.fl6_sport,
+			     fl.fl6_dport, false);
+	goto out;
+
+err:
+	kfree_skb(skb);
+out:
+	rcu_read_unlock_bh();
+	return ret;
+#else
+	return -EAFNOSUPPORT;
+#endif
+}
+
+int socket_send_skb_to_peer(struct wireguard_peer *peer, struct sk_buff *skb,
+			    u8 ds)
+{
+	size_t skb_len = skb->len;
+	int ret = -EAFNOSUPPORT;
+
+	read_lock_bh(&peer->endpoint_lock);
+	if (peer->endpoint.addr.sa_family == AF_INET)
+		ret = send4(peer->device, skb, &peer->endpoint, ds,
+			    &peer->endpoint_cache);
+	else if (peer->endpoint.addr.sa_family == AF_INET6)
+		ret = send6(peer->device, skb, &peer->endpoint, ds,
+			    &peer->endpoint_cache);
+	else
+		dev_kfree_skb(skb);
+	if (likely(!ret))
+		peer->tx_bytes += skb_len;
+	read_unlock_bh(&peer->endpoint_lock);
+
+	return ret;
+}
+
+int socket_send_buffer_to_peer(struct wireguard_peer *peer, void *buffer,
+			       size_t len, u8 ds)
+{
+	struct sk_buff *skb = alloc_skb(len + SKB_HEADER_LEN, GFP_ATOMIC);
+
+	if (unlikely(!skb))
+		return -ENOMEM;
+
+	skb_reserve(skb, SKB_HEADER_LEN);
+	skb_set_inner_network_header(skb, 0);
+	skb_put_data(skb, buffer, len);
+	return socket_send_skb_to_peer(peer, skb, ds);
+}
+
+int socket_send_buffer_as_reply_to_skb(struct wireguard_device *wg,
+				       struct sk_buff *in_skb, void *buffer,
+				       size_t len)
+{
+	int ret = 0;
+	struct sk_buff *skb;
+	struct endpoint endpoint;
+
+	if (unlikely(!in_skb))
+		return -EINVAL;
+	ret = socket_endpoint_from_skb(&endpoint, in_skb);
+	if (unlikely(ret < 0))
+		return ret;
+
+	skb = alloc_skb(len + SKB_HEADER_LEN, GFP_ATOMIC);
+	if (unlikely(!skb))
+		return -ENOMEM;
+	skb_reserve(skb, SKB_HEADER_LEN);
+	skb_set_inner_network_header(skb, 0);
+	skb_put_data(skb, buffer, len);
+
+	if (endpoint.addr.sa_family == AF_INET)
+		ret = send4(wg, skb, &endpoint, 0, NULL);
+	else if (endpoint.addr.sa_family == AF_INET6)
+		ret = send6(wg, skb, &endpoint, 0, NULL);
+	/* No other possibilities if the endpoint is valid, which it is,
+	 * as we checked above.
+	 */
+
+	return ret;
+}
+
+int socket_endpoint_from_skb(struct endpoint *endpoint,
+			     const struct sk_buff *skb)
+{
+	memset(endpoint, 0, sizeof(*endpoint));
+	if (skb->protocol == htons(ETH_P_IP)) {
+		endpoint->addr4.sin_family = AF_INET;
+		endpoint->addr4.sin_port = udp_hdr(skb)->source;
+		endpoint->addr4.sin_addr.s_addr = ip_hdr(skb)->saddr;
+		endpoint->src4.s_addr = ip_hdr(skb)->daddr;
+		endpoint->src_if4 = skb->skb_iif;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		endpoint->addr6.sin6_family = AF_INET6;
+		endpoint->addr6.sin6_port = udp_hdr(skb)->source;
+		endpoint->addr6.sin6_addr = ipv6_hdr(skb)->saddr;
+		endpoint->addr6.sin6_scope_id = ipv6_iface_scope_id(
+			&ipv6_hdr(skb)->saddr, skb->skb_iif);
+		endpoint->src6 = ipv6_hdr(skb)->daddr;
+	} else
+		return -EINVAL;
+	return 0;
+}
+
+static inline bool endpoint_eq(const struct endpoint *a,
+			       const struct endpoint *b)
+{
+	return (a->addr.sa_family == AF_INET && b->addr.sa_family == AF_INET &&
+		a->addr4.sin_port == b->addr4.sin_port &&
+		a->addr4.sin_addr.s_addr == b->addr4.sin_addr.s_addr &&
+		a->src4.s_addr == b->src4.s_addr && a->src_if4 == b->src_if4) ||
+	       (a->addr.sa_family == AF_INET6 &&
+		b->addr.sa_family == AF_INET6 &&
+		a->addr6.sin6_port == b->addr6.sin6_port &&
+		ipv6_addr_equal(&a->addr6.sin6_addr, &b->addr6.sin6_addr) &&
+		a->addr6.sin6_scope_id == b->addr6.sin6_scope_id &&
+		ipv6_addr_equal(&a->src6, &b->src6)) ||
+	       unlikely(!a->addr.sa_family && !b->addr.sa_family);
+}
+
+void socket_set_peer_endpoint(struct wireguard_peer *peer,
+			      const struct endpoint *endpoint)
+{
+	/* First we check unlocked, in order to optimize, since it's pretty rare
+	 * that an endpoint will change. If we happen to be mid-write, and two
+	 * CPUs wind up writing the same thing or something slightly different,
+	 * it doesn't really matter much either.
+	 */
+	if (endpoint_eq(endpoint, &peer->endpoint))
+		return;
+	write_lock_bh(&peer->endpoint_lock);
+	if (endpoint->addr.sa_family == AF_INET) {
+		peer->endpoint.addr4 = endpoint->addr4;
+		peer->endpoint.src4 = endpoint->src4;
+		peer->endpoint.src_if4 = endpoint->src_if4;
+	} else if (endpoint->addr.sa_family == AF_INET6) {
+		peer->endpoint.addr6 = endpoint->addr6;
+		peer->endpoint.src6 = endpoint->src6;
+	} else
+		goto out;
+	dst_cache_reset(&peer->endpoint_cache);
+out:
+	write_unlock_bh(&peer->endpoint_lock);
+}
+
+void socket_set_peer_endpoint_from_skb(struct wireguard_peer *peer,
+				       const struct sk_buff *skb)
+{
+	struct endpoint endpoint;
+
+	if (!socket_endpoint_from_skb(&endpoint, skb))
+		socket_set_peer_endpoint(peer, &endpoint);
+}
+
+void socket_clear_peer_endpoint_src(struct wireguard_peer *peer)
+{
+	write_lock_bh(&peer->endpoint_lock);
+	memset(&peer->endpoint.src6, 0, sizeof(peer->endpoint.src6));
+	dst_cache_reset(&peer->endpoint_cache);
+	write_unlock_bh(&peer->endpoint_lock);
+}
+
+static int receive(struct sock *sk, struct sk_buff *skb)
+{
+	struct wireguard_device *wg;
+
+	if (unlikely(!sk))
+		goto err;
+	wg = sk->sk_user_data;
+	if (unlikely(!wg))
+		goto err;
+	packet_receive(wg, skb);
+	return 0;
+
+err:
+	kfree_skb(skb);
+	return 0;
+}
+
+static inline void sock_free(struct sock *sock)
+{
+	if (unlikely(!sock))
+		return;
+	sk_clear_memalloc(sock);
+	udp_tunnel_sock_release(sock->sk_socket);
+}
+
+static inline void set_sock_opts(struct socket *sock)
+{
+	sock->sk->sk_allocation = GFP_ATOMIC;
+	sock->sk->sk_sndbuf = INT_MAX;
+	sk_set_memalloc(sock->sk);
+}
+
+int socket_init(struct wireguard_device *wg, u16 port)
+{
+	int ret;
+	struct udp_tunnel_sock_cfg cfg = {
+		.sk_user_data = wg,
+		.encap_type = 1,
+		.encap_rcv = receive
+	};
+	struct socket *new4 = NULL, *new6 = NULL;
+	struct udp_port_cfg port4 = {
+		.family = AF_INET,
+		.local_ip.s_addr = htonl(INADDR_ANY),
+		.local_udp_port = htons(port),
+		.use_udp_checksums = true
+	};
+#if IS_ENABLED(CONFIG_IPV6)
+	int retries = 0;
+	struct udp_port_cfg port6 = {
+		.family = AF_INET6,
+		.local_ip6 = IN6ADDR_ANY_INIT,
+		.use_udp6_tx_checksums = true,
+		.use_udp6_rx_checksums = true,
+		.ipv6_v6only = true
+	};
+#endif
+
+#if IS_ENABLED(CONFIG_IPV6)
+retry:
+#endif
+
+	ret = udp_sock_create(wg->creating_net, &port4, &new4);
+	if (ret < 0) {
+		pr_err("%s: Could not create IPv4 socket\n", wg->dev->name);
+		return ret;
+	}
+	set_sock_opts(new4);
+	setup_udp_tunnel_sock(wg->creating_net, new4, &cfg);
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (ipv6_mod_enabled()) {
+		port6.local_udp_port = inet_sk(new4->sk)->inet_sport;
+		ret = udp_sock_create(wg->creating_net, &port6, &new6);
+		if (ret < 0) {
+			udp_tunnel_sock_release(new4);
+			if (ret == -EADDRINUSE && !port && retries++ < 100)
+				goto retry;
+			pr_err("%s: Could not create IPv6 socket\n",
+			       wg->dev->name);
+			return ret;
+		}
+		set_sock_opts(new6);
+		setup_udp_tunnel_sock(wg->creating_net, new6, &cfg);
+	}
+#endif
+
+	socket_reinit(wg, new4 ? new4->sk : NULL, new6 ? new6->sk : NULL);
+	return 0;
+}
+
+void socket_reinit(struct wireguard_device *wg, struct sock *new4,
+		   struct sock *new6)
+{
+	struct sock *old4, *old6;
+
+	mutex_lock(&wg->socket_update_lock);
+	old4 = rcu_dereference_protected(wg->sock4,
+				lockdep_is_held(&wg->socket_update_lock));
+	old6 = rcu_dereference_protected(wg->sock6,
+				lockdep_is_held(&wg->socket_update_lock));
+	rcu_assign_pointer(wg->sock4, new4);
+	rcu_assign_pointer(wg->sock6, new6);
+	if (new4)
+		wg->incoming_port = ntohs(inet_sk(new4)->inet_sport);
+	mutex_unlock(&wg->socket_update_lock);
+	synchronize_rcu_bh();
+	synchronize_net();
+	sock_free(old4);
+	sock_free(old6);
+}
diff --git a/drivers/net/wireguard/socket.h b/drivers/net/wireguard/socket.h
new file mode 100644
index 000000000000..d873ffad9ea3
--- /dev/null
+++ b/drivers/net/wireguard/socket.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_SOCKET_H
+#define _WG_SOCKET_H
+
+#include <linux/netdevice.h>
+#include <linux/udp.h>
+#include <linux/if_vlan.h>
+#include <linux/if_ether.h>
+
+int socket_init(struct wireguard_device *wg, u16 port);
+void socket_reinit(struct wireguard_device *wg, struct sock *new4,
+		   struct sock *new6);
+int socket_send_buffer_to_peer(struct wireguard_peer *peer, void *data,
+			       size_t len, u8 ds);
+int socket_send_skb_to_peer(struct wireguard_peer *peer, struct sk_buff *skb,
+			    u8 ds);
+int socket_send_buffer_as_reply_to_skb(struct wireguard_device *wg,
+				       struct sk_buff *in_skb, void *out_buffer,
+				       size_t len);
+
+int socket_endpoint_from_skb(struct endpoint *endpoint,
+			     const struct sk_buff *skb);
+void socket_set_peer_endpoint(struct wireguard_peer *peer,
+			      const struct endpoint *endpoint);
+void socket_set_peer_endpoint_from_skb(struct wireguard_peer *peer,
+				       const struct sk_buff *skb);
+void socket_clear_peer_endpoint_src(struct wireguard_peer *peer);
+
+#if defined(CONFIG_DYNAMIC_DEBUG) || defined(DEBUG)
+#define net_dbg_skb_ratelimited(fmt, dev, skb, ...) do {                       \
+		struct endpoint __endpoint;                                    \
+		socket_endpoint_from_skb(&__endpoint, skb);                    \
+		net_dbg_ratelimited(fmt, dev, &__endpoint.addr,                \
+				    ##__VA_ARGS__);                            \
+	} while (0)
+#else
+#define net_dbg_skb_ratelimited(fmt, skb, ...)
+#endif
+
+#endif /* _WG_SOCKET_H */
diff --git a/drivers/net/wireguard/timers.c b/drivers/net/wireguard/timers.c
new file mode 100644
index 000000000000..fead499a7321
--- /dev/null
+++ b/drivers/net/wireguard/timers.c
@@ -0,0 +1,256 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include "timers.h"
+#include "device.h"
+#include "peer.h"
+#include "queueing.h"
+#include "socket.h"
+
+/*
+ * - Timer for retransmitting the handshake if we don't hear back after
+ * `REKEY_TIMEOUT + jitter` ms.
+ *
+ * - Timer for sending empty packet if we have received a packet but after have
+ * not sent one for `KEEPALIVE_TIMEOUT` ms.
+ *
+ * - Timer for initiating new handshake if we have sent a packet but after have
+ * not received one (even empty) for `(KEEPALIVE_TIMEOUT + REKEY_TIMEOUT)` ms.
+ *
+ * - Timer for zeroing out all ephemeral keys after `(REJECT_AFTER_TIME * 3)` ms
+ * if no new keys have been received.
+ *
+ * - Timer for, if enabled, sending an empty authenticated packet every user-
+ * specified seconds.
+ */
+
+#define peer_get_from_timer(timer_name)                                        \
+	struct wireguard_peer *peer;                                           \
+	rcu_read_lock_bh();                                                    \
+	peer = peer_get_maybe_zero(from_timer(peer, timer, timer_name));       \
+	rcu_read_unlock_bh();                                                  \
+	if (unlikely(!peer))                                                   \
+		return;
+
+static inline void mod_peer_timer(struct wireguard_peer *peer,
+				  struct timer_list *timer,
+				  unsigned long expires)
+{
+	rcu_read_lock_bh();
+	if (likely(netif_running(peer->device->dev) && !peer->is_dead))
+		mod_timer(timer, expires);
+	rcu_read_unlock_bh();
+}
+
+static inline void del_peer_timer(struct wireguard_peer *peer,
+				  struct timer_list *timer)
+{
+	rcu_read_lock_bh();
+	if (likely(netif_running(peer->device->dev) && !peer->is_dead))
+		del_timer(timer);
+	rcu_read_unlock_bh();
+}
+
+static void expired_retransmit_handshake(struct timer_list *timer)
+{
+	peer_get_from_timer(timer_retransmit_handshake);
+
+	if (peer->timer_handshake_attempts > MAX_TIMER_HANDSHAKES) {
+		pr_debug("%s: Handshake for peer %llu (%pISpfsc) did not complete after %d attempts, giving up\n",
+			 peer->device->dev->name, peer->internal_id,
+			 &peer->endpoint.addr, MAX_TIMER_HANDSHAKES + 2);
+
+		del_peer_timer(peer, &peer->timer_send_keepalive);
+		/* We drop all packets without a keypair and don't try again,
+		 * if we try unsuccessfully for too long to make a handshake.
+		 */
+		skb_queue_purge(&peer->staged_packet_queue);
+
+		/* We set a timer for destroying any residue that might be left
+		 * of a partial exchange.
+		 */
+		if (!timer_pending(&peer->timer_zero_key_material))
+			mod_peer_timer(peer, &peer->timer_zero_key_material,
+				       jiffies + REJECT_AFTER_TIME * 3 * HZ);
+	} else {
+		++peer->timer_handshake_attempts;
+		pr_debug("%s: Handshake for peer %llu (%pISpfsc) did not complete after %d seconds, retrying (try %d)\n",
+			 peer->device->dev->name, peer->internal_id,
+			 &peer->endpoint.addr, REKEY_TIMEOUT,
+			 peer->timer_handshake_attempts + 1);
+
+		/* We clear the endpoint address src address, in case this is
+		 * the cause of trouble.
+		 */
+		socket_clear_peer_endpoint_src(peer);
+
+		packet_send_queued_handshake_initiation(peer, true);
+	}
+	peer_put(peer);
+}
+
+static void expired_send_keepalive(struct timer_list *timer)
+{
+	peer_get_from_timer(timer_send_keepalive);
+
+	packet_send_keepalive(peer);
+	if (peer->timer_need_another_keepalive) {
+		peer->timer_need_another_keepalive = false;
+		mod_peer_timer(peer, &peer->timer_send_keepalive,
+			       jiffies + KEEPALIVE_TIMEOUT * HZ);
+	}
+	peer_put(peer);
+}
+
+static void expired_new_handshake(struct timer_list *timer)
+{
+	peer_get_from_timer(timer_new_handshake);
+
+	pr_debug("%s: Retrying handshake with peer %llu (%pISpfsc) because we stopped hearing back after %d seconds\n",
+		 peer->device->dev->name, peer->internal_id,
+		 &peer->endpoint.addr, KEEPALIVE_TIMEOUT + REKEY_TIMEOUT);
+	/* We clear the endpoint address src address, in case this is the cause
+	 * of trouble.
+	 */
+	socket_clear_peer_endpoint_src(peer);
+	packet_send_queued_handshake_initiation(peer, false);
+	peer_put(peer);
+}
+
+static void expired_zero_key_material(struct timer_list *timer)
+{
+	peer_get_from_timer(timer_zero_key_material);
+
+	rcu_read_lock_bh();
+	if (!peer->is_dead) {
+		 /* Should take our reference. */
+		if (!queue_work(peer->device->handshake_send_wq,
+				&peer->clear_peer_work))
+			/* If the work was already on the queue, we want to drop the extra reference */
+			peer_put(peer);
+	}
+	rcu_read_unlock_bh();
+}
+static void queued_expired_zero_key_material(struct work_struct *work)
+{
+	struct wireguard_peer *peer =
+		container_of(work, struct wireguard_peer, clear_peer_work);
+
+	pr_debug("%s: Zeroing out all keys for peer %llu (%pISpfsc), since we haven't received a new one in %d seconds\n",
+		 peer->device->dev->name, peer->internal_id,
+		 &peer->endpoint.addr, REJECT_AFTER_TIME * 3);
+	noise_handshake_clear(&peer->handshake);
+	noise_keypairs_clear(&peer->keypairs);
+	peer_put(peer);
+}
+
+static void expired_send_persistent_keepalive(struct timer_list *timer)
+{
+	peer_get_from_timer(timer_persistent_keepalive);
+
+	if (likely(peer->persistent_keepalive_interval))
+		packet_send_keepalive(peer);
+	peer_put(peer);
+}
+
+/* Should be called after an authenticated data packet is sent. */
+void timers_data_sent(struct wireguard_peer *peer)
+{
+	if (!timer_pending(&peer->timer_new_handshake))
+		mod_peer_timer(peer, &peer->timer_new_handshake,
+			jiffies + (KEEPALIVE_TIMEOUT + REKEY_TIMEOUT) * HZ);
+}
+
+/* Should be called after an authenticated data packet is received. */
+void timers_data_received(struct wireguard_peer *peer)
+{
+	if (likely(netif_running(peer->device->dev))) {
+		if (!timer_pending(&peer->timer_send_keepalive))
+			mod_peer_timer(peer, &peer->timer_send_keepalive,
+				       jiffies + KEEPALIVE_TIMEOUT * HZ);
+		else
+			peer->timer_need_another_keepalive = true;
+	}
+}
+
+/* Should be called after any type of authenticated packet is sent, whether
+ * keepalive, data, or handshake.
+ */
+void timers_any_authenticated_packet_sent(struct wireguard_peer *peer)
+{
+	del_peer_timer(peer, &peer->timer_send_keepalive);
+}
+
+/* Should be called after any type of authenticated packet is received, whether
+ * keepalive, data, or handshake.
+ */
+void timers_any_authenticated_packet_received(struct wireguard_peer *peer)
+{
+	del_peer_timer(peer, &peer->timer_new_handshake);
+}
+
+/* Should be called after a handshake initiation message is sent. */
+void timers_handshake_initiated(struct wireguard_peer *peer)
+{
+	mod_peer_timer(
+		peer, &peer->timer_retransmit_handshake,
+		jiffies + REKEY_TIMEOUT * HZ +
+			prandom_u32_max(REKEY_TIMEOUT_JITTER_MAX_JIFFIES));
+}
+
+/* Should be called after a handshake response message is received and processed
+ * or when getting key confirmation via the first data message.
+ */
+void timers_handshake_complete(struct wireguard_peer *peer)
+{
+	del_peer_timer(peer, &peer->timer_retransmit_handshake);
+	peer->timer_handshake_attempts = 0;
+	peer->sent_lastminute_handshake = false;
+	getnstimeofday(&peer->walltime_last_handshake);
+}
+
+/* Should be called after an ephemeral key is created, which is before sending a
+ * handshake response or after receiving a handshake response.
+ */
+void timers_session_derived(struct wireguard_peer *peer)
+{
+	mod_peer_timer(peer, &peer->timer_zero_key_material,
+		       jiffies + REJECT_AFTER_TIME * 3 * HZ);
+}
+
+/* Should be called before a packet with authentication, whether
+ * keepalive, data, or handshakem is sent, or after one is received.
+ */
+void timers_any_authenticated_packet_traversal(struct wireguard_peer *peer)
+{
+	if (peer->persistent_keepalive_interval)
+		mod_peer_timer(peer, &peer->timer_persistent_keepalive,
+			jiffies + peer->persistent_keepalive_interval * HZ);
+}
+
+void timers_init(struct wireguard_peer *peer)
+{
+	timer_setup(&peer->timer_retransmit_handshake,
+		    expired_retransmit_handshake, 0);
+	timer_setup(&peer->timer_send_keepalive, expired_send_keepalive, 0);
+	timer_setup(&peer->timer_new_handshake, expired_new_handshake, 0);
+	timer_setup(&peer->timer_zero_key_material, expired_zero_key_material, 0);
+	timer_setup(&peer->timer_persistent_keepalive,
+		    expired_send_persistent_keepalive, 0);
+	INIT_WORK(&peer->clear_peer_work, queued_expired_zero_key_material);
+	peer->timer_handshake_attempts = 0;
+	peer->sent_lastminute_handshake = false;
+	peer->timer_need_another_keepalive = false;
+}
+
+void timers_stop(struct wireguard_peer *peer)
+{
+	del_timer_sync(&peer->timer_retransmit_handshake);
+	del_timer_sync(&peer->timer_send_keepalive);
+	del_timer_sync(&peer->timer_new_handshake);
+	del_timer_sync(&peer->timer_zero_key_material);
+	del_timer_sync(&peer->timer_persistent_keepalive);
+	flush_work(&peer->clear_peer_work);
+}
diff --git a/drivers/net/wireguard/timers.h b/drivers/net/wireguard/timers.h
new file mode 100644
index 000000000000..483529c9d873
--- /dev/null
+++ b/drivers/net/wireguard/timers.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _WG_TIMERS_H
+#define _WG_TIMERS_H
+
+#include <linux/ktime.h>
+
+struct wireguard_peer;
+
+void timers_init(struct wireguard_peer *peer);
+void timers_stop(struct wireguard_peer *peer);
+void timers_data_sent(struct wireguard_peer *peer);
+void timers_data_received(struct wireguard_peer *peer);
+void timers_any_authenticated_packet_sent(struct wireguard_peer *peer);
+void timers_any_authenticated_packet_received(struct wireguard_peer *peer);
+void timers_handshake_initiated(struct wireguard_peer *peer);
+void timers_handshake_complete(struct wireguard_peer *peer);
+void timers_session_derived(struct wireguard_peer *peer);
+void timers_any_authenticated_packet_traversal(struct wireguard_peer *peer);
+
+static inline bool has_expired(u64 birthday_nanoseconds, u64 expiration_seconds)
+{
+	return (s64)(birthday_nanoseconds + expiration_seconds * NSEC_PER_SEC)
+		<= (s64)ktime_get_boot_fast_ns();
+}
+
+#endif /* _WG_TIMERS_H */
diff --git a/drivers/net/wireguard/version.h b/drivers/net/wireguard/version.h
new file mode 100644
index 000000000000..fba9c7ab9423
--- /dev/null
+++ b/drivers/net/wireguard/version.h
@@ -0,0 +1 @@
+#define WIREGUARD_VERSION "0.0.20180910"
diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h
new file mode 100644
index 000000000000..3d73ad714e52
--- /dev/null
+++ b/include/uapi/linux/wireguard.h
@@ -0,0 +1,190 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR MIT)
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ *
+ * Documentation
+ * =============
+ *
+ * The below enums and macros are for interfacing with WireGuard, using generic
+ * netlink, with family WG_GENL_NAME and version WG_GENL_VERSION. It defines two
+ * methods: get and set. Note that while they share many common attributes,
+ * these two functions actually accept a slightly different set of inputs and
+ * outputs.
+ *
+ * WG_CMD_GET_DEVICE
+ * -----------------
+ *
+ * May only be called via NLM_F_REQUEST | NLM_F_DUMP. The command should contain
+ * one but not both of:
+ *
+ *    WGDEVICE_A_IFINDEX: NLA_U32
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *
+ * The kernel will then return several messages (NLM_F_MULTI) containing the
+ * following tree of nested items:
+ *
+ *    WGDEVICE_A_IFINDEX: NLA_U32
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *    WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN
+ *    WGDEVICE_A_PUBLIC_KEY: len WG_KEY_LEN
+ *    WGDEVICE_A_LISTEN_PORT: NLA_U16
+ *    WGDEVICE_A_FWMARK: NLA_U32
+ *    WGDEVICE_A_PEERS: NLA_NESTED
+ *        0: NLA_NESTED
+ *            WGPEER_A_PUBLIC_KEY: len WG_KEY_LEN
+ *            WGPEER_A_PRESHARED_KEY: len WG_KEY_LEN
+ *            WGPEER_A_ENDPOINT: struct sockaddr_in or struct sockaddr_in6
+ *            WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL: NLA_U16
+ *            WGPEER_A_LAST_HANDSHAKE_TIME: struct timespec
+ *            WGPEER_A_RX_BYTES: NLA_U64
+ *            WGPEER_A_TX_BYTES: NLA_U64
+ *            WGPEER_A_ALLOWEDIPS: NLA_NESTED
+ *                0: NLA_NESTED
+ *                    WGALLOWEDIP_A_FAMILY: NLA_U16
+ *                    WGALLOWEDIP_A_IPADDR: struct in_addr or struct in6_addr
+ *                    WGALLOWEDIP_A_CIDR_MASK: NLA_U8
+ *                1: NLA_NESTED
+ *                    ...
+ *                2: NLA_NESTED
+ *                    ...
+ *                ...
+ *            WGPEER_A_PROTOCOL_VERSION: NLA_U32
+ *        1: NLA_NESTED
+ *            ...
+ *        ...
+ *
+ * It is possible that all of the allowed IPs of a single peer will not
+ * fit within a single netlink message. In that case, the same peer will
+ * be written in the following message, except it will only contain
+ * WGPEER_A_PUBLIC_KEY and WGPEER_A_ALLOWEDIPS. This may occur several
+ * times in a row for the same peer. It is then up to the receiver to
+ * coalesce adjacent peers. Likewise, it is possible that all peers will
+ * not fit within a single message. So, subsequent peers will be sent
+ * in following messages, except those will only contain WGDEVICE_A_IFNAME
+ * and WGDEVICE_A_PEERS. It is then up to the receiver to coalesce these
+ * messages to form the complete list of peers.
+ *
+ * Since this is an NLA_F_DUMP command, the final message will always be
+ * NLMSG_DONE, even if an error occurs. However, this NLMSG_DONE message
+ * contains an integer error code. It is either zero or a negative error
+ * code corresponding to the errno.
+ *
+ * WG_CMD_SET_DEVICE
+ * -----------------
+ *
+ * May only be called via NLM_F_REQUEST. The command should contain the
+ * following tree of nested items, containing one but not both of
+ * WGDEVICE_A_IFINDEX and WGDEVICE_A_IFNAME:
+ *
+ *    WGDEVICE_A_IFINDEX: NLA_U32
+ *    WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1
+ *    WGDEVICE_A_FLAGS: NLA_U32, 0 or WGDEVICE_F_REPLACE_PEERS if all current
+ *                      peers should be removed prior to adding the list below.
+ *    WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN, all zeros to remove
+ *    WGDEVICE_A_LISTEN_PORT: NLA_U16, 0 to choose randomly
+ *    WGDEVICE_A_FWMARK: NLA_U32, 0 to disable
+ *    WGDEVICE_A_PEERS: NLA_NESTED
+ *        0: NLA_NESTED
+ *            WGPEER_A_PUBLIC_KEY: len WG_KEY_LEN
+ *            WGPEER_A_FLAGS: NLA_U32, 0 and/or WGPEER_F_REMOVE_ME if the
+ *                            specified peer should be removed rather than
+ *                            added/updated and/or WGPEER_F_REPLACE_ALLOWEDIPS
+ *                            if all current allowed IPs of this peer should be
+ *                            removed prior to adding the list below.
+ *            WGPEER_A_PRESHARED_KEY: len WG_KEY_LEN, all zeros to remove
+ *            WGPEER_A_ENDPOINT: struct sockaddr_in or struct sockaddr_in6
+ *            WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL: NLA_U16, 0 to disable
+ *            WGPEER_A_ALLOWEDIPS: NLA_NESTED
+ *                0: NLA_NESTED
+ *                    WGALLOWEDIP_A_FAMILY: NLA_U16
+ *                    WGALLOWEDIP_A_IPADDR: struct in_addr or struct in6_addr
+ *                    WGALLOWEDIP_A_CIDR_MASK: NLA_U8
+ *                1: NLA_NESTED
+ *                    ...
+ *                2: NLA_NESTED
+ *                    ...
+ *                ...
+ *            WGPEER_A_PROTOCOL_VERSION: NLA_U32, should not be set or used at
+ *                                       all by most users of this API, as the
+ *                                       most recent protocol will be used when
+ *                                       this is unset. Otherwise, must be set
+ *                                       to 1.
+ *        1: NLA_NESTED
+ *            ...
+ *        ...
+ *
+ * It is possible that the amount of configuration data exceeds that of
+ * the maximum message length accepted by the kernel. In that case, several
+ * messages should be sent one after another, with each successive one
+ * filling in information not contained in the prior. Note that if
+ * WGDEVICE_F_REPLACE_PEERS is specified in the first message, it probably
+ * should not be specified in fragments that come after, so that the list
+ * of peers is only cleared the first time but appened after. Likewise for
+ * peers, if WGPEER_F_REPLACE_ALLOWEDIPS is specified in the first message
+ * of a peer, it likely should not be specified in subsequent fragments.
+ *
+ * If an error occurs, NLMSG_ERROR will reply containing an errno.
+ */
+
+#ifndef _WG_UAPI_WIREGUARD_H
+#define _WG_UAPI_WIREGUARD_H
+
+#define WG_GENL_NAME "wireguard"
+#define WG_GENL_VERSION 1
+
+#define WG_KEY_LEN 32
+
+enum wg_cmd {
+	WG_CMD_GET_DEVICE,
+	WG_CMD_SET_DEVICE,
+	__WG_CMD_MAX
+};
+#define WG_CMD_MAX (__WG_CMD_MAX - 1)
+
+enum wgdevice_flag {
+	WGDEVICE_F_REPLACE_PEERS = 1U << 0
+};
+enum wgdevice_attribute {
+	WGDEVICE_A_UNSPEC,
+	WGDEVICE_A_IFINDEX,
+	WGDEVICE_A_IFNAME,
+	WGDEVICE_A_PRIVATE_KEY,
+	WGDEVICE_A_PUBLIC_KEY,
+	WGDEVICE_A_FLAGS,
+	WGDEVICE_A_LISTEN_PORT,
+	WGDEVICE_A_FWMARK,
+	WGDEVICE_A_PEERS,
+	__WGDEVICE_A_LAST
+};
+#define WGDEVICE_A_MAX (__WGDEVICE_A_LAST - 1)
+
+enum wgpeer_flag {
+	WGPEER_F_REMOVE_ME = 1U << 0,
+	WGPEER_F_REPLACE_ALLOWEDIPS = 1U << 1
+};
+enum wgpeer_attribute {
+	WGPEER_A_UNSPEC,
+	WGPEER_A_PUBLIC_KEY,
+	WGPEER_A_PRESHARED_KEY,
+	WGPEER_A_FLAGS,
+	WGPEER_A_ENDPOINT,
+	WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL,
+	WGPEER_A_LAST_HANDSHAKE_TIME,
+	WGPEER_A_RX_BYTES,
+	WGPEER_A_TX_BYTES,
+	WGPEER_A_ALLOWEDIPS,
+	WGPEER_A_PROTOCOL_VERSION,
+	__WGPEER_A_LAST
+};
+#define WGPEER_A_MAX (__WGPEER_A_LAST - 1)
+
+enum wgallowedip_attribute {
+	WGALLOWEDIP_A_UNSPEC,
+	WGALLOWEDIP_A_FAMILY,
+	WGALLOWEDIP_A_IPADDR,
+	WGALLOWEDIP_A_CIDR_MASK,
+	__WGALLOWEDIP_A_LAST
+};
+#define WGALLOWEDIP_A_MAX (__WGALLOWEDIP_A_LAST - 1)
+
+#endif /* _WG_UAPI_WIREGUARD_H */
diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh
new file mode 100755
index 000000000000..568612c45acc
--- /dev/null
+++ b/tools/testing/selftests/wireguard/netns.sh
@@ -0,0 +1,499 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+#
+# This script tests the below topology:
+#
+# ┌─────────────────────┐   ┌──────────────────────────────────┐   ┌─────────────────────┐
+# │   $ns1 namespace    │   │          $ns0 namespace          │   │   $ns2 namespace    │
+# │                     │   │                                  │   │                     │
+# │┌────────┐           │   │            ┌────────┐            │   │           ┌────────┐│
+# ││  wg0   │───────────┼───┼────────────│   lo   │────────────┼───┼───────────│  wg0   ││
+# │├────────┴──────────┐│   │    ┌───────┴────────┴────────┐   │   │┌──────────┴────────┤│
+# ││192.168.241.1/24   ││   │    │(ns1)         (ns2)      │   │   ││192.168.241.2/24   ││
+# ││fd00::1/24         ││   │    │127.0.0.1:1   127.0.0.1:2│   │   ││fd00::2/24         ││
+# │└───────────────────┘│   │    │[::]:1        [::]:2     │   │   │└───────────────────┘│
+# └─────────────────────┘   │    └─────────────────────────┘   │   └─────────────────────┘
+#                           └──────────────────────────────────┘
+#
+# After the topology is prepared we run a series of TCP/UDP iperf3 tests between the
+# wireguard peers in $ns1 and $ns2. Note that $ns0 is the endpoint for the wg0
+# interfaces in $ns1 and $ns2. See https://www.wireguard.com/netns/ for further
+# details on how this is accomplished.
+set -e
+
+exec 3>&1
+export WG_HIDE_KEYS=never
+netns0="wg-test-$$-0"
+netns1="wg-test-$$-1"
+netns2="wg-test-$$-2"
+pretty() { echo -e "\x1b[32m\x1b[1m[+] ${1:+NS$1: }${2}\x1b[0m" >&3; }
+pp() { pretty "" "$*"; "$@"; }
+maybe_exec() { if [[ $BASHPID -eq $$ ]]; then "$@"; else exec "$@"; fi; }
+n0() { pretty 0 "$*"; maybe_exec ip netns exec $netns0 "$@"; }
+n1() { pretty 1 "$*"; maybe_exec ip netns exec $netns1 "$@"; }
+n2() { pretty 2 "$*"; maybe_exec ip netns exec $netns2 "$@"; }
+ip0() { pretty 0 "ip $*"; ip -n $netns0 "$@"; }
+ip1() { pretty 1 "ip $*"; ip -n $netns1 "$@"; }
+ip2() { pretty 2 "ip $*"; ip -n $netns2 "$@"; }
+sleep() { read -t "$1" -N 0 || true; }
+waitiperf() { pretty "${1//*-}" "wait for iperf:5201"; while [[ $(ss -N "$1" -tlp 'sport = 5201') != *iperf3* ]]; do sleep 0.1; done; }
+waitncatudp() { pretty "${1//*-}" "wait for udp:1111"; while [[ $(ss -N "$1" -ulp 'sport = 1111') != *ncat* ]]; do sleep 0.1; done; }
+waitncattcp() { pretty "${1//*-}" "wait for tcp:1111"; while [[ $(ss -N "$1" -tlp 'sport = 1111') != *ncat* ]]; do sleep 0.1; done; }
+waitiface() { pretty "${1//*-}" "wait for $2 to come up"; ip netns exec "$1" bash -c "while [[ \$(< \"/sys/class/net/$2/operstate\") != up ]]; do read -t .1 -N 0 || true; done;"; }
+
+cleanup() {
+	set +e
+	exec 2>/dev/null
+	printf "$orig_message_cost" > /proc/sys/net/core/message_cost
+	ip0 link del dev wg0
+	ip1 link del dev wg0
+	ip2 link del dev wg0
+	local to_kill="$(ip netns pids $netns0) $(ip netns pids $netns1) $(ip netns pids $netns2)"
+	[[ -n $to_kill ]] && kill $to_kill
+	pp ip netns del $netns1
+	pp ip netns del $netns2
+	pp ip netns del $netns0
+	exit
+}
+
+orig_message_cost="$(< /proc/sys/net/core/message_cost)"
+trap cleanup EXIT
+printf 0 > /proc/sys/net/core/message_cost
+
+ip netns del $netns0 2>/dev/null || true
+ip netns del $netns1 2>/dev/null || true
+ip netns del $netns2 2>/dev/null || true
+pp ip netns add $netns0
+pp ip netns add $netns1
+pp ip netns add $netns2
+ip0 link set up dev lo
+
+ip0 link add dev wg0 type wireguard
+ip0 link set wg0 netns $netns1
+ip0 link add dev wg0 type wireguard
+ip0 link set wg0 netns $netns2
+key1="$(pp wg genkey)"
+key2="$(pp wg genkey)"
+pub1="$(pp wg pubkey <<<"$key1")"
+pub2="$(pp wg pubkey <<<"$key2")"
+psk="$(pp wg genpsk)"
+[[ -n $key1 && -n $key2 && -n $psk ]]
+
+configure_peers() {
+	ip1 addr add 192.168.241.1/24 dev wg0
+	ip1 addr add fd00::1/24 dev wg0
+
+	ip2 addr add 192.168.241.2/24 dev wg0
+	ip2 addr add fd00::2/24 dev wg0
+
+	n1 wg set wg0 \
+		private-key <(echo "$key1") \
+		listen-port 1 \
+		peer "$pub2" \
+			preshared-key <(echo "$psk") \
+			allowed-ips 192.168.241.2/32,fd00::2/128
+	n2 wg set wg0 \
+		private-key <(echo "$key2") \
+		listen-port 2 \
+		peer "$pub1" \
+			preshared-key <(echo "$psk") \
+			allowed-ips 192.168.241.1/32,fd00::1/128
+
+	ip1 link set up dev wg0
+	ip2 link set up dev wg0
+}
+configure_peers
+
+tests() {
+	# Ping over IPv4
+	n2 ping -c 10 -f -W 1 192.168.241.1
+	n1 ping -c 10 -f -W 1 192.168.241.2
+
+	# Ping over IPv6
+	n2 ping6 -c 10 -f -W 1 fd00::1
+	n1 ping6 -c 10 -f -W 1 fd00::2
+
+	# TCP over IPv4
+	n2 iperf3 -s -1 -B 192.168.241.2 &
+	waitiperf $netns2
+	n1 iperf3 -Z -t 3 -c 192.168.241.2
+
+	# TCP over IPv6
+	n1 iperf3 -s -1 -B fd00::1 &
+	waitiperf $netns1
+	n2 iperf3 -Z -t 3 -c fd00::1
+
+	# UDP over IPv4
+	n1 iperf3 -s -1 -B 192.168.241.1 &
+	waitiperf $netns1
+	n2 iperf3 -Z -t 3 -b 0 -u -c 192.168.241.1
+
+	# UDP over IPv6
+	n2 iperf3 -s -1 -B fd00::2 &
+	waitiperf $netns2
+	n1 iperf3 -Z -t 3 -b 0 -u -c fd00::2
+}
+
+[[ $(ip1 link show dev wg0) =~ mtu\ ([0-9]+) ]] && orig_mtu="${BASH_REMATCH[1]}"
+big_mtu=$(( 34816 - 1500 + $orig_mtu ))
+
+# Test using IPv4 as outer transport
+n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2
+n2 wg set wg0 peer "$pub1" endpoint 127.0.0.1:1
+# Before calling tests, we first make sure that the stats counters are working
+n2 ping -c 10 -f -W 1 192.168.241.1
+{ read _; read _; read _; read rx_bytes _; read _; read tx_bytes _; } < <(ip2 -stats link show dev wg0)
+(( rx_bytes == 1372 && (tx_bytes == 1428 || tx_bytes == 1460) ))
+{ read _; read _; read _; read rx_bytes _; read _; read tx_bytes _; } < <(ip1 -stats link show dev wg0)
+(( tx_bytes == 1372 && (rx_bytes == 1428 || rx_bytes == 1460) ))
+read _ rx_bytes tx_bytes < <(n2 wg show wg0 transfer)
+(( rx_bytes == 1372 && (tx_bytes == 1428 || tx_bytes == 1460) ))
+read _ rx_bytes tx_bytes < <(n1 wg show wg0 transfer)
+(( tx_bytes == 1372 && (rx_bytes == 1428 || rx_bytes == 1460) ))
+
+tests
+ip1 link set wg0 mtu $big_mtu
+ip2 link set wg0 mtu $big_mtu
+tests
+
+ip1 link set wg0 mtu $orig_mtu
+ip2 link set wg0 mtu $orig_mtu
+
+# Test using IPv6 as outer transport
+n1 wg set wg0 peer "$pub2" endpoint [::1]:2
+n2 wg set wg0 peer "$pub1" endpoint [::1]:1
+tests
+ip1 link set wg0 mtu $big_mtu
+ip2 link set wg0 mtu $big_mtu
+tests
+
+# Test that route MTUs work with the padding
+ip1 link set wg0 mtu 1300
+ip2 link set wg0 mtu 1300
+n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2
+n2 wg set wg0 peer "$pub1" endpoint 127.0.0.1:1
+n0 iptables -A INPUT -m length --length 1360 -j DROP
+n1 ip route add 192.168.241.2/32 dev wg0 mtu 1299
+n2 ip route add 192.168.241.1/32 dev wg0 mtu 1299
+n2 ping -c 1 -W 1 -s 1269 192.168.241.1
+n2 ip route delete 192.168.241.1/32 dev wg0 mtu 1299
+n1 ip route delete 192.168.241.2/32 dev wg0 mtu 1299
+n0 iptables -F INPUT
+
+ip1 link set wg0 mtu $orig_mtu
+ip2 link set wg0 mtu $orig_mtu
+
+# Test using IPv4 that roaming works
+ip0 -4 addr del 127.0.0.1/8 dev lo
+ip0 -4 addr add 127.212.121.99/8 dev lo
+n1 wg set wg0 listen-port 9999
+n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2
+n1 ping6 -W 1 -c 1 fd00::2
+[[ $(n2 wg show wg0 endpoints) == "$pub1	127.212.121.99:9999" ]]
+
+# Test using IPv6 that roaming works
+n1 wg set wg0 listen-port 9998
+n1 wg set wg0 peer "$pub2" endpoint [::1]:2
+n1 ping -W 1 -c 1 192.168.241.2
+[[ $(n2 wg show wg0 endpoints) == "$pub1	[::1]:9998" ]]
+
+# Test that crypto-RP filter works
+n1 wg set wg0 peer "$pub2" allowed-ips 192.168.241.0/24
+exec 4< <(n1 ncat -l -u -p 1111)
+nmap_pid=$!
+waitncatudp $netns1
+n2 ncat -u 192.168.241.1 1111 <<<"X"
+read -r -N 1 -t 1 out <&4 && [[ $out == "X" ]]
+kill $nmap_pid
+more_specific_key="$(pp wg genkey | pp wg pubkey)"
+n1 wg set wg0 peer "$more_specific_key" allowed-ips 192.168.241.2/32
+n2 wg set wg0 listen-port 9997
+exec 4< <(n1 ncat -l -u -p 1111)
+nmap_pid=$!
+waitncatudp $netns1
+n2 ncat -u 192.168.241.1 1111 <<<"X"
+! read -r -N 1 -t 1 out <&4 || false
+kill $nmap_pid
+n1 wg set wg0 peer "$more_specific_key" remove
+[[ $(n1 wg show wg0 endpoints) == "$pub2	[::1]:9997" ]]
+
+ip1 link del wg0
+ip2 link del wg0
+
+# Test using NAT. We now change the topology to this:
+# ┌────────────────────────────────────────┐    ┌────────────────────────────────────────────────┐     ┌────────────────────────────────────────┐
+# │             $ns1 namespace             │    │                 $ns0 namespace                 │     │             $ns2 namespace             │
+# │                                        │    │                                                │     │                                        │
+# │  ┌─────┐             ┌─────┐           │    │    ┌──────┐              ┌──────┐              │     │  ┌─────┐            ┌─────┐            │
+# │  │ wg0 │─────────────│vethc│───────────┼────┼────│vethrc│              │vethrs│──────────────┼─────┼──│veths│────────────│ wg0 │            │
+# │  ├─────┴──────────┐  ├─────┴──────────┐│    │    ├──────┴─────────┐    ├──────┴────────────┐ │     │  ├─────┴──────────┐ ├─────┴──────────┐ │
+# │  │192.168.241.1/24│  │192.168.1.100/24││    │    │192.168.1.100/24│    │10.0.0.1/24        │ │     │  │10.0.0.100/24   │ │192.168.241.2/24│ │
+# │  │fd00::1/24      │  │                ││    │    │                │    │SNAT:192.168.1.0/24│ │     │  │                │ │fd00::2/24      │ │
+# │  └────────────────┘  └────────────────┘│    │    └────────────────┘    └───────────────────┘ │     │  └────────────────┘ └────────────────┘ │
+# └────────────────────────────────────────┘    └────────────────────────────────────────────────┘     └────────────────────────────────────────┘
+
+ip1 link add dev wg0 type wireguard
+ip2 link add dev wg0 type wireguard
+configure_peers
+
+ip0 link add vethrc type veth peer name vethc
+ip0 link add vethrs type veth peer name veths
+ip0 link set vethc netns $netns1
+ip0 link set veths netns $netns2
+ip0 link set vethrc up
+ip0 link set vethrs up
+ip0 addr add 192.168.1.1/24 dev vethrc
+ip0 addr add 10.0.0.1/24 dev vethrs
+ip1 addr add 192.168.1.100/24 dev vethc
+ip1 link set vethc up
+ip1 route add default via 192.168.1.1
+ip2 addr add 10.0.0.100/24 dev veths
+ip2 link set veths up
+waitiface $netns0 vethrc
+waitiface $netns0 vethrs
+waitiface $netns1 vethc
+waitiface $netns2 veths
+
+n0 bash -c 'printf 1 > /proc/sys/net/ipv4/ip_forward'
+n0 bash -c 'printf 2 > /proc/sys/net/netfilter/nf_conntrack_udp_timeout'
+n0 bash -c 'printf 2 > /proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream'
+n0 iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1
+
+n1 wg set wg0 peer "$pub2" endpoint 10.0.0.100:2 persistent-keepalive 1
+n1 ping -W 1 -c 1 192.168.241.2
+n2 ping -W 1 -c 1 192.168.241.1
+[[ $(n2 wg show wg0 endpoints) == "$pub1	10.0.0.1:1" ]]
+# Demonstrate n2 can still send packets to n1, since persistent-keepalive will prevent connection tracking entry from expiring (to see entries: `n0 conntrack -L`).
+pp sleep 3
+n2 ping -W 1 -c 1 192.168.241.1
+
+n0 iptables -t nat -F
+ip0 link del vethrc
+ip0 link del vethrs
+ip1 link del wg0
+ip2 link del wg0
+
+# Test that saddr routing is sticky but not too sticky, changing to this topology:
+# ┌────────────────────────────────────────┐    ┌────────────────────────────────────────┐
+# │             $ns1 namespace             │    │             $ns2 namespace             │
+# │                                        │    │                                        │
+# │  ┌─────┐             ┌─────┐           │    │  ┌─────┐            ┌─────┐            │
+# │  │ wg0 │─────────────│veth1│───────────┼────┼──│veth2│────────────│ wg0 │            │
+# │  ├─────┴──────────┐  ├─────┴──────────┐│    │  ├─────┴──────────┐ ├─────┴──────────┐ │
+# │  │192.168.241.1/24│  │10.0.0.1/24     ││    │  │10.0.0.2/24     │ │192.168.241.2/24│ │
+# │  │fd00::1/24      │  │fd00:aa::1/96   ││    │  │fd00:aa::2/96   │ │fd00::2/24      │ │
+# │  └────────────────┘  └────────────────┘│    │  └────────────────┘ └────────────────┘ │
+# └────────────────────────────────────────┘    └────────────────────────────────────────┘
+
+ip1 link add dev wg0 type wireguard
+ip2 link add dev wg0 type wireguard
+configure_peers
+ip1 link add veth1 type veth peer name veth2
+ip1 link set veth2 netns $netns2
+n1 bash -c 'printf 0 > /proc/sys/net/ipv6/conf/all/accept_dad'
+n2 bash -c 'printf 0 > /proc/sys/net/ipv6/conf/all/accept_dad'
+n1 bash -c 'printf 0 > /proc/sys/net/ipv6/conf/veth1/accept_dad'
+n2 bash -c 'printf 0 > /proc/sys/net/ipv6/conf/veth2/accept_dad'
+n1 bash -c 'printf 1 > /proc/sys/net/ipv4/conf/veth1/promote_secondaries'
+
+# First we check that we aren't overly sticky and can fall over to new IPs when old ones are removed
+ip1 addr add 10.0.0.1/24 dev veth1
+ip1 addr add fd00:aa::1/96 dev veth1
+ip2 addr add 10.0.0.2/24 dev veth2
+ip2 addr add fd00:aa::2/96 dev veth2
+ip1 link set veth1 up
+ip2 link set veth2 up
+waitiface $netns1 veth1
+waitiface $netns2 veth2
+n1 wg set wg0 peer "$pub2" endpoint 10.0.0.2:2
+n1 ping -W 1 -c 1 192.168.241.2
+ip1 addr add 10.0.0.10/24 dev veth1
+ip1 addr del 10.0.0.1/24 dev veth1
+n1 ping -W 1 -c 1 192.168.241.2
+n1 wg set wg0 peer "$pub2" endpoint [fd00:aa::2]:2
+n1 ping -W 1 -c 1 192.168.241.2
+ip1 addr add fd00:aa::10/96 dev veth1
+ip1 addr del fd00:aa::1/96 dev veth1
+n1 ping -W 1 -c 1 192.168.241.2
+
+# Now we show that we can successfully do reply to sender routing
+ip1 link set veth1 down
+ip2 link set veth2 down
+ip1 addr flush dev veth1
+ip2 addr flush dev veth2
+ip1 addr add 10.0.0.1/24 dev veth1
+ip1 addr add 10.0.0.2/24 dev veth1
+ip1 addr add fd00:aa::1/96 dev veth1
+ip1 addr add fd00:aa::2/96 dev veth1
+ip2 addr add 10.0.0.3/24 dev veth2
+ip2 addr add fd00:aa::3/96 dev veth2
+ip1 link set veth1 up
+ip2 link set veth2 up
+waitiface $netns1 veth1
+waitiface $netns2 veth2
+n2 wg set wg0 peer "$pub1" endpoint 10.0.0.1:1
+n2 ping -W 1 -c 1 192.168.241.1
+[[ $(n2 wg show wg0 endpoints) == "$pub1	10.0.0.1:1" ]]
+n2 wg set wg0 peer "$pub1" endpoint [fd00:aa::1]:1
+n2 ping -W 1 -c 1 192.168.241.1
+[[ $(n2 wg show wg0 endpoints) == "$pub1	[fd00:aa::1]:1" ]]
+n2 wg set wg0 peer "$pub1" endpoint 10.0.0.2:1
+n2 ping -W 1 -c 1 192.168.241.1
+[[ $(n2 wg show wg0 endpoints) == "$pub1	10.0.0.2:1" ]]
+n2 wg set wg0 peer "$pub1" endpoint [fd00:aa::2]:1
+n2 ping -W 1 -c 1 192.168.241.1
+[[ $(n2 wg show wg0 endpoints) == "$pub1	[fd00:aa::2]:1" ]]
+
+# What happens if the inbound destination address belongs to a different interface as the default route?
+ip1 link add dummy0 type dummy
+ip1 addr add 10.50.0.1/24 dev dummy0
+ip1 link set dummy0 up
+ip2 route add 10.50.0.0/24 dev veth2
+n2 wg set wg0 peer "$pub1" endpoint 10.50.0.1:1
+n2 ping -W 1 -c 1 192.168.241.1
+[[ $(n2 wg show wg0 endpoints) == "$pub1	10.50.0.1:1" ]]
+
+ip1 link del dummy0
+ip1 addr flush dev veth1
+ip2 addr flush dev veth2
+ip1 route flush dev veth1
+ip2 route flush dev veth2
+
+# Now we see what happens if another interface route takes precedence over an ongoing one
+ip1 link add veth3 type veth peer name veth4
+ip1 link set veth4 netns $netns2
+ip1 addr add 10.0.0.1/24 dev veth1
+ip2 addr add 10.0.0.2/24 dev veth2
+ip1 addr add 10.0.0.3/24 dev veth3
+ip1 link set veth1 up
+ip2 link set veth2 up
+ip1 link set veth3 up
+ip2 link set veth4 up
+waitiface $netns1 veth1
+waitiface $netns2 veth2
+waitiface $netns1 veth3
+waitiface $netns2 veth4
+ip1 route flush dev veth1
+ip1 route flush dev veth3
+ip1 route add 10.0.0.0/24 dev veth1 src 10.0.0.1 metric 2
+n1 wg set wg0 peer "$pub2" endpoint 10.0.0.2:2
+n1 ping -W 1 -c 1 192.168.241.2
+[[ $(n2 wg show wg0 endpoints) == "$pub1	10.0.0.1:1" ]]
+ip1 route add 10.0.0.0/24 dev veth3 src 10.0.0.3 metric 1
+n1 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter'
+n2 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/veth4/rp_filter'
+n1 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/all/rp_filter'
+n2 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/all/rp_filter'
+n1 ping -W 1 -c 1 192.168.241.2
+[[ $(n2 wg show wg0 endpoints) == "$pub1	10.0.0.3:1" ]]
+
+ip1 link del veth1
+ip1 link del veth3
+ip1 link del wg0
+ip2 link del wg0
+
+# We test that Netlink/IPC is working properly by doing things that usually cause split responses
+ip0 link add dev wg0 type wireguard
+config=( "[Interface]" "PrivateKey=$(wg genkey)" "[Peer]" "PublicKey=$(wg genkey)" )
+for a in {1..255}; do
+	for b in {0..255}; do
+		config+=( "AllowedIPs=$a.$b.0.0/16,$a::$b/128" )
+	done
+done
+n0 wg setconf wg0 <(printf '%s\n' "${config[@]}")
+i=0
+for ip in $(n0 wg show wg0 allowed-ips); do
+	((++i))
+done
+((i == 255*256*2+1))
+ip0 link del wg0
+ip0 link add dev wg0 type wireguard
+config=( "[Interface]" "PrivateKey=$(wg genkey)" )
+for a in {1..40}; do
+	config+=( "[Peer]" "PublicKey=$(wg genkey)" )
+	for b in {1..52}; do
+		config+=( "AllowedIPs=$a.$b.0.0/16" )
+	done
+done
+n0 wg setconf wg0 <(printf '%s\n' "${config[@]}")
+i=0
+while read -r line; do
+	j=0
+	for ip in $line; do
+		((++j))
+	done
+	((j == 53))
+	((++i))
+done < <(n0 wg show wg0 allowed-ips)
+((i == 40))
+ip0 link del wg0
+ip0 link add wg0 type wireguard
+config=( )
+for i in {1..29}; do
+	config+=( "[Peer]" "PublicKey=$(wg genkey)" )
+done
+config+=( "[Peer]" "PublicKey=$(wg genkey)" "AllowedIPs=255.2.3.4/32,abcd::255/128" )
+n0 wg setconf wg0 <(printf '%s\n' "${config[@]}")
+n0 wg showconf wg0 > /dev/null
+ip0 link del wg0
+
+allowedips=( )
+for i in {1..197}; do
+        allowedips+=( abcd::$i )
+done
+saved_ifs="$IFS"
+IFS=,
+allowedips="${allowedips[*]}"
+IFS="$saved_ifs"
+ip0 link add wg0 type wireguard
+n0 wg set wg0 peer "$pub1"
+n0 wg set wg0 peer "$pub2" allowed-ips "$allowedips"
+{
+	read -r pub allowedips
+	[[ $pub == "$pub1" && $allowedips == "(none)" ]]
+	read -r pub allowedips
+	[[ $pub == "$pub2" ]]
+	i=0
+	for _ in $allowedips; do
+		((++i))
+	done
+	((i == 197))
+} < <(n0 wg show wg0 allowed-ips)
+ip0 link del wg0
+
+! n0 wg show doesnotexist || false
+
+ip0 link add wg0 type wireguard
+n0 wg set wg0 private-key <(echo "$key1") peer "$pub2" preshared-key <(echo "$psk")
+[[ $(n0 wg show wg0 private-key) == "$key1" ]]
+[[ $(n0 wg show wg0 preshared-keys) == "$pub2	$psk" ]]
+n0 wg set wg0 private-key /dev/null peer "$pub2" preshared-key /dev/null
+[[ $(n0 wg show wg0 private-key) == "(none)" ]]
+[[ $(n0 wg show wg0 preshared-keys) == "$pub2	(none)" ]]
+n0 wg set wg0 peer "$pub2"
+n0 wg set wg0 private-key <(echo "$key2")
+[[ $(n0 wg show wg0 public-key) == "$pub2" ]]
+[[ -z $(n0 wg show wg0 peers) ]]
+n0 wg set wg0 peer "$pub2"
+[[ -z $(n0 wg show wg0 peers) ]]
+n0 wg set wg0 private-key <(echo "$key1")
+n0 wg set wg0 peer "$pub2"
+[[ $(n0 wg show wg0 peers) == "$pub2" ]]
+ip0 link del wg0
+
+declare -A objects
+while read -t 0.1 -r line 2>/dev/null || [[ $? -ne 142 ]]; do
+	[[ $line =~ .*(wg[0-9]+:\ [A-Z][a-z]+\ [0-9]+)\ .*(created|destroyed).* ]] || continue
+	objects["${BASH_REMATCH[1]}"]+="${BASH_REMATCH[2]}"
+done < /dev/kmsg
+alldeleted=1
+for object in "${!objects[@]}"; do
+	if [[ ${objects["$object"]} != *createddestroyed ]]; then
+		echo "Error: $object: merely ${objects["$object"]}" >&3
+		alldeleted=0
+	fi
+done
+[[ $alldeleted -eq 1 ]]
+pretty "" "Objects that were created were also destroyed."
-- 
2.19.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations
  2018-09-14 16:22 ` [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations Jason A. Donenfeld
@ 2018-09-14 17:27   ` Ard Biesheuvel
  2018-09-14 17:45     ` Jason A. Donenfeld
  0 siblings, 1 reply; 38+ messages in thread
From: Ard Biesheuvel @ 2018-09-14 17:27 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linux Kernel Mailing List, <netdev@vger.kernel.org>,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, David S. Miller,
	Greg Kroah-Hartman, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Andy Polyakov, Russell King,
	linux-arm-kernel

On 14 September 2018 at 18:22, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> These NEON and non-NEON implementations come from Andy Polyakov's
> implementation. They are exactly the same as Andy Polyakov's original,
> with the following exceptions:
>
> - Entries and exits use the proper kernel convention macro.
> - CPU feature checking is done in C by the glue code, so that has been
>   removed from the assembly.
> - The function names have been renamed to fit kernel conventions.
> - Labels have been renamed to fit kernel conventions.
> - The neon code can jump to the scalar code when it makes sense to do
>   so.
>
> After '/^#/d;/^\..*[^:]$/d', the code has the following diff in actual
> instructions from the original.
>

As I asked in response to v3, could we please have this as a separate
patch on top? The diff below is corrupted.

Also, both Andy and Eric have offered to get involved in upstreaming
these changes to OpenSSL, so there is no delta to begin with.

> ARM:
>
> -poly1305_init:
> -.Lpoly1305_init:
> +ENTRY(poly1305_init_arm)
>         stmdb   sp!,{r4-r11}
>
>         eor     r3,r3,r3
> @@ -18,8 +25,6 @@
>         moveq   r0,#0
>         beq     .Lno_key
>
> -       adr     r11,.Lpoly1305_init
> -       ldr     r12,.LOPENSSL_armcap
>         ldrb    r4,[r1,#0]
>         mov     r10,#0x0fffffff
>         ldrb    r5,[r1,#1]
> @@ -34,8 +39,6 @@
>         ldrb    r7,[r1,#6]
>         and     r4,r4,r10
>
> -       ldr     r12,[r11,r12]           @ OPENSSL_armcap_P
> -       ldr     r12,[r12]
>         ldrb    r8,[r1,#7]
>         orr     r5,r5,r6,lsl#8
>         ldrb    r6,[r1,#8]
> @@ -45,22 +48,6 @@
>         ldrb    r8,[r1,#10]
>         and     r5,r5,r3
>
> -       tst     r12,#ARMV7_NEON         @ check for NEON
> -       adr     r9,poly1305_blocks_neon
> -       adr     r11,poly1305_blocks
> -       it      ne
> -       movne   r11,r9
> -       adr     r12,poly1305_emit
> -       adr     r10,poly1305_emit_neon
> -       it      ne
> -       movne   r12,r10
> -       itete   eq
> -       addeq   r12,r11,#(poly1305_emit-.Lpoly1305_init)
> -       addne   r12,r11,#(poly1305_emit_neon-.Lpoly1305_init)
> -       addeq   r11,r11,#(poly1305_blocks-.Lpoly1305_init)
> -       addne   r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init)
> -       orr     r12,r12,#1      @ thumb-ify address
> -       orr     r11,r11,#1
>         ldrb    r9,[r1,#11]
>         orr     r6,r6,r7,lsl#8
>         ldrb    r7,[r1,#12]
> @@ -79,17 +66,16 @@
>         str     r6,[r0,#8]
>         and     r7,r7,r3
>         str     r7,[r0,#12]
> -       stmia   r2,{r11,r12}            @ fill functions table
> -       mov     r0,#1
> -       mov     r0,#0
>  .Lno_key:
>         ldmia   sp!,{r4-r11}
>         bx      lr                              @ bx    lr
>         tst     lr,#1
>         moveq   pc,lr                   @ be binary compatible with V4, yet
>         .word   0xe12fff1e                      @ interoperable with Thumb ISA:-)
> -poly1305_blocks:
> -.Lpoly1305_blocks:
> +ENDPROC(poly1305_init_arm)
> +
> +ENTRY(poly1305_blocks_arm)
> +.Lpoly1305_blocks_arm:
>         stmdb   sp!,{r3-r11,lr}
>
>         ands    r2,r2,#-16
> @@ -231,10 +217,11 @@
>         tst     lr,#1
>         moveq   pc,lr                   @ be binary compatible with V4, yet
>         .word   0xe12fff1e                      @ interoperable with Thumb ISA:-)
> -poly1305_emit:
> +ENDPROC(poly1305_blocks_arm)
> +
> +ENTRY(poly1305_emit_arm)
>         stmdb   sp!,{r4-r11}
>  .Lpoly1305_emit_enter:
> -
>         ldmia   r0,{r3-r7}
>         adds    r8,r3,#5                @ compare to modulus
>         adcs    r9,r4,#0
> @@ -305,8 +292,12 @@
>         tst     lr,#1
>         moveq   pc,lr                   @ be binary compatible with V4, yet
>         .word   0xe12fff1e                      @ interoperable with Thumb ISA:-)
> +ENDPROC(poly1305_emit_arm)
> +
> +
>
> -poly1305_init_neon:
> +ENTRY(poly1305_init_neon)
> +.Lpoly1305_init_neon:
>         ldr     r4,[r0,#20]             @ load key base 2^32
>         ldr     r5,[r0,#24]
>         ldr     r6,[r0,#28]
> @@ -515,8 +506,9 @@
>         vst1.32         {d8[1]},[r7]
>
>         bx      lr                              @ bx    lr
> +ENDPROC(poly1305_init_neon)
>
> -poly1305_blocks_neon:
> +ENTRY(poly1305_blocks_neon)
>         ldr     ip,[r0,#36]             @ is_base2_26
>         ands    r2,r2,#-16
>         beq     .Lno_data_neon
> @@ -524,7 +516,7 @@
>         cmp     r2,#64
>         bhs     .Lenter_neon
>         tst     ip,ip                   @ is_base2_26?
> -       beq     .Lpoly1305_blocks
> +       beq     .Lpoly1305_blocks_arm
>
>  .Lenter_neon:
>         stmdb   sp!,{r4-r7}
> @@ -534,7 +526,7 @@
>         bne     .Lbase2_26_neon
>
>         stmdb   sp!,{r1-r3,lr}
> -       bl      poly1305_init_neon
> +       bl      .Lpoly1305_init_neon
>
>         ldr     r4,[r0,#0]              @ load hash value base 2^32
>         ldr     r5,[r0,#4]
> @@ -989,8 +981,9 @@
>         ldmia   sp!,{r4-r7}
>  .Lno_data_neon:
>         bx      lr                                      @ bx    lr
> +ENDPROC(poly1305_blocks_neon)
>
> -poly1305_emit_neon:
> +ENTRY(poly1305_emit_neon)
>         ldr     ip,[r0,#36]             @ is_base2_26
>
>         stmdb   sp!,{r4-r11}
> @@ -1055,6 +1048,6 @@
>
>         ldmia   sp!,{r4-r11}
>         bx      lr                              @ bx    lr
> +ENDPROC(poly1305_emit_neon)
>
> ARM64:
>
> -poly1305_init:
> +ENTRY(poly1305_init_arm)
>         cmp     x1,xzr
>         stp     xzr,xzr,[x0]            // zero hash value
>         stp     xzr,xzr,[x0,#16]        // [along with is_base2_26]
> @@ -11,14 +15,9 @@
>         csel    x0,xzr,x0,eq
>         b.eq    .Lno_key
>
> -       ldrsw   x11,.LOPENSSL_armcap_P
> -       ldr     x11,.LOPENSSL_armcap_P

In the original, this looks like

#ifdef __ILP32__
        ldrsw $t1,.LOPENSSL_armcap_P
#else
        ldr $t1,.LOPENSSL_armcap_P
#endif


so I guess git commit ate those lines.

> -       adr     x10,.LOPENSSL_armcap_P
> -
>         ldp     x7,x8,[x1]              // load key
>         mov     x9,#0xfffffffc0fffffff
>         movk    x9,#0x0fff,lsl#48
> -       ldr     w17,[x10,x11]
>         rev     x7,x7                   // flip bytes
>         rev     x8,x8
>         and     x7,x7,x9                // &=0ffffffc0fffffff
> @@ -26,24 +25,11 @@
>         and     x8,x8,x9                // &=0ffffffc0ffffffc
>         stp     x7,x8,[x0,#32]  // save key value
>
> -       tst     w17,#ARMV7_NEON
> -
> -       adr     x12,poly1305_blocks
> -       adr     x7,poly1305_blocks_neon
> -       adr     x13,poly1305_emit
> -       adr     x8,poly1305_emit_neon
> -
> -       csel    x12,x12,x7,eq
> -       csel    x13,x13,x8,eq
> -
> -       stp     w12,w13,[x2]
> -       stp     x12,x13,[x2]
> -
> -       mov     x0,#1
>  .Lno_key:
>         ret
> +ENDPROC(poly1305_init_arm)
>
> -poly1305_blocks:
> +ENTRY(poly1305_blocks_arm)
>         ands    x2,x2,#-16
>         b.eq    .Lno_data
>
> @@ -100,8 +86,9 @@
>
>  .Lno_data:
>         ret
> +ENDPROC(poly1305_blocks_arm)
>
> -poly1305_emit:
> +ENTRY(poly1305_emit_arm)
>         ldp     x4,x5,[x0]              // load hash base 2^64
>         ldr     x6,[x0,#16]
>         ldp     x10,x11,[x2]    // load nonce
> @@ -124,7 +111,9 @@
>         stp     x4,x5,[x1]              // write result
>
>         ret
> -poly1305_mult:
> +ENDPROC(poly1305_emit_arm)
> +
> +__poly1305_mult:
>         mul     x12,x4,x7               // h0*r0
>         umulh   x13,x4,x7
>
> @@ -158,7 +147,7 @@
>
>         ret
>
> -poly1305_splat:
> +__poly1305_splat:
>         and     x12,x4,#0x03ffffff      // base 2^64 -> base 2^26
>         ubfx    x13,x4,#26,#26
>         extr    x14,x5,x4,#52
> @@ -182,11 +171,11 @@
>
>         ret
>
> -poly1305_blocks_neon:
> +ENTRY(poly1305_blocks_neon)
>         ldr     x17,[x0,#24]
>         cmp     x2,#128
>         b.hs    .Lblocks_neon
> -       cbz     x17,poly1305_blocks
> +       cbz     x17,poly1305_blocks_arm
>
>  .Lblocks_neon:
>         stp     x29,x30,[sp,#-80]!
> @@ -232,7 +221,7 @@
>         adcs    x5,x5,x13
>         adc     x6,x6,x3
>
> -       bl      poly1305_mult
> +       bl      __poly1305_mult
>         ldr     x30,[sp,#8]
>
>         cbz     x3,.Lstore_base2_64_neon
> @@ -274,7 +263,7 @@
>         adcs    x5,x5,x13
>         adc     x6,x6,x3
>
> -       bl      poly1305_mult
> +       bl      __poly1305_mult
>
>  .Linit_neon:
>         and     x10,x4,#0x03ffffff      // base 2^64 -> base 2^26
> @@ -301,19 +290,19 @@
>         mov     x5,x8
>         mov     x6,xzr
>         add     x0,x0,#48+12
> -       bl      poly1305_splat
> +       bl      __poly1305_splat
>
> -       bl      poly1305_mult           // r^2
> +       bl      __poly1305_mult         // r^2
>         sub     x0,x0,#4
> -       bl      poly1305_splat
> +       bl      __poly1305_splat
>
> -       bl      poly1305_mult           // r^3
> +       bl      __poly1305_mult         // r^3
>         sub     x0,x0,#4
> -       bl      poly1305_splat
> +       bl      __poly1305_splat
>
> -       bl      poly1305_mult           // r^4
> +       bl      __poly1305_mult         // r^4
>         sub     x0,x0,#4
> -       bl      poly1305_splat
> +       bl      __poly1305_splat
>         ldr     x30,[sp,#8]
>
>         add     x16,x1,#32
> @@ -743,10 +732,11 @@
>  .Lno_data_neon:
>         ldr     x29,[sp],#80
>         ret
> +ENDPROC(poly1305_blocks_neon)
>
> -poly1305_emit_neon:
> +ENTRY(poly1305_emit_neon)
>         ldr     x17,[x0,#24]
> -       cbz     x17,poly1305_emit
> +       cbz     x17,poly1305_emit_arm
>
>         ldp     w10,w11,[x0]            // load hash value base 2^26
>         ldp     w12,w13,[x0,#8]
> @@ -788,6 +778,6 @@
>         stp     x4,x5,[x1]              // write result
>
>         ret
> +ENDPROC(poly1305_emit_neon)
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Samuel Neves <sneves@dei.uc.pt>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
> Cc: Andy Polyakov <appro@openssl.org>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: linux-arm-kernel@lists.infradead.org
> ---
>  lib/zinc/Makefile                     |    8 +
>  lib/zinc/poly1305/poly1305-arm-glue.h |   69 ++
>  lib/zinc/poly1305/poly1305-arm.S      | 1117 +++++++++++++++++++++++++
>  lib/zinc/poly1305/poly1305-arm64.S    |  822 ++++++++++++++++++
>  4 files changed, 2016 insertions(+)
>  create mode 100644 lib/zinc/poly1305/poly1305-arm-glue.h
>  create mode 100644 lib/zinc/poly1305/poly1305-arm.S
>  create mode 100644 lib/zinc/poly1305/poly1305-arm64.S
>
> diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile
> index d1e3892e06d9..f37df89a3f87 100644
> --- a/lib/zinc/Makefile
> +++ b/lib/zinc/Makefile
> @@ -25,6 +25,14 @@ endif
>
>  ifeq ($(CONFIG_ZINC_POLY1305),y)
>  zinc-y += poly1305/poly1305.o
> +ifeq ($(CONFIG_ZINC_ARCH_ARM),y)
> +zinc-y += poly1305/poly1305-arm.o
> +CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-arm-glue.h
> +endif
> +ifeq ($(CONFIG_ZINC_ARCH_ARM64),y)
> +zinc-y += poly1305/poly1305-arm64.o
> +CFLAGS_poly1305.o += -include $(srctree)/$(src)/poly1305/poly1305-arm-glue.h
> +endif
>  endif
>

I still don't like the GCC -includes, especially because these .h
files contain function and variable definitions so they are not
actually header files to begin with.

Also, you mentioned in the commit log that you got rid of defines and
made the code more modular, but as far as I can tell, libzinc is still
a single monolithic binary that is essentially always builtin once we
move random.c to it.

>  zinc-y += main.o
> diff --git a/lib/zinc/poly1305/poly1305-arm-glue.h b/lib/zinc/poly1305/poly1305-arm-glue.h
> new file mode 100644
> index 000000000000..53f8fec7f858
> --- /dev/null
> +++ b/lib/zinc/poly1305/poly1305-arm-glue.h
> @@ -0,0 +1,69 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + */
> +
> +#include <zinc/poly1305.h>
> +#include <asm/hwcap.h>
> +#include <asm/neon.h>
> +
> +asmlinkage void poly1305_init_arm(void *ctx, const u8 key[16]);
> +asmlinkage void poly1305_blocks_arm(void *ctx, const u8 *inp, const size_t len,
> +                                   const u32 padbit);
> +asmlinkage void poly1305_emit_arm(void *ctx, u8 mac[16], const u32 nonce[4]);
> +#if IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&                                     \
> +       (defined(CONFIG_64BIT) || __LINUX_ARM_ARCH__ >= 7)
> +#define ARM_USE_NEON
> +asmlinkage void poly1305_blocks_neon(void *ctx, const u8 *inp, const size_t len,
> +                                    const u32 padbit);
> +asmlinkage void poly1305_emit_neon(void *ctx, u8 mac[16], const u32 nonce[4]);
> +#endif
> +
> +static bool poly1305_use_neon __ro_after_init;
> +
> +void __init poly1305_fpu_init(void)
> +{
> +#if defined(CONFIG_ARM64)
> +       poly1305_use_neon = elf_hwcap & HWCAP_ASIMD;
> +#elif defined(CONFIG_ARM)
> +       poly1305_use_neon = elf_hwcap & HWCAP_NEON;
> +#endif
> +}
> +
> +static inline bool poly1305_init_arch(void *ctx,
> +                                     const u8 key[POLY1305_KEY_SIZE],
> +                                     simd_context_t simd_context)
> +{
> +       poly1305_init_arm(ctx, key);
> +       return true;
> +}
> +
> +static inline bool poly1305_blocks_arch(void *ctx, const u8 *inp,
> +                                       const size_t len, const u32 padbit,
> +                                       simd_context_t simd_context)
> +{
> +#if defined(ARM_USE_NEON)
> +       if (simd_context == HAVE_FULL_SIMD && poly1305_use_neon) {
> +               poly1305_blocks_neon(ctx, inp, len, padbit);
> +               return true;
> +       }
> +#endif
> +       poly1305_blocks_arm(ctx, inp, len, padbit);
> +       return true;
> +}
> +
> +static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE],
> +                                     const u32 nonce[4],
> +                                     simd_context_t simd_context)
> +{
> +#if defined(ARM_USE_NEON)
> +       if (simd_context == HAVE_FULL_SIMD && poly1305_use_neon) {
> +               poly1305_emit_neon(ctx, mac, nonce);
> +               return true;
> +       }
> +#endif
> +       poly1305_emit_arm(ctx, mac, nonce);
> +       return true;
> +}
> +
> +#define HAVE_POLY1305_ARCH_IMPLEMENTATION

We shouldn't #define HAVE_xxx constants in code but only in Kconfig.

> diff --git a/lib/zinc/poly1305/poly1305-arm.S b/lib/zinc/poly1305/poly1305-arm.S
> new file mode 100644
> index 000000000000..110f4317b5d7
> --- /dev/null
> +++ b/lib/zinc/poly1305/poly1305-arm.S
> @@ -0,0 +1,1117 @@
> +/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> + *
> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
> + *
> + * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
> + */
> +
> +#include <linux/linkage.h>
> +
> +.text
> +#if defined(__thumb2__)
> +.syntax        unified
> +.thumb
> +#else
> +.code  32
> +#endif
> +
> +.align 5
> +ENTRY(poly1305_init_arm)
> +       stmdb   sp!,{r4-r11}
> +
> +       eor     r3,r3,r3
> +       cmp     r1,#0
> +       str     r3,[r0,#0]              @ zero hash value
> +       str     r3,[r0,#4]
> +       str     r3,[r0,#8]
> +       str     r3,[r0,#12]
> +       str     r3,[r0,#16]
> +       str     r3,[r0,#36]             @ is_base2_26
> +       add     r0,r0,#20
> +
> +#ifdef __thumb2__
> +       it      eq
> +#endif
> +       moveq   r0,#0
> +       beq     .Lno_key
> +
> +       ldrb    r4,[r1,#0]
> +       mov     r10,#0x0fffffff
> +       ldrb    r5,[r1,#1]
> +       and     r3,r10,#-4              @ 0x0ffffffc
> +       ldrb    r6,[r1,#2]
> +       ldrb    r7,[r1,#3]
> +       orr     r4,r4,r5,lsl#8
> +       ldrb    r5,[r1,#4]
> +       orr     r4,r4,r6,lsl#16
> +       ldrb    r6,[r1,#5]
> +       orr     r4,r4,r7,lsl#24
> +       ldrb    r7,[r1,#6]
> +       and     r4,r4,r10
> +
> +       ldrb    r8,[r1,#7]
> +       orr     r5,r5,r6,lsl#8
> +       ldrb    r6,[r1,#8]
> +       orr     r5,r5,r7,lsl#16
> +       ldrb    r7,[r1,#9]
> +       orr     r5,r5,r8,lsl#24
> +       ldrb    r8,[r1,#10]
> +       and     r5,r5,r3
> +
> +       ldrb    r9,[r1,#11]
> +       orr     r6,r6,r7,lsl#8
> +       ldrb    r7,[r1,#12]
> +       orr     r6,r6,r8,lsl#16
> +       ldrb    r8,[r1,#13]
> +       orr     r6,r6,r9,lsl#24
> +       ldrb    r9,[r1,#14]
> +       and     r6,r6,r3
> +
> +       ldrb    r10,[r1,#15]
> +       orr     r7,r7,r8,lsl#8
> +       str     r4,[r0,#0]
> +       orr     r7,r7,r9,lsl#16
> +       str     r5,[r0,#4]
> +       orr     r7,r7,r10,lsl#24
> +       str     r6,[r0,#8]
> +       and     r7,r7,r3
> +       str     r7,[r0,#12]
> +.Lno_key:
> +       ldmia   sp!,{r4-r11}
> +#if __LINUX_ARM_ARCH__ >= 5
> +       bx      lr                              @ bx    lr
> +#else
> +       tst     lr,#1
> +       moveq   pc,lr                   @ be binary compatible with V4, yet
> +       .word   0xe12fff1e                      @ interoperable with Thumb ISA:-)
> +#endif
> +ENDPROC(poly1305_init_arm)
> +
> +.align 5
> +ENTRY(poly1305_blocks_arm)
> +.Lpoly1305_blocks_arm:
> +       stmdb   sp!,{r3-r11,lr}
> +
> +       ands    r2,r2,#-16
> +       beq     .Lno_data
> +
> +       cmp     r3,#0
> +       add     r2,r2,r1                @ end pointer
> +       sub     sp,sp,#32
> +
> +       ldmia   r0,{r4-r12}             @ load context
> +
> +       str     r0,[sp,#12]             @ offload stuff
> +       mov     lr,r1
> +       str     r2,[sp,#16]
> +       str     r10,[sp,#20]
> +       str     r11,[sp,#24]
> +       str     r12,[sp,#28]
> +       b       .Loop
> +
> +.Loop:
> +#if __LINUX_ARM_ARCH__ < 7
> +       ldrb    r0,[lr],#16             @ load input
> +#ifdef __thumb2__
> +       it      hi
> +#endif
> +       addhi   r8,r8,#1                @ 1<<128
> +       ldrb    r1,[lr,#-15]
> +       ldrb    r2,[lr,#-14]
> +       ldrb    r3,[lr,#-13]
> +       orr     r1,r0,r1,lsl#8
> +       ldrb    r0,[lr,#-12]
> +       orr     r2,r1,r2,lsl#16
> +       ldrb    r1,[lr,#-11]
> +       orr     r3,r2,r3,lsl#24
> +       ldrb    r2,[lr,#-10]
> +       adds    r4,r4,r3                @ accumulate input
> +
> +       ldrb    r3,[lr,#-9]
> +       orr     r1,r0,r1,lsl#8
> +       ldrb    r0,[lr,#-8]
> +       orr     r2,r1,r2,lsl#16
> +       ldrb    r1,[lr,#-7]
> +       orr     r3,r2,r3,lsl#24
> +       ldrb    r2,[lr,#-6]
> +       adcs    r5,r5,r3
> +
> +       ldrb    r3,[lr,#-5]
> +       orr     r1,r0,r1,lsl#8
> +       ldrb    r0,[lr,#-4]
> +       orr     r2,r1,r2,lsl#16
> +       ldrb    r1,[lr,#-3]
> +       orr     r3,r2,r3,lsl#24
> +       ldrb    r2,[lr,#-2]
> +       adcs    r6,r6,r3
> +
> +       ldrb    r3,[lr,#-1]
> +       orr     r1,r0,r1,lsl#8
> +       str     lr,[sp,#8]              @ offload input pointer
> +       orr     r2,r1,r2,lsl#16
> +       add     r10,r10,r10,lsr#2
> +       orr     r3,r2,r3,lsl#24
> +#else
> +       ldr     r0,[lr],#16             @ load input
> +#ifdef __thumb2__
> +       it      hi
> +#endif
> +       addhi   r8,r8,#1                @ padbit
> +       ldr     r1,[lr,#-12]
> +       ldr     r2,[lr,#-8]
> +       ldr     r3,[lr,#-4]
> +#ifdef __ARMEB__
> +       rev     r0,r0
> +       rev     r1,r1
> +       rev     r2,r2
> +       rev     r3,r3
> +#endif
> +       adds    r4,r4,r0                @ accumulate input
> +       str     lr,[sp,#8]              @ offload input pointer
> +       adcs    r5,r5,r1
> +       add     r10,r10,r10,lsr#2
> +       adcs    r6,r6,r2
> +#endif
> +       add     r11,r11,r11,lsr#2
> +       adcs    r7,r7,r3
> +       add     r12,r12,r12,lsr#2
> +
> +       umull   r2,r3,r5,r9
> +        adc    r8,r8,#0
> +       umull   r0,r1,r4,r9
> +       umlal   r2,r3,r8,r10
> +       umlal   r0,r1,r7,r10
> +       ldr     r10,[sp,#20]            @ reload r10
> +       umlal   r2,r3,r6,r12
> +       umlal   r0,r1,r5,r12
> +       umlal   r2,r3,r7,r11
> +       umlal   r0,r1,r6,r11
> +       umlal   r2,r3,r4,r10
> +       str     r0,[sp,#0]              @ future r4
> +        mul    r0,r11,r8
> +       ldr     r11,[sp,#24]            @ reload r11
> +       adds    r2,r2,r1                @ d1+=d0>>32
> +        eor    r1,r1,r1
> +       adc     lr,r3,#0                @ future r6
> +       str     r2,[sp,#4]              @ future r5
> +
> +       mul     r2,r12,r8
> +       eor     r3,r3,r3
> +       umlal   r0,r1,r7,r12
> +       ldr     r12,[sp,#28]            @ reload r12
> +       umlal   r2,r3,r7,r9
> +       umlal   r0,r1,r6,r9
> +       umlal   r2,r3,r6,r10
> +       umlal   r0,r1,r5,r10
> +       umlal   r2,r3,r5,r11
> +       umlal   r0,r1,r4,r11
> +       umlal   r2,r3,r4,r12
> +       ldr     r4,[sp,#0]
> +       mul     r8,r9,r8
> +       ldr     r5,[sp,#4]
> +
> +       adds    r6,lr,r0                @ d2+=d1>>32
> +       ldr     lr,[sp,#8]              @ reload input pointer
> +       adc     r1,r1,#0
> +       adds    r7,r2,r1                @ d3+=d2>>32
> +       ldr     r0,[sp,#16]             @ reload end pointer
> +       adc     r3,r3,#0
> +       add     r8,r8,r3                @ h4+=d3>>32
> +
> +       and     r1,r8,#-4
> +       and     r8,r8,#3
> +       add     r1,r1,r1,lsr#2          @ *=5
> +       adds    r4,r4,r1
> +       adcs    r5,r5,#0
> +       adcs    r6,r6,#0
> +       adcs    r7,r7,#0
> +       adc     r8,r8,#0
> +
> +       cmp     r0,lr                   @ done yet?
> +       bhi     .Loop
> +
> +       ldr     r0,[sp,#12]
> +       add     sp,sp,#32
> +       stmia   r0,{r4-r8}              @ store the result
> +
> +.Lno_data:
> +#if __LINUX_ARM_ARCH__ >= 5
> +       ldmia   sp!,{r3-r11,pc}
> +#else
> +       ldmia   sp!,{r3-r11,lr}
> +       tst     lr,#1
> +       moveq   pc,lr                   @ be binary compatible with V4, yet
> +       .word   0xe12fff1e                      @ interoperable with Thumb ISA:-)
> +#endif
> +ENDPROC(poly1305_blocks_arm)
> +
> +.align 5
> +ENTRY(poly1305_emit_arm)
> +       stmdb   sp!,{r4-r11}
> +.Lpoly1305_emit_enter:
> +       ldmia   r0,{r3-r7}
> +       adds    r8,r3,#5                @ compare to modulus
> +       adcs    r9,r4,#0
> +       adcs    r10,r5,#0
> +       adcs    r11,r6,#0
> +       adc     r7,r7,#0
> +       tst     r7,#4                   @ did it carry/borrow?
> +
> +#ifdef __thumb2__
> +       it      ne
> +#endif
> +       movne   r3,r8
> +       ldr     r8,[r2,#0]
> +#ifdef __thumb2__
> +       it      ne
> +#endif
> +       movne   r4,r9
> +       ldr     r9,[r2,#4]
> +#ifdef __thumb2__
> +       it      ne
> +#endif
> +       movne   r5,r10
> +       ldr     r10,[r2,#8]
> +#ifdef __thumb2__
> +       it      ne
> +#endif
> +       movne   r6,r11
> +       ldr     r11,[r2,#12]
> +
> +       adds    r3,r3,r8
> +       adcs    r4,r4,r9
> +       adcs    r5,r5,r10
> +       adc     r6,r6,r11
> +
> +#if __LINUX_ARM_ARCH__ >= 7
> +#ifdef __ARMEB__
> +       rev     r3,r3
> +       rev     r4,r4
> +       rev     r5,r5
> +       rev     r6,r6
> +#endif
> +       str     r3,[r1,#0]
> +       str     r4,[r1,#4]
> +       str     r5,[r1,#8]
> +       str     r6,[r1,#12]
> +#else
> +       strb    r3,[r1,#0]
> +       mov     r3,r3,lsr#8
> +       strb    r4,[r1,#4]
> +       mov     r4,r4,lsr#8
> +       strb    r5,[r1,#8]
> +       mov     r5,r5,lsr#8
> +       strb    r6,[r1,#12]
> +       mov     r6,r6,lsr#8
> +
> +       strb    r3,[r1,#1]
> +       mov     r3,r3,lsr#8
> +       strb    r4,[r1,#5]
> +       mov     r4,r4,lsr#8
> +       strb    r5,[r1,#9]
> +       mov     r5,r5,lsr#8
> +       strb    r6,[r1,#13]
> +       mov     r6,r6,lsr#8
> +
> +       strb    r3,[r1,#2]
> +       mov     r3,r3,lsr#8
> +       strb    r4,[r1,#6]
> +       mov     r4,r4,lsr#8
> +       strb    r5,[r1,#10]
> +       mov     r5,r5,lsr#8
> +       strb    r6,[r1,#14]
> +       mov     r6,r6,lsr#8
> +
> +       strb    r3,[r1,#3]
> +       strb    r4,[r1,#7]
> +       strb    r5,[r1,#11]
> +       strb    r6,[r1,#15]
> +#endif
> +       ldmia   sp!,{r4-r11}
> +#if __LINUX_ARM_ARCH__ >= 5
> +       bx      lr                              @ bx    lr
> +#else
> +       tst     lr,#1
> +       moveq   pc,lr                   @ be binary compatible with V4, yet
> +       .word   0xe12fff1e                      @ interoperable with Thumb ISA:-)
> +#endif
> +ENDPROC(poly1305_emit_arm)
> +
> +
> +#if __LINUX_ARM_ARCH__ >= 7
> +.fpu   neon
> +
> +.align 5
> +ENTRY(poly1305_init_neon)
> +.Lpoly1305_init_neon:
> +       ldr     r4,[r0,#20]             @ load key base 2^32
> +       ldr     r5,[r0,#24]
> +       ldr     r6,[r0,#28]
> +       ldr     r7,[r0,#32]
> +
> +       and     r2,r4,#0x03ffffff       @ base 2^32 -> base 2^26
> +       mov     r3,r4,lsr#26
> +       mov     r4,r5,lsr#20
> +       orr     r3,r3,r5,lsl#6
> +       mov     r5,r6,lsr#14
> +       orr     r4,r4,r6,lsl#12
> +       mov     r6,r7,lsr#8
> +       orr     r5,r5,r7,lsl#18
> +       and     r3,r3,#0x03ffffff
> +       and     r4,r4,#0x03ffffff
> +       and     r5,r5,#0x03ffffff
> +
> +       vdup.32 d0,r2                   @ r^1 in both lanes
> +       add     r2,r3,r3,lsl#2          @ *5
> +       vdup.32 d1,r3
> +       add     r3,r4,r4,lsl#2
> +       vdup.32 d2,r2
> +       vdup.32 d3,r4
> +       add     r4,r5,r5,lsl#2
> +       vdup.32 d4,r3
> +       vdup.32 d5,r5
> +       add     r5,r6,r6,lsl#2
> +       vdup.32 d6,r4
> +       vdup.32 d7,r6
> +       vdup.32 d8,r5
> +
> +       mov     r5,#2           @ counter
> +
> +.Lsquare_neon:
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
> +       @ d1 = h1*r0 + h0*r1   + h4*5*r2 + h3*5*r3 + h2*5*r4
> +       @ d2 = h2*r0 + h1*r1   + h0*r2   + h4*5*r3 + h3*5*r4
> +       @ d3 = h3*r0 + h2*r1   + h1*r2   + h0*r3   + h4*5*r4
> +       @ d4 = h4*r0 + h3*r1   + h2*r2   + h1*r3   + h0*r4
> +
> +       vmull.u32       q5,d0,d0[1]
> +       vmull.u32       q6,d1,d0[1]
> +       vmull.u32       q7,d3,d0[1]
> +       vmull.u32       q8,d5,d0[1]
> +       vmull.u32       q9,d7,d0[1]
> +
> +       vmlal.u32       q5,d7,d2[1]
> +       vmlal.u32       q6,d0,d1[1]
> +       vmlal.u32       q7,d1,d1[1]
> +       vmlal.u32       q8,d3,d1[1]
> +       vmlal.u32       q9,d5,d1[1]
> +
> +       vmlal.u32       q5,d5,d4[1]
> +       vmlal.u32       q6,d7,d4[1]
> +       vmlal.u32       q8,d1,d3[1]
> +       vmlal.u32       q7,d0,d3[1]
> +       vmlal.u32       q9,d3,d3[1]
> +
> +       vmlal.u32       q5,d3,d6[1]
> +       vmlal.u32       q8,d0,d5[1]
> +       vmlal.u32       q6,d5,d6[1]
> +       vmlal.u32       q7,d7,d6[1]
> +       vmlal.u32       q9,d1,d5[1]
> +
> +       vmlal.u32       q8,d7,d8[1]
> +       vmlal.u32       q5,d1,d8[1]
> +       vmlal.u32       q6,d3,d8[1]
> +       vmlal.u32       q7,d5,d8[1]
> +       vmlal.u32       q9,d0,d7[1]
> +
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
> +       @ and P. Schwabe
> +       @
> +       @ H0>>+H1>>+H2>>+H3>>+H4
> +       @ H3>>+H4>>*5+H0>>+H1
> +       @
> +       @ Trivia.
> +       @
> +       @ Result of multiplication of n-bit number by m-bit number is
> +       @ n+m bits wide. However! Even though 2^n is a n+1-bit number,
> +       @ m-bit number multiplied by 2^n is still n+m bits wide.
> +       @
> +       @ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2,
> +       @ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit
> +       @ one is n+1 bits wide.
> +       @
> +       @ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that
> +       @ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4
> +       @ can be 27. However! In cases when their width exceeds 26 bits
> +       @ they are limited by 2^26+2^6. This in turn means that *sum*
> +       @ of the products with these values can still be viewed as sum
> +       @ of 52-bit numbers as long as the amount of addends is not a
> +       @ power of 2. For example,
> +       @
> +       @ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4,
> +       @
> +       @ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or
> +       @ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than
> +       @ 8 * (2^52) or 2^55. However, the value is then multiplied by
> +       @ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12),
> +       @ which is less than 32 * (2^52) or 2^57. And when processing
> +       @ data we are looking at triple as many addends...
> +       @
> +       @ In key setup procedure pre-reduced H0 is limited by 5*4+1 and
> +       @ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the
> +       @ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while
> +       @ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32
> +       @ instruction accepts 2x32-bit input and writes 2x64-bit result.
> +       @ This means that result of reduction have to be compressed upon
> +       @ loop wrap-around. This can be done in the process of reduction
> +       @ to minimize amount of instructions [as well as amount of
> +       @ 128-bit instructions, which benefits low-end processors], but
> +       @ one has to watch for H2 (which is narrower than H0) and 5*H4
> +       @ not being wider than 58 bits, so that result of right shift
> +       @ by 26 bits fits in 32 bits. This is also useful on x86,
> +       @ because it allows to use paddd in place for paddq, which
> +       @ benefits Atom, where paddq is ridiculously slow.
> +
> +       vshr.u64        q15,q8,#26
> +       vmovn.i64       d16,q8
> +        vshr.u64       q4,q5,#26
> +        vmovn.i64      d10,q5
> +       vadd.i64        q9,q9,q15               @ h3 -> h4
> +       vbic.i32        d16,#0xfc000000 @ &=0x03ffffff
> +        vadd.i64       q6,q6,q4                @ h0 -> h1
> +        vbic.i32       d10,#0xfc000000
> +
> +       vshrn.u64       d30,q9,#26
> +       vmovn.i64       d18,q9
> +        vshr.u64       q4,q6,#26
> +        vmovn.i64      d12,q6
> +        vadd.i64       q7,q7,q4                @ h1 -> h2
> +       vbic.i32        d18,#0xfc000000
> +        vbic.i32       d12,#0xfc000000
> +
> +       vadd.i32        d10,d10,d30
> +       vshl.u32        d30,d30,#2
> +        vshrn.u64      d8,q7,#26
> +        vmovn.i64      d14,q7
> +       vadd.i32        d10,d10,d30     @ h4 -> h0
> +        vadd.i32       d16,d16,d8      @ h2 -> h3
> +        vbic.i32       d14,#0xfc000000
> +
> +       vshr.u32        d30,d10,#26
> +       vbic.i32        d10,#0xfc000000
> +        vshr.u32       d8,d16,#26
> +        vbic.i32       d16,#0xfc000000
> +       vadd.i32        d12,d12,d30     @ h0 -> h1
> +        vadd.i32       d18,d18,d8      @ h3 -> h4
> +
> +       subs            r5,r5,#1
> +       beq             .Lsquare_break_neon
> +
> +       add             r6,r0,#(48+0*9*4)
> +       add             r7,r0,#(48+1*9*4)
> +
> +       vtrn.32         d0,d10          @ r^2:r^1
> +       vtrn.32         d3,d14
> +       vtrn.32         d5,d16
> +       vtrn.32         d1,d12
> +       vtrn.32         d7,d18
> +
> +       vshl.u32        d4,d3,#2                @ *5
> +       vshl.u32        d6,d5,#2
> +       vshl.u32        d2,d1,#2
> +       vshl.u32        d8,d7,#2
> +       vadd.i32        d4,d4,d3
> +       vadd.i32        d2,d2,d1
> +       vadd.i32        d6,d6,d5
> +       vadd.i32        d8,d8,d7
> +
> +       vst4.32         {d0[0],d1[0],d2[0],d3[0]},[r6]!
> +       vst4.32         {d0[1],d1[1],d2[1],d3[1]},[r7]!
> +       vst4.32         {d4[0],d5[0],d6[0],d7[0]},[r6]!
> +       vst4.32         {d4[1],d5[1],d6[1],d7[1]},[r7]!
> +       vst1.32         {d8[0]},[r6,:32]
> +       vst1.32         {d8[1]},[r7,:32]
> +
> +       b               .Lsquare_neon
> +
> +.align 4
> +.Lsquare_break_neon:
> +       add             r6,r0,#(48+2*4*9)
> +       add             r7,r0,#(48+3*4*9)
> +
> +       vmov            d0,d10          @ r^4:r^3
> +       vshl.u32        d2,d12,#2               @ *5
> +       vmov            d1,d12
> +       vshl.u32        d4,d14,#2
> +       vmov            d3,d14
> +       vshl.u32        d6,d16,#2
> +       vmov            d5,d16
> +       vshl.u32        d8,d18,#2
> +       vmov            d7,d18
> +       vadd.i32        d2,d2,d12
> +       vadd.i32        d4,d4,d14
> +       vadd.i32        d6,d6,d16
> +       vadd.i32        d8,d8,d18
> +
> +       vst4.32         {d0[0],d1[0],d2[0],d3[0]},[r6]!
> +       vst4.32         {d0[1],d1[1],d2[1],d3[1]},[r7]!
> +       vst4.32         {d4[0],d5[0],d6[0],d7[0]},[r6]!
> +       vst4.32         {d4[1],d5[1],d6[1],d7[1]},[r7]!
> +       vst1.32         {d8[0]},[r6]
> +       vst1.32         {d8[1]},[r7]
> +
> +       bx      lr                              @ bx    lr
> +ENDPROC(poly1305_init_neon)
> +
> +.align 5
> +ENTRY(poly1305_blocks_neon)
> +       ldr     ip,[r0,#36]             @ is_base2_26
> +       ands    r2,r2,#-16
> +       beq     .Lno_data_neon
> +
> +       cmp     r2,#64
> +       bhs     .Lenter_neon
> +       tst     ip,ip                   @ is_base2_26?
> +       beq     .Lpoly1305_blocks_arm
> +
> +.Lenter_neon:
> +       stmdb   sp!,{r4-r7}
> +       vstmdb  sp!,{d8-d15}            @ ABI specification says so
> +
> +       tst     ip,ip                   @ is_base2_26?
> +       bne     .Lbase2_26_neon
> +
> +       stmdb   sp!,{r1-r3,lr}
> +       bl      .Lpoly1305_init_neon
> +
> +       ldr     r4,[r0,#0]              @ load hash value base 2^32
> +       ldr     r5,[r0,#4]
> +       ldr     r6,[r0,#8]
> +       ldr     r7,[r0,#12]
> +       ldr     ip,[r0,#16]
> +
> +       and     r2,r4,#0x03ffffff       @ base 2^32 -> base 2^26
> +       mov     r3,r4,lsr#26
> +        veor   d10,d10,d10
> +       mov     r4,r5,lsr#20
> +       orr     r3,r3,r5,lsl#6
> +        veor   d12,d12,d12
> +       mov     r5,r6,lsr#14
> +       orr     r4,r4,r6,lsl#12
> +        veor   d14,d14,d14
> +       mov     r6,r7,lsr#8
> +       orr     r5,r5,r7,lsl#18
> +        veor   d16,d16,d16
> +       and     r3,r3,#0x03ffffff
> +       orr     r6,r6,ip,lsl#24
> +        veor   d18,d18,d18
> +       and     r4,r4,#0x03ffffff
> +       mov     r1,#1
> +       and     r5,r5,#0x03ffffff
> +       str     r1,[r0,#36]             @ is_base2_26
> +
> +       vmov.32 d10[0],r2
> +       vmov.32 d12[0],r3
> +       vmov.32 d14[0],r4
> +       vmov.32 d16[0],r5
> +       vmov.32 d18[0],r6
> +       adr     r5,.Lzeros
> +
> +       ldmia   sp!,{r1-r3,lr}
> +       b       .Lbase2_32_neon
> +
> +.align 4
> +.Lbase2_26_neon:
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ load hash value
> +
> +       veor            d10,d10,d10
> +       veor            d12,d12,d12
> +       veor            d14,d14,d14
> +       veor            d16,d16,d16
> +       veor            d18,d18,d18
> +       vld4.32         {d10[0],d12[0],d14[0],d16[0]},[r0]!
> +       adr             r5,.Lzeros
> +       vld1.32         {d18[0]},[r0]
> +       sub             r0,r0,#16               @ rewind
> +
> +.Lbase2_32_neon:
> +       add             r4,r1,#32
> +       mov             r3,r3,lsl#24
> +       tst             r2,#31
> +       beq             .Leven
> +
> +       vld4.32         {d20[0],d22[0],d24[0],d26[0]},[r1]!
> +       vmov.32         d28[0],r3
> +       sub             r2,r2,#16
> +       add             r4,r1,#32
> +
> +#ifdef __ARMEB__
> +       vrev32.8        q10,q10
> +       vrev32.8        q13,q13
> +       vrev32.8        q11,q11
> +       vrev32.8        q12,q12
> +#endif
> +       vsri.u32        d28,d26,#8      @ base 2^32 -> base 2^26
> +       vshl.u32        d26,d26,#18
> +
> +       vsri.u32        d26,d24,#14
> +       vshl.u32        d24,d24,#12
> +       vadd.i32        d29,d28,d18     @ add hash value and move to #hi
> +
> +       vbic.i32        d26,#0xfc000000
> +       vsri.u32        d24,d22,#20
> +       vshl.u32        d22,d22,#6
> +
> +       vbic.i32        d24,#0xfc000000
> +       vsri.u32        d22,d20,#26
> +       vadd.i32        d27,d26,d16
> +
> +       vbic.i32        d20,#0xfc000000
> +       vbic.i32        d22,#0xfc000000
> +       vadd.i32        d25,d24,d14
> +
> +       vadd.i32        d21,d20,d10
> +       vadd.i32        d23,d22,d12
> +
> +       mov             r7,r5
> +       add             r6,r0,#48
> +
> +       cmp             r2,r2
> +       b               .Long_tail
> +
> +.align 4
> +.Leven:
> +       subs            r2,r2,#64
> +       it              lo
> +       movlo           r4,r5
> +
> +       vmov.i32        q14,#1<<24              @ padbit, yes, always
> +       vld4.32         {d20,d22,d24,d26},[r1]  @ inp[0:1]
> +       add             r1,r1,#64
> +       vld4.32         {d21,d23,d25,d27},[r4]  @ inp[2:3] (or 0)
> +       add             r4,r4,#64
> +       itt             hi
> +       addhi           r7,r0,#(48+1*9*4)
> +       addhi           r6,r0,#(48+3*9*4)
> +
> +#ifdef __ARMEB__
> +       vrev32.8        q10,q10
> +       vrev32.8        q13,q13
> +       vrev32.8        q11,q11
> +       vrev32.8        q12,q12
> +#endif
> +       vsri.u32        q14,q13,#8              @ base 2^32 -> base 2^26
> +       vshl.u32        q13,q13,#18
> +
> +       vsri.u32        q13,q12,#14
> +       vshl.u32        q12,q12,#12
> +
> +       vbic.i32        q13,#0xfc000000
> +       vsri.u32        q12,q11,#20
> +       vshl.u32        q11,q11,#6
> +
> +       vbic.i32        q12,#0xfc000000
> +       vsri.u32        q11,q10,#26
> +
> +       vbic.i32        q10,#0xfc000000
> +       vbic.i32        q11,#0xfc000000
> +
> +       bls             .Lskip_loop
> +
> +       vld4.32         {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^2
> +       vld4.32         {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4
> +       vld4.32         {d4[1],d5[1],d6[1],d7[1]},[r7]!
> +       vld4.32         {d4[0],d5[0],d6[0],d7[0]},[r6]!
> +       b               .Loop_neon
> +
> +.align 5
> +.Loop_neon:
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
> +       @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
> +       @   ___________________/
> +       @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
> +       @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
> +       @   ___________________/ ____________________/
> +       @
> +       @ Note that we start with inp[2:3]*r^2. This is because it
> +       @ doesn't depend on reduction in previous iteration.
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ d4 = h4*r0 + h3*r1   + h2*r2   + h1*r3   + h0*r4
> +       @ d3 = h3*r0 + h2*r1   + h1*r2   + h0*r3   + h4*5*r4
> +       @ d2 = h2*r0 + h1*r1   + h0*r2   + h4*5*r3 + h3*5*r4
> +       @ d1 = h1*r0 + h0*r1   + h4*5*r2 + h3*5*r3 + h2*5*r4
> +       @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4
> +
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ inp[2:3]*r^2
> +
> +       vadd.i32        d24,d24,d14     @ accumulate inp[0:1]
> +       vmull.u32       q7,d25,d0[1]
> +       vadd.i32        d20,d20,d10
> +       vmull.u32       q5,d21,d0[1]
> +       vadd.i32        d26,d26,d16
> +       vmull.u32       q8,d27,d0[1]
> +       vmlal.u32       q7,d23,d1[1]
> +       vadd.i32        d22,d22,d12
> +       vmull.u32       q6,d23,d0[1]
> +
> +       vadd.i32        d28,d28,d18
> +       vmull.u32       q9,d29,d0[1]
> +       subs            r2,r2,#64
> +       vmlal.u32       q5,d29,d2[1]
> +       it              lo
> +       movlo           r4,r5
> +       vmlal.u32       q8,d25,d1[1]
> +       vld1.32         d8[1],[r7,:32]
> +       vmlal.u32       q6,d21,d1[1]
> +       vmlal.u32       q9,d27,d1[1]
> +
> +       vmlal.u32       q5,d27,d4[1]
> +       vmlal.u32       q8,d23,d3[1]
> +       vmlal.u32       q9,d25,d3[1]
> +       vmlal.u32       q6,d29,d4[1]
> +       vmlal.u32       q7,d21,d3[1]
> +
> +       vmlal.u32       q8,d21,d5[1]
> +       vmlal.u32       q5,d25,d6[1]
> +       vmlal.u32       q9,d23,d5[1]
> +       vmlal.u32       q6,d27,d6[1]
> +       vmlal.u32       q7,d29,d6[1]
> +
> +       vmlal.u32       q8,d29,d8[1]
> +       vmlal.u32       q5,d23,d8[1]
> +       vmlal.u32       q9,d21,d7[1]
> +       vmlal.u32       q6,d25,d8[1]
> +       vmlal.u32       q7,d27,d8[1]
> +
> +       vld4.32         {d21,d23,d25,d27},[r4]  @ inp[2:3] (or 0)
> +       add             r4,r4,#64
> +
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ (hash+inp[0:1])*r^4 and accumulate
> +
> +       vmlal.u32       q8,d26,d0[0]
> +       vmlal.u32       q5,d20,d0[0]
> +       vmlal.u32       q9,d28,d0[0]
> +       vmlal.u32       q6,d22,d0[0]
> +       vmlal.u32       q7,d24,d0[0]
> +       vld1.32         d8[0],[r6,:32]
> +
> +       vmlal.u32       q8,d24,d1[0]
> +       vmlal.u32       q5,d28,d2[0]
> +       vmlal.u32       q9,d26,d1[0]
> +       vmlal.u32       q6,d20,d1[0]
> +       vmlal.u32       q7,d22,d1[0]
> +
> +       vmlal.u32       q8,d22,d3[0]
> +       vmlal.u32       q5,d26,d4[0]
> +       vmlal.u32       q9,d24,d3[0]
> +       vmlal.u32       q6,d28,d4[0]
> +       vmlal.u32       q7,d20,d3[0]
> +
> +       vmlal.u32       q8,d20,d5[0]
> +       vmlal.u32       q5,d24,d6[0]
> +       vmlal.u32       q9,d22,d5[0]
> +       vmlal.u32       q6,d26,d6[0]
> +       vmlal.u32       q8,d28,d8[0]
> +
> +       vmlal.u32       q7,d28,d6[0]
> +       vmlal.u32       q5,d22,d8[0]
> +       vmlal.u32       q9,d20,d7[0]
> +       vmov.i32        q14,#1<<24              @ padbit, yes, always
> +       vmlal.u32       q6,d24,d8[0]
> +       vmlal.u32       q7,d26,d8[0]
> +
> +       vld4.32         {d20,d22,d24,d26},[r1]  @ inp[0:1]
> +       add             r1,r1,#64
> +#ifdef __ARMEB__
> +       vrev32.8        q10,q10
> +       vrev32.8        q11,q11
> +       vrev32.8        q12,q12
> +       vrev32.8        q13,q13
> +#endif
> +
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ lazy reduction interleaved with base 2^32 -> base 2^26 of
> +       @ inp[0:3] previously loaded to q10-q13 and smashed to q10-q14.
> +
> +       vshr.u64        q15,q8,#26
> +       vmovn.i64       d16,q8
> +        vshr.u64       q4,q5,#26
> +        vmovn.i64      d10,q5
> +       vadd.i64        q9,q9,q15               @ h3 -> h4
> +       vbic.i32        d16,#0xfc000000
> +         vsri.u32      q14,q13,#8              @ base 2^32 -> base 2^26
> +        vadd.i64       q6,q6,q4                @ h0 -> h1
> +         vshl.u32      q13,q13,#18
> +        vbic.i32       d10,#0xfc000000
> +
> +       vshrn.u64       d30,q9,#26
> +       vmovn.i64       d18,q9
> +        vshr.u64       q4,q6,#26
> +        vmovn.i64      d12,q6
> +        vadd.i64       q7,q7,q4                @ h1 -> h2
> +         vsri.u32      q13,q12,#14
> +       vbic.i32        d18,#0xfc000000
> +         vshl.u32      q12,q12,#12
> +        vbic.i32       d12,#0xfc000000
> +
> +       vadd.i32        d10,d10,d30
> +       vshl.u32        d30,d30,#2
> +         vbic.i32      q13,#0xfc000000
> +        vshrn.u64      d8,q7,#26
> +        vmovn.i64      d14,q7
> +       vaddl.u32       q5,d10,d30      @ h4 -> h0 [widen for a sec]
> +         vsri.u32      q12,q11,#20
> +        vadd.i32       d16,d16,d8      @ h2 -> h3
> +         vshl.u32      q11,q11,#6
> +        vbic.i32       d14,#0xfc000000
> +         vbic.i32      q12,#0xfc000000
> +
> +       vshrn.u64       d30,q5,#26              @ re-narrow
> +       vmovn.i64       d10,q5
> +         vsri.u32      q11,q10,#26
> +         vbic.i32      q10,#0xfc000000
> +        vshr.u32       d8,d16,#26
> +        vbic.i32       d16,#0xfc000000
> +       vbic.i32        d10,#0xfc000000
> +       vadd.i32        d12,d12,d30     @ h0 -> h1
> +        vadd.i32       d18,d18,d8      @ h3 -> h4
> +         vbic.i32      q11,#0xfc000000
> +
> +       bhi             .Loop_neon
> +
> +.Lskip_loop:
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
> +
> +       add             r7,r0,#(48+0*9*4)
> +       add             r6,r0,#(48+1*9*4)
> +       adds            r2,r2,#32
> +       it              ne
> +       movne           r2,#0
> +       bne             .Long_tail
> +
> +       vadd.i32        d25,d24,d14     @ add hash value and move to #hi
> +       vadd.i32        d21,d20,d10
> +       vadd.i32        d27,d26,d16
> +       vadd.i32        d23,d22,d12
> +       vadd.i32        d29,d28,d18
> +
> +.Long_tail:
> +       vld4.32         {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^1
> +       vld4.32         {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^2
> +
> +       vadd.i32        d24,d24,d14     @ can be redundant
> +       vmull.u32       q7,d25,d0
> +       vadd.i32        d20,d20,d10
> +       vmull.u32       q5,d21,d0
> +       vadd.i32        d26,d26,d16
> +       vmull.u32       q8,d27,d0
> +       vadd.i32        d22,d22,d12
> +       vmull.u32       q6,d23,d0
> +       vadd.i32        d28,d28,d18
> +       vmull.u32       q9,d29,d0
> +
> +       vmlal.u32       q5,d29,d2
> +       vld4.32         {d4[1],d5[1],d6[1],d7[1]},[r7]!
> +       vmlal.u32       q8,d25,d1
> +       vld4.32         {d4[0],d5[0],d6[0],d7[0]},[r6]!
> +       vmlal.u32       q6,d21,d1
> +       vmlal.u32       q9,d27,d1
> +       vmlal.u32       q7,d23,d1
> +
> +       vmlal.u32       q8,d23,d3
> +       vld1.32         d8[1],[r7,:32]
> +       vmlal.u32       q5,d27,d4
> +       vld1.32         d8[0],[r6,:32]
> +       vmlal.u32       q9,d25,d3
> +       vmlal.u32       q6,d29,d4
> +       vmlal.u32       q7,d21,d3
> +
> +       vmlal.u32       q8,d21,d5
> +        it             ne
> +        addne          r7,r0,#(48+2*9*4)
> +       vmlal.u32       q5,d25,d6
> +        it             ne
> +        addne          r6,r0,#(48+3*9*4)
> +       vmlal.u32       q9,d23,d5
> +       vmlal.u32       q6,d27,d6
> +       vmlal.u32       q7,d29,d6
> +
> +       vmlal.u32       q8,d29,d8
> +        vorn           q0,q0,q0        @ all-ones, can be redundant
> +       vmlal.u32       q5,d23,d8
> +        vshr.u64       q0,q0,#38
> +       vmlal.u32       q9,d21,d7
> +       vmlal.u32       q6,d25,d8
> +       vmlal.u32       q7,d27,d8
> +
> +       beq             .Lshort_tail
> +
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ (hash+inp[0:1])*r^4:r^3 and accumulate
> +
> +       vld4.32         {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^3
> +       vld4.32         {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4
> +
> +       vmlal.u32       q7,d24,d0
> +       vmlal.u32       q5,d20,d0
> +       vmlal.u32       q8,d26,d0
> +       vmlal.u32       q6,d22,d0
> +       vmlal.u32       q9,d28,d0
> +
> +       vmlal.u32       q5,d28,d2
> +       vld4.32         {d4[1],d5[1],d6[1],d7[1]},[r7]!
> +       vmlal.u32       q8,d24,d1
> +       vld4.32         {d4[0],d5[0],d6[0],d7[0]},[r6]!
> +       vmlal.u32       q6,d20,d1
> +       vmlal.u32       q9,d26,d1
> +       vmlal.u32       q7,d22,d1
> +
> +       vmlal.u32       q8,d22,d3
> +       vld1.32         d8[1],[r7,:32]
> +       vmlal.u32       q5,d26,d4
> +       vld1.32         d8[0],[r6,:32]
> +       vmlal.u32       q9,d24,d3
> +       vmlal.u32       q6,d28,d4
> +       vmlal.u32       q7,d20,d3
> +
> +       vmlal.u32       q8,d20,d5
> +       vmlal.u32       q5,d24,d6
> +       vmlal.u32       q9,d22,d5
> +       vmlal.u32       q6,d26,d6
> +       vmlal.u32       q7,d28,d6
> +
> +       vmlal.u32       q8,d28,d8
> +        vorn           q0,q0,q0        @ all-ones
> +       vmlal.u32       q5,d22,d8
> +        vshr.u64       q0,q0,#38
> +       vmlal.u32       q9,d20,d7
> +       vmlal.u32       q6,d24,d8
> +       vmlal.u32       q7,d26,d8
> +
> +.Lshort_tail:
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ horizontal addition
> +
> +       vadd.i64        d16,d16,d17
> +       vadd.i64        d10,d10,d11
> +       vadd.i64        d18,d18,d19
> +       vadd.i64        d12,d12,d13
> +       vadd.i64        d14,d14,d15
> +
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ lazy reduction, but without narrowing
> +
> +       vshr.u64        q15,q8,#26
> +       vand.i64        q8,q8,q0
> +        vshr.u64       q4,q5,#26
> +        vand.i64       q5,q5,q0
> +       vadd.i64        q9,q9,q15               @ h3 -> h4
> +        vadd.i64       q6,q6,q4                @ h0 -> h1
> +
> +       vshr.u64        q15,q9,#26
> +       vand.i64        q9,q9,q0
> +        vshr.u64       q4,q6,#26
> +        vand.i64       q6,q6,q0
> +        vadd.i64       q7,q7,q4                @ h1 -> h2
> +
> +       vadd.i64        q5,q5,q15
> +       vshl.u64        q15,q15,#2
> +        vshr.u64       q4,q7,#26
> +        vand.i64       q7,q7,q0
> +       vadd.i64        q5,q5,q15               @ h4 -> h0
> +        vadd.i64       q8,q8,q4                @ h2 -> h3
> +
> +       vshr.u64        q15,q5,#26
> +       vand.i64        q5,q5,q0
> +        vshr.u64       q4,q8,#26
> +        vand.i64       q8,q8,q0
> +       vadd.i64        q6,q6,q15               @ h0 -> h1
> +        vadd.i64       q9,q9,q4                @ h3 -> h4
> +
> +       cmp             r2,#0
> +       bne             .Leven
> +
> +       @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> +       @ store hash value
> +
> +       vst4.32         {d10[0],d12[0],d14[0],d16[0]},[r0]!
> +       vst1.32         {d18[0]},[r0]
> +
> +       vldmia  sp!,{d8-d15}                    @ epilogue
> +       ldmia   sp!,{r4-r7}
> +.Lno_data_neon:
> +       bx      lr                                      @ bx    lr
> +ENDPROC(poly1305_blocks_neon)
> +
> +.align 5
> +ENTRY(poly1305_emit_neon)
> +       ldr     ip,[r0,#36]             @ is_base2_26
> +
> +       stmdb   sp!,{r4-r11}
> +
> +       tst     ip,ip
> +       beq     .Lpoly1305_emit_enter
> +
> +       ldmia   r0,{r3-r7}
> +       eor     r8,r8,r8
> +
> +       adds    r3,r3,r4,lsl#26 @ base 2^26 -> base 2^32
> +       mov     r4,r4,lsr#6
> +       adcs    r4,r4,r5,lsl#20
> +       mov     r5,r5,lsr#12
> +       adcs    r5,r5,r6,lsl#14
> +       mov     r6,r6,lsr#18
> +       adcs    r6,r6,r7,lsl#8
> +       adc     r7,r8,r7,lsr#24 @ can be partially reduced ...
> +
> +       and     r8,r7,#-4               @ ... so reduce
> +       and     r7,r6,#3
> +       add     r8,r8,r8,lsr#2  @ *= 5
> +       adds    r3,r3,r8
> +       adcs    r4,r4,#0
> +       adcs    r5,r5,#0
> +       adcs    r6,r6,#0
> +       adc     r7,r7,#0
> +
> +       adds    r8,r3,#5                @ compare to modulus
> +       adcs    r9,r4,#0
> +       adcs    r10,r5,#0
> +       adcs    r11,r6,#0
> +       adc     r7,r7,#0
> +       tst     r7,#4                   @ did it carry/borrow?
> +
> +       it      ne
> +       movne   r3,r8
> +       ldr     r8,[r2,#0]
> +       it      ne
> +       movne   r4,r9
> +       ldr     r9,[r2,#4]
> +       it      ne
> +       movne   r5,r10
> +       ldr     r10,[r2,#8]
> +       it      ne
> +       movne   r6,r11
> +       ldr     r11,[r2,#12]
> +
> +       adds    r3,r3,r8                @ accumulate nonce
> +       adcs    r4,r4,r9
> +       adcs    r5,r5,r10
> +       adc     r6,r6,r11
> +
> +#ifdef __ARMEB__
> +       rev     r3,r3
> +       rev     r4,r4
> +       rev     r5,r5
> +       rev     r6,r6
> +#endif
> +       str     r3,[r1,#0]              @ store the result
> +       str     r4,[r1,#4]
> +       str     r5,[r1,#8]
> +       str     r6,[r1,#12]
> +
> +       ldmia   sp!,{r4-r11}
> +       bx      lr                              @ bx    lr
> +ENDPROC(poly1305_emit_neon)
> +
> +.align 5
> +.Lzeros:
> +.long  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
> +#endif
> diff --git a/lib/zinc/poly1305/poly1305-arm64.S b/lib/zinc/poly1305/poly1305-arm64.S
> new file mode 100644
> index 000000000000..c20023544183
> --- /dev/null
> +++ b/lib/zinc/poly1305/poly1305-arm64.S
> @@ -0,0 +1,822 @@
> +/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> + *
> + * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + * Copyright (C) 2006-2017 CRYPTOGAMS by <appro@openssl.org>. All Rights Reserved.
> + *
> + * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS.
> + */
> +
> +#include <linux/linkage.h>
> +.text
> +
> +.align 5
> +ENTRY(poly1305_init_arm)
> +       cmp     x1,xzr
> +       stp     xzr,xzr,[x0]            // zero hash value
> +       stp     xzr,xzr,[x0,#16]        // [along with is_base2_26]
> +
> +       csel    x0,xzr,x0,eq
> +       b.eq    .Lno_key
> +
> +       ldp     x7,x8,[x1]              // load key
> +       mov     x9,#0xfffffffc0fffffff
> +       movk    x9,#0x0fff,lsl#48
> +#ifdef __ARMEB__
> +       rev     x7,x7                   // flip bytes
> +       rev     x8,x8
> +#endif
> +       and     x7,x7,x9                // &=0ffffffc0fffffff
> +       and     x9,x9,#-4
> +       and     x8,x8,x9                // &=0ffffffc0ffffffc
> +       stp     x7,x8,[x0,#32]  // save key value
> +
> +.Lno_key:
> +       ret
> +ENDPROC(poly1305_init_arm)
> +
> +.align 5
> +ENTRY(poly1305_blocks_arm)
> +       ands    x2,x2,#-16
> +       b.eq    .Lno_data
> +
> +       ldp     x4,x5,[x0]              // load hash value
> +       ldp     x7,x8,[x0,#32]  // load key value
> +       ldr     x6,[x0,#16]
> +       add     x9,x8,x8,lsr#2  // s1 = r1 + (r1 >> 2)
> +       b       .Loop
> +
> +.align 5
> +.Loop:
> +       ldp     x10,x11,[x1],#16        // load input
> +       sub     x2,x2,#16
> +#ifdef __ARMEB__
> +       rev     x10,x10
> +       rev     x11,x11
> +#endif
> +       adds    x4,x4,x10               // accumulate input
> +       adcs    x5,x5,x11
> +
> +       mul     x12,x4,x7               // h0*r0
> +       adc     x6,x6,x3
> +       umulh   x13,x4,x7
> +
> +       mul     x10,x5,x9               // h1*5*r1
> +       umulh   x11,x5,x9
> +
> +       adds    x12,x12,x10
> +       mul     x10,x4,x8               // h0*r1
> +       adc     x13,x13,x11
> +       umulh   x14,x4,x8
> +
> +       adds    x13,x13,x10
> +       mul     x10,x5,x7               // h1*r0
> +       adc     x14,x14,xzr
> +       umulh   x11,x5,x7
> +
> +       adds    x13,x13,x10
> +       mul     x10,x6,x9               // h2*5*r1
> +       adc     x14,x14,x11
> +       mul     x11,x6,x7               // h2*r0
> +
> +       adds    x13,x13,x10
> +       adc     x14,x14,x11
> +
> +       and     x10,x14,#-4             // final reduction
> +       and     x6,x14,#3
> +       add     x10,x10,x14,lsr#2
> +       adds    x4,x12,x10
> +       adcs    x5,x13,xzr
> +       adc     x6,x6,xzr
> +
> +       cbnz    x2,.Loop
> +
> +       stp     x4,x5,[x0]              // store hash value
> +       str     x6,[x0,#16]
> +
> +.Lno_data:
> +       ret
> +ENDPROC(poly1305_blocks_arm)
> +
> +.align 5
> +ENTRY(poly1305_emit_arm)
> +       ldp     x4,x5,[x0]              // load hash base 2^64
> +       ldr     x6,[x0,#16]
> +       ldp     x10,x11,[x2]    // load nonce
> +
> +       adds    x12,x4,#5               // compare to modulus
> +       adcs    x13,x5,xzr
> +       adc     x14,x6,xzr
> +
> +       tst     x14,#-4                 // see if it's carried/borrowed
> +
> +       csel    x4,x4,x12,eq
> +       csel    x5,x5,x13,eq
> +
> +#ifdef __ARMEB__
> +       ror     x10,x10,#32             // flip nonce words
> +       ror     x11,x11,#32
> +#endif
> +       adds    x4,x4,x10               // accumulate nonce
> +       adc     x5,x5,x11
> +#ifdef __ARMEB__
> +       rev     x4,x4                   // flip output bytes
> +       rev     x5,x5
> +#endif
> +       stp     x4,x5,[x1]              // write result
> +
> +       ret
> +ENDPROC(poly1305_emit_arm)
> +
> +.align 5
> +__poly1305_mult:
> +       mul     x12,x4,x7               // h0*r0
> +       umulh   x13,x4,x7
> +
> +       mul     x10,x5,x9               // h1*5*r1
> +       umulh   x11,x5,x9
> +
> +       adds    x12,x12,x10
> +       mul     x10,x4,x8               // h0*r1
> +       adc     x13,x13,x11
> +       umulh   x14,x4,x8
> +
> +       adds    x13,x13,x10
> +       mul     x10,x5,x7               // h1*r0
> +       adc     x14,x14,xzr
> +       umulh   x11,x5,x7
> +
> +       adds    x13,x13,x10
> +       mul     x10,x6,x9               // h2*5*r1
> +       adc     x14,x14,x11
> +       mul     x11,x6,x7               // h2*r0
> +
> +       adds    x13,x13,x10
> +       adc     x14,x14,x11
> +
> +       and     x10,x14,#-4             // final reduction
> +       and     x6,x14,#3
> +       add     x10,x10,x14,lsr#2
> +       adds    x4,x12,x10
> +       adcs    x5,x13,xzr
> +       adc     x6,x6,xzr
> +
> +       ret
> +
> +__poly1305_splat:
> +       and     x12,x4,#0x03ffffff      // base 2^64 -> base 2^26
> +       ubfx    x13,x4,#26,#26
> +       extr    x14,x5,x4,#52
> +       and     x14,x14,#0x03ffffff
> +       ubfx    x15,x5,#14,#26
> +       extr    x16,x6,x5,#40
> +
> +       str     w12,[x0,#16*0]  // r0
> +       add     w12,w13,w13,lsl#2       // r1*5
> +       str     w13,[x0,#16*1]  // r1
> +       add     w13,w14,w14,lsl#2       // r2*5
> +       str     w12,[x0,#16*2]  // s1
> +       str     w14,[x0,#16*3]  // r2
> +       add     w14,w15,w15,lsl#2       // r3*5
> +       str     w13,[x0,#16*4]  // s2
> +       str     w15,[x0,#16*5]  // r3
> +       add     w15,w16,w16,lsl#2       // r4*5
> +       str     w14,[x0,#16*6]  // s3
> +       str     w16,[x0,#16*7]  // r4
> +       str     w15,[x0,#16*8]  // s4
> +
> +       ret
> +
> +.align 5
> +ENTRY(poly1305_blocks_neon)
> +       ldr     x17,[x0,#24]
> +       cmp     x2,#128
> +       b.hs    .Lblocks_neon
> +       cbz     x17,poly1305_blocks_arm
> +
> +.Lblocks_neon:
> +       stp     x29,x30,[sp,#-80]!
> +       add     x29,sp,#0
> +
> +       ands    x2,x2,#-16
> +       b.eq    .Lno_data_neon
> +
> +       cbz     x17,.Lbase2_64_neon
> +
> +       ldp     w10,w11,[x0]            // load hash value base 2^26
> +       ldp     w12,w13,[x0,#8]
> +       ldr     w14,[x0,#16]
> +
> +       tst     x2,#31
> +       b.eq    .Leven_neon
> +
> +       ldp     x7,x8,[x0,#32]  // load key value
> +
> +       add     x4,x10,x11,lsl#26       // base 2^26 -> base 2^64
> +       lsr     x5,x12,#12
> +       adds    x4,x4,x12,lsl#52
> +       add     x5,x5,x13,lsl#14
> +       adc     x5,x5,xzr
> +       lsr     x6,x14,#24
> +       adds    x5,x5,x14,lsl#40
> +       adc     x14,x6,xzr              // can be partially reduced...
> +
> +       ldp     x12,x13,[x1],#16        // load input
> +       sub     x2,x2,#16
> +       add     x9,x8,x8,lsr#2  // s1 = r1 + (r1 >> 2)
> +
> +       and     x10,x14,#-4             // ... so reduce
> +       and     x6,x14,#3
> +       add     x10,x10,x14,lsr#2
> +       adds    x4,x4,x10
> +       adcs    x5,x5,xzr
> +       adc     x6,x6,xzr
> +
> +#ifdef __ARMEB__
> +       rev     x12,x12
> +       rev     x13,x13
> +#endif
> +       adds    x4,x4,x12               // accumulate input
> +       adcs    x5,x5,x13
> +       adc     x6,x6,x3
> +
> +       bl      __poly1305_mult
> +       ldr     x30,[sp,#8]
> +
> +       cbz     x3,.Lstore_base2_64_neon
> +
> +       and     x10,x4,#0x03ffffff      // base 2^64 -> base 2^26
> +       ubfx    x11,x4,#26,#26
> +       extr    x12,x5,x4,#52
> +       and     x12,x12,#0x03ffffff
> +       ubfx    x13,x5,#14,#26
> +       extr    x14,x6,x5,#40
> +
> +       cbnz    x2,.Leven_neon
> +
> +       stp     w10,w11,[x0]            // store hash value base 2^26
> +       stp     w12,w13,[x0,#8]
> +       str     w14,[x0,#16]
> +       b       .Lno_data_neon
> +
> +.align 4
> +.Lstore_base2_64_neon:
> +       stp     x4,x5,[x0]              // store hash value base 2^64
> +       stp     x6,xzr,[x0,#16] // note that is_base2_26 is zeroed
> +       b       .Lno_data_neon
> +
> +.align 4
> +.Lbase2_64_neon:
> +       ldp     x7,x8,[x0,#32]  // load key value
> +
> +       ldp     x4,x5,[x0]              // load hash value base 2^64
> +       ldr     x6,[x0,#16]
> +
> +       tst     x2,#31
> +       b.eq    .Linit_neon
> +
> +       ldp     x12,x13,[x1],#16        // load input
> +       sub     x2,x2,#16
> +       add     x9,x8,x8,lsr#2  // s1 = r1 + (r1 >> 2)
> +#ifdef __ARMEB__
> +       rev     x12,x12
> +       rev     x13,x13
> +#endif
> +       adds    x4,x4,x12               // accumulate input
> +       adcs    x5,x5,x13
> +       adc     x6,x6,x3
> +
> +       bl      __poly1305_mult
> +
> +.Linit_neon:
> +       and     x10,x4,#0x03ffffff      // base 2^64 -> base 2^26
> +       ubfx    x11,x4,#26,#26
> +       extr    x12,x5,x4,#52
> +       and     x12,x12,#0x03ffffff
> +       ubfx    x13,x5,#14,#26
> +       extr    x14,x6,x5,#40
> +
> +       stp     d8,d9,[sp,#16]          // meet ABI requirements
> +       stp     d10,d11,[sp,#32]
> +       stp     d12,d13,[sp,#48]
> +       stp     d14,d15,[sp,#64]
> +
> +       fmov    d24,x10
> +       fmov    d25,x11
> +       fmov    d26,x12
> +       fmov    d27,x13
> +       fmov    d28,x14
> +
> +       ////////////////////////////////// initialize r^n table
> +       mov     x4,x7                   // r^1
> +       add     x9,x8,x8,lsr#2  // s1 = r1 + (r1 >> 2)
> +       mov     x5,x8
> +       mov     x6,xzr
> +       add     x0,x0,#48+12
> +       bl      __poly1305_splat
> +
> +       bl      __poly1305_mult         // r^2
> +       sub     x0,x0,#4
> +       bl      __poly1305_splat
> +
> +       bl      __poly1305_mult         // r^3
> +       sub     x0,x0,#4
> +       bl      __poly1305_splat
> +
> +       bl      __poly1305_mult         // r^4
> +       sub     x0,x0,#4
> +       bl      __poly1305_splat
> +       ldr     x30,[sp,#8]
> +
> +       add     x16,x1,#32
> +       adr     x17,.Lzeros
> +       subs    x2,x2,#64
> +       csel    x16,x17,x16,lo
> +
> +       mov     x4,#1
> +       str     x4,[x0,#-24]            // set is_base2_26
> +       sub     x0,x0,#48               // restore original x0
> +       b       .Ldo_neon
> +
> +.align 4
> +.Leven_neon:
> +       add     x16,x1,#32
> +       adr     x17,.Lzeros
> +       subs    x2,x2,#64
> +       csel    x16,x17,x16,lo
> +
> +       stp     d8,d9,[sp,#16]          // meet ABI requirements
> +       stp     d10,d11,[sp,#32]
> +       stp     d12,d13,[sp,#48]
> +       stp     d14,d15,[sp,#64]
> +
> +       fmov    d24,x10
> +       fmov    d25,x11
> +       fmov    d26,x12
> +       fmov    d27,x13
> +       fmov    d28,x14
> +
> +.Ldo_neon:
> +       ldp     x8,x12,[x16],#16        // inp[2:3] (or zero)
> +       ldp     x9,x13,[x16],#48
> +
> +       lsl     x3,x3,#24
> +       add     x15,x0,#48
> +
> +#ifdef __ARMEB__
> +       rev     x8,x8
> +       rev     x12,x12
> +       rev     x9,x9
> +       rev     x13,x13
> +#endif
> +       and     x4,x8,#0x03ffffff       // base 2^64 -> base 2^26
> +       and     x5,x9,#0x03ffffff
> +       ubfx    x6,x8,#26,#26
> +       ubfx    x7,x9,#26,#26
> +       add     x4,x4,x5,lsl#32         // bfi  x4,x5,#32,#32
> +       extr    x8,x12,x8,#52
> +       extr    x9,x13,x9,#52
> +       add     x6,x6,x7,lsl#32         // bfi  x6,x7,#32,#32
> +       fmov    d14,x4
> +       and     x8,x8,#0x03ffffff
> +       and     x9,x9,#0x03ffffff
> +       ubfx    x10,x12,#14,#26
> +       ubfx    x11,x13,#14,#26
> +       add     x12,x3,x12,lsr#40
> +       add     x13,x3,x13,lsr#40
> +       add     x8,x8,x9,lsl#32         // bfi  x8,x9,#32,#32
> +       fmov    d15,x6
> +       add     x10,x10,x11,lsl#32      // bfi  x10,x11,#32,#32
> +       add     x12,x12,x13,lsl#32      // bfi  x12,x13,#32,#32
> +       fmov    d16,x8
> +       fmov    d17,x10
> +       fmov    d18,x12
> +
> +       ldp     x8,x12,[x1],#16 // inp[0:1]
> +       ldp     x9,x13,[x1],#48
> +
> +       ld1     {v0.4s,v1.4s,v2.4s,v3.4s},[x15],#64
> +       ld1     {v4.4s,v5.4s,v6.4s,v7.4s},[x15],#64
> +       ld1     {v8.4s},[x15]
> +
> +#ifdef __ARMEB__
> +       rev     x8,x8
> +       rev     x12,x12
> +       rev     x9,x9
> +       rev     x13,x13
> +#endif
> +       and     x4,x8,#0x03ffffff       // base 2^64 -> base 2^26
> +       and     x5,x9,#0x03ffffff
> +       ubfx    x6,x8,#26,#26
> +       ubfx    x7,x9,#26,#26
> +       add     x4,x4,x5,lsl#32         // bfi  x4,x5,#32,#32
> +       extr    x8,x12,x8,#52
> +       extr    x9,x13,x9,#52
> +       add     x6,x6,x7,lsl#32         // bfi  x6,x7,#32,#32
> +       fmov    d9,x4
> +       and     x8,x8,#0x03ffffff
> +       and     x9,x9,#0x03ffffff
> +       ubfx    x10,x12,#14,#26
> +       ubfx    x11,x13,#14,#26
> +       add     x12,x3,x12,lsr#40
> +       add     x13,x3,x13,lsr#40
> +       add     x8,x8,x9,lsl#32         // bfi  x8,x9,#32,#32
> +       fmov    d10,x6
> +       add     x10,x10,x11,lsl#32      // bfi  x10,x11,#32,#32
> +       add     x12,x12,x13,lsl#32      // bfi  x12,x13,#32,#32
> +       movi    v31.2d,#-1
> +       fmov    d11,x8
> +       fmov    d12,x10
> +       fmov    d13,x12
> +       ushr    v31.2d,v31.2d,#38
> +
> +       b.ls    .Lskip_loop
> +
> +.align 4
> +.Loop_neon:
> +       ////////////////////////////////////////////////////////////////
> +       // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2
> +       // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r
> +       //   ___________________/
> +       // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2
> +       // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r
> +       //   ___________________/ ____________________/
> +       //
> +       // Note that we start with inp[2:3]*r^2. This is because it
> +       // doesn't depend on reduction in previous iteration.
> +       ////////////////////////////////////////////////////////////////
> +       // d4 = h0*r4 + h1*r3   + h2*r2   + h3*r1   + h4*r0
> +       // d3 = h0*r3 + h1*r2   + h2*r1   + h3*r0   + h4*5*r4
> +       // d2 = h0*r2 + h1*r1   + h2*r0   + h3*5*r4 + h4*5*r3
> +       // d1 = h0*r1 + h1*r0   + h2*5*r4 + h3*5*r3 + h4*5*r2
> +       // d0 = h0*r0 + h1*5*r4 + h2*5*r3 + h3*5*r2 + h4*5*r1
> +
> +       subs    x2,x2,#64
> +       umull   v23.2d,v14.2s,v7.s[2]
> +       csel    x16,x17,x16,lo
> +       umull   v22.2d,v14.2s,v5.s[2]
> +       umull   v21.2d,v14.2s,v3.s[2]
> +       ldp     x8,x12,[x16],#16        // inp[2:3] (or zero)
> +       umull   v20.2d,v14.2s,v1.s[2]
> +       ldp     x9,x13,[x16],#48
> +       umull   v19.2d,v14.2s,v0.s[2]
> +#ifdef __ARMEB__
> +       rev     x8,x8
> +       rev     x12,x12
> +       rev     x9,x9
> +       rev     x13,x13
> +#endif
> +
> +       umlal   v23.2d,v15.2s,v5.s[2]
> +       and     x4,x8,#0x03ffffff       // base 2^64 -> base 2^26
> +       umlal   v22.2d,v15.2s,v3.s[2]
> +       and     x5,x9,#0x03ffffff
> +       umlal   v21.2d,v15.2s,v1.s[2]
> +       ubfx    x6,x8,#26,#26
> +       umlal   v20.2d,v15.2s,v0.s[2]
> +       ubfx    x7,x9,#26,#26
> +       umlal   v19.2d,v15.2s,v8.s[2]
> +       add     x4,x4,x5,lsl#32         // bfi  x4,x5,#32,#32
> +
> +       umlal   v23.2d,v16.2s,v3.s[2]
> +       extr    x8,x12,x8,#52
> +       umlal   v22.2d,v16.2s,v1.s[2]
> +       extr    x9,x13,x9,#52
> +       umlal   v21.2d,v16.2s,v0.s[2]
> +       add     x6,x6,x7,lsl#32         // bfi  x6,x7,#32,#32
> +       umlal   v20.2d,v16.2s,v8.s[2]
> +       fmov    d14,x4
> +       umlal   v19.2d,v16.2s,v6.s[2]
> +       and     x8,x8,#0x03ffffff
> +
> +       umlal   v23.2d,v17.2s,v1.s[2]
> +       and     x9,x9,#0x03ffffff
> +       umlal   v22.2d,v17.2s,v0.s[2]
> +       ubfx    x10,x12,#14,#26
> +       umlal   v21.2d,v17.2s,v8.s[2]
> +       ubfx    x11,x13,#14,#26
> +       umlal   v20.2d,v17.2s,v6.s[2]
> +       add     x8,x8,x9,lsl#32         // bfi  x8,x9,#32,#32
> +       umlal   v19.2d,v17.2s,v4.s[2]
> +       fmov    d15,x6
> +
> +       add     v11.2s,v11.2s,v26.2s
> +       add     x12,x3,x12,lsr#40
> +       umlal   v23.2d,v18.2s,v0.s[2]
> +       add     x13,x3,x13,lsr#40
> +       umlal   v22.2d,v18.2s,v8.s[2]
> +       add     x10,x10,x11,lsl#32      // bfi  x10,x11,#32,#32
> +       umlal   v21.2d,v18.2s,v6.s[2]
> +       add     x12,x12,x13,lsl#32      // bfi  x12,x13,#32,#32
> +       umlal   v20.2d,v18.2s,v4.s[2]
> +       fmov    d16,x8
> +       umlal   v19.2d,v18.2s,v2.s[2]
> +       fmov    d17,x10
> +
> +       ////////////////////////////////////////////////////////////////
> +       // (hash+inp[0:1])*r^4 and accumulate
> +
> +       add     v9.2s,v9.2s,v24.2s
> +       fmov    d18,x12
> +       umlal   v22.2d,v11.2s,v1.s[0]
> +       ldp     x8,x12,[x1],#16 // inp[0:1]
> +       umlal   v19.2d,v11.2s,v6.s[0]
> +       ldp     x9,x13,[x1],#48
> +       umlal   v23.2d,v11.2s,v3.s[0]
> +       umlal   v20.2d,v11.2s,v8.s[0]
> +       umlal   v21.2d,v11.2s,v0.s[0]
> +#ifdef __ARMEB__
> +       rev     x8,x8
> +       rev     x12,x12
> +       rev     x9,x9
> +       rev     x13,x13
> +#endif
> +
> +       add     v10.2s,v10.2s,v25.2s
> +       umlal   v22.2d,v9.2s,v5.s[0]
> +       umlal   v23.2d,v9.2s,v7.s[0]
> +       and     x4,x8,#0x03ffffff       // base 2^64 -> base 2^26
> +       umlal   v21.2d,v9.2s,v3.s[0]
> +       and     x5,x9,#0x03ffffff
> +       umlal   v19.2d,v9.2s,v0.s[0]
> +       ubfx    x6,x8,#26,#26
> +       umlal   v20.2d,v9.2s,v1.s[0]
> +       ubfx    x7,x9,#26,#26
> +
> +       add     v12.2s,v12.2s,v27.2s
> +       add     x4,x4,x5,lsl#32         // bfi  x4,x5,#32,#32
> +       umlal   v22.2d,v10.2s,v3.s[0]
> +       extr    x8,x12,x8,#52
> +       umlal   v23.2d,v10.2s,v5.s[0]
> +       extr    x9,x13,x9,#52
> +       umlal   v19.2d,v10.2s,v8.s[0]
> +       add     x6,x6,x7,lsl#32         // bfi  x6,x7,#32,#32
> +       umlal   v21.2d,v10.2s,v1.s[0]
> +       fmov    d9,x4
> +       umlal   v20.2d,v10.2s,v0.s[0]
> +       and     x8,x8,#0x03ffffff
> +
> +       add     v13.2s,v13.2s,v28.2s
> +       and     x9,x9,#0x03ffffff
> +       umlal   v22.2d,v12.2s,v0.s[0]
> +       ubfx    x10,x12,#14,#26
> +       umlal   v19.2d,v12.2s,v4.s[0]
> +       ubfx    x11,x13,#14,#26
> +       umlal   v23.2d,v12.2s,v1.s[0]
> +       add     x8,x8,x9,lsl#32         // bfi  x8,x9,#32,#32
> +       umlal   v20.2d,v12.2s,v6.s[0]
> +       fmov    d10,x6
> +       umlal   v21.2d,v12.2s,v8.s[0]
> +       add     x12,x3,x12,lsr#40
> +
> +       umlal   v22.2d,v13.2s,v8.s[0]
> +       add     x13,x3,x13,lsr#40
> +       umlal   v19.2d,v13.2s,v2.s[0]
> +       add     x10,x10,x11,lsl#32      // bfi  x10,x11,#32,#32
> +       umlal   v23.2d,v13.2s,v0.s[0]
> +       add     x12,x12,x13,lsl#32      // bfi  x12,x13,#32,#32
> +       umlal   v20.2d,v13.2s,v4.s[0]
> +       fmov    d11,x8
> +       umlal   v21.2d,v13.2s,v6.s[0]
> +       fmov    d12,x10
> +       fmov    d13,x12
> +
> +       /////////////////////////////////////////////////////////////////
> +       // lazy reduction as discussed in "NEON crypto" by D.J. Bernstein
> +       // and P. Schwabe
> +       //
> +       // [see discussion in poly1305-armv4 module]
> +
> +       ushr    v29.2d,v22.2d,#26
> +       xtn     v27.2s,v22.2d
> +       ushr    v30.2d,v19.2d,#26
> +       and     v19.16b,v19.16b,v31.16b
> +       add     v23.2d,v23.2d,v29.2d    // h3 -> h4
> +       bic     v27.2s,#0xfc,lsl#24     // &=0x03ffffff
> +       add     v20.2d,v20.2d,v30.2d    // h0 -> h1
> +
> +       ushr    v29.2d,v23.2d,#26
> +       xtn     v28.2s,v23.2d
> +       ushr    v30.2d,v20.2d,#26
> +       xtn     v25.2s,v20.2d
> +       bic     v28.2s,#0xfc,lsl#24
> +       add     v21.2d,v21.2d,v30.2d    // h1 -> h2
> +
> +       add     v19.2d,v19.2d,v29.2d
> +       shl     v29.2d,v29.2d,#2
> +       shrn    v30.2s,v21.2d,#26
> +       xtn     v26.2s,v21.2d
> +       add     v19.2d,v19.2d,v29.2d    // h4 -> h0
> +       bic     v25.2s,#0xfc,lsl#24
> +       add     v27.2s,v27.2s,v30.2s            // h2 -> h3
> +       bic     v26.2s,#0xfc,lsl#24
> +
> +       shrn    v29.2s,v19.2d,#26
> +       xtn     v24.2s,v19.2d
> +       ushr    v30.2s,v27.2s,#26
> +       bic     v27.2s,#0xfc,lsl#24
> +       bic     v24.2s,#0xfc,lsl#24
> +       add     v25.2s,v25.2s,v29.2s            // h0 -> h1
> +       add     v28.2s,v28.2s,v30.2s            // h3 -> h4
> +
> +       b.hi    .Loop_neon
> +
> +.Lskip_loop:
> +       dup     v16.2d,v16.d[0]
> +       add     v11.2s,v11.2s,v26.2s
> +
> +       ////////////////////////////////////////////////////////////////
> +       // multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1
> +
> +       adds    x2,x2,#32
> +       b.ne    .Long_tail
> +
> +       dup     v16.2d,v11.d[0]
> +       add     v14.2s,v9.2s,v24.2s
> +       add     v17.2s,v12.2s,v27.2s
> +       add     v15.2s,v10.2s,v25.2s
> +       add     v18.2s,v13.2s,v28.2s
> +
> +.Long_tail:
> +       dup     v14.2d,v14.d[0]
> +       umull2  v19.2d,v16.4s,v6.4s
> +       umull2  v22.2d,v16.4s,v1.4s
> +       umull2  v23.2d,v16.4s,v3.4s
> +       umull2  v21.2d,v16.4s,v0.4s
> +       umull2  v20.2d,v16.4s,v8.4s
> +
> +       dup     v15.2d,v15.d[0]
> +       umlal2  v19.2d,v14.4s,v0.4s
> +       umlal2  v21.2d,v14.4s,v3.4s
> +       umlal2  v22.2d,v14.4s,v5.4s
> +       umlal2  v23.2d,v14.4s,v7.4s
> +       umlal2  v20.2d,v14.4s,v1.4s
> +
> +       dup     v17.2d,v17.d[0]
> +       umlal2  v19.2d,v15.4s,v8.4s
> +       umlal2  v22.2d,v15.4s,v3.4s
> +       umlal2  v21.2d,v15.4s,v1.4s
> +       umlal2  v23.2d,v15.4s,v5.4s
> +       umlal2  v20.2d,v15.4s,v0.4s
> +
> +       dup     v18.2d,v18.d[0]
> +       umlal2  v22.2d,v17.4s,v0.4s
> +       umlal2  v23.2d,v17.4s,v1.4s
> +       umlal2  v19.2d,v17.4s,v4.4s
> +       umlal2  v20.2d,v17.4s,v6.4s
> +       umlal2  v21.2d,v17.4s,v8.4s
> +
> +       umlal2  v22.2d,v18.4s,v8.4s
> +       umlal2  v19.2d,v18.4s,v2.4s
> +       umlal2  v23.2d,v18.4s,v0.4s
> +       umlal2  v20.2d,v18.4s,v4.4s
> +       umlal2  v21.2d,v18.4s,v6.4s
> +
> +       b.eq    .Lshort_tail
> +
> +       ////////////////////////////////////////////////////////////////
> +       // (hash+inp[0:1])*r^4:r^3 and accumulate
> +
> +       add     v9.2s,v9.2s,v24.2s
> +       umlal   v22.2d,v11.2s,v1.2s
> +       umlal   v19.2d,v11.2s,v6.2s
> +       umlal   v23.2d,v11.2s,v3.2s
> +       umlal   v20.2d,v11.2s,v8.2s
> +       umlal   v21.2d,v11.2s,v0.2s
> +
> +       add     v10.2s,v10.2s,v25.2s
> +       umlal   v22.2d,v9.2s,v5.2s
> +       umlal   v19.2d,v9.2s,v0.2s
> +       umlal   v23.2d,v9.2s,v7.2s
> +       umlal   v20.2d,v9.2s,v1.2s
> +       umlal   v21.2d,v9.2s,v3.2s
> +
> +       add     v12.2s,v12.2s,v27.2s
> +       umlal   v22.2d,v10.2s,v3.2s
> +       umlal   v19.2d,v10.2s,v8.2s
> +       umlal   v23.2d,v10.2s,v5.2s
> +       umlal   v20.2d,v10.2s,v0.2s
> +       umlal   v21.2d,v10.2s,v1.2s
> +
> +       add     v13.2s,v13.2s,v28.2s
> +       umlal   v22.2d,v12.2s,v0.2s
> +       umlal   v19.2d,v12.2s,v4.2s
> +       umlal   v23.2d,v12.2s,v1.2s
> +       umlal   v20.2d,v12.2s,v6.2s
> +       umlal   v21.2d,v12.2s,v8.2s
> +
> +       umlal   v22.2d,v13.2s,v8.2s
> +       umlal   v19.2d,v13.2s,v2.2s
> +       umlal   v23.2d,v13.2s,v0.2s
> +       umlal   v20.2d,v13.2s,v4.2s
> +       umlal   v21.2d,v13.2s,v6.2s
> +
> +.Lshort_tail:
> +       ////////////////////////////////////////////////////////////////
> +       // horizontal add
> +
> +       addp    v22.2d,v22.2d,v22.2d
> +       ldp     d8,d9,[sp,#16]          // meet ABI requirements
> +       addp    v19.2d,v19.2d,v19.2d
> +       ldp     d10,d11,[sp,#32]
> +       addp    v23.2d,v23.2d,v23.2d
> +       ldp     d12,d13,[sp,#48]
> +       addp    v20.2d,v20.2d,v20.2d
> +       ldp     d14,d15,[sp,#64]
> +       addp    v21.2d,v21.2d,v21.2d
> +
> +       ////////////////////////////////////////////////////////////////
> +       // lazy reduction, but without narrowing
> +
> +       ushr    v29.2d,v22.2d,#26
> +       and     v22.16b,v22.16b,v31.16b
> +       ushr    v30.2d,v19.2d,#26
> +       and     v19.16b,v19.16b,v31.16b
> +
> +       add     v23.2d,v23.2d,v29.2d    // h3 -> h4
> +       add     v20.2d,v20.2d,v30.2d    // h0 -> h1
> +
> +       ushr    v29.2d,v23.2d,#26
> +       and     v23.16b,v23.16b,v31.16b
> +       ushr    v30.2d,v20.2d,#26
> +       and     v20.16b,v20.16b,v31.16b
> +       add     v21.2d,v21.2d,v30.2d    // h1 -> h2
> +
> +       add     v19.2d,v19.2d,v29.2d
> +       shl     v29.2d,v29.2d,#2
> +       ushr    v30.2d,v21.2d,#26
> +       and     v21.16b,v21.16b,v31.16b
> +       add     v19.2d,v19.2d,v29.2d    // h4 -> h0
> +       add     v22.2d,v22.2d,v30.2d    // h2 -> h3
> +
> +       ushr    v29.2d,v19.2d,#26
> +       and     v19.16b,v19.16b,v31.16b
> +       ushr    v30.2d,v22.2d,#26
> +       and     v22.16b,v22.16b,v31.16b
> +       add     v20.2d,v20.2d,v29.2d    // h0 -> h1
> +       add     v23.2d,v23.2d,v30.2d    // h3 -> h4
> +
> +       ////////////////////////////////////////////////////////////////
> +       // write the result, can be partially reduced
> +
> +       st4     {v19.s,v20.s,v21.s,v22.s}[0],[x0],#16
> +       st1     {v23.s}[0],[x0]
> +
> +.Lno_data_neon:
> +       ldr     x29,[sp],#80
> +       ret
> +ENDPROC(poly1305_blocks_neon)
> +
> +.align 5
> +ENTRY(poly1305_emit_neon)
> +       ldr     x17,[x0,#24]
> +       cbz     x17,poly1305_emit_arm
> +
> +       ldp     w10,w11,[x0]            // load hash value base 2^26
> +       ldp     w12,w13,[x0,#8]
> +       ldr     w14,[x0,#16]
> +
> +       add     x4,x10,x11,lsl#26       // base 2^26 -> base 2^64
> +       lsr     x5,x12,#12
> +       adds    x4,x4,x12,lsl#52
> +       add     x5,x5,x13,lsl#14
> +       adc     x5,x5,xzr
> +       lsr     x6,x14,#24
> +       adds    x5,x5,x14,lsl#40
> +       adc     x6,x6,xzr               // can be partially reduced...
> +
> +       ldp     x10,x11,[x2]    // load nonce
> +
> +       and     x12,x6,#-4              // ... so reduce
> +       add     x12,x12,x6,lsr#2
> +       and     x6,x6,#3
> +       adds    x4,x4,x12
> +       adcs    x5,x5,xzr
> +       adc     x6,x6,xzr
> +
> +       adds    x12,x4,#5               // compare to modulus
> +       adcs    x13,x5,xzr
> +       adc     x14,x6,xzr
> +
> +       tst     x14,#-4                 // see if it's carried/borrowed
> +
> +       csel    x4,x4,x12,eq
> +       csel    x5,x5,x13,eq
> +
> +#ifdef __ARMEB__
> +       ror     x10,x10,#32             // flip nonce words
> +       ror     x11,x11,#32
> +#endif
> +       adds    x4,x4,x10               // accumulate nonce
> +       adc     x5,x5,x11
> +#ifdef __ARMEB__
> +       rev     x4,x4                   // flip output bytes
> +       rev     x5,x5
> +#endif
> +       stp     x4,x5,[x1]              // write result
> +
> +       ret
> +ENDPROC(poly1305_emit_neon)
> +
> +.align 5
> +.Lzeros:
> +.long  0,0,0,0,0,0,0,0
> --
> 2.19.0
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
  2018-09-14 16:22 ` [PATCH net-next v4 18/20] crypto: port ChaCha20 " Jason A. Donenfeld
@ 2018-09-14 17:38   ` Ard Biesheuvel
  2018-09-14 17:49     ` Jason A. Donenfeld
       [not found]   ` <201809160711.HzjdJedZ%fengguang.wu@intel.com>
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 38+ messages in thread
From: Ard Biesheuvel @ 2018-09-14 17:38 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linux Kernel Mailing List, <netdev@vger.kernel.org>,
	open list:HARDWARE RANDOM NUMBER GENERATOR CORE, David S. Miller,
	Greg Kroah-Hartman, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers

On 14 September 2018 at 18:22, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Now that ChaCha20 is in Zinc, we can have the crypto API code simply
> call into it. The crypto API expects to have a stored key per instance
> and independent nonces, so we follow suite and store the key and
> initialize the nonce independently.
>

From our exchange re v3:

>> Then there is the performance claim. We know for instance that the
>> OpenSSL ARM NEON code for ChaCha20 is faster on cores that happen to
>> possess a micro-architectural property that ALU instructions are
>> essentially free when they are interleaved with SIMD instructions. But
>> we also know that a) Cortex-A7, which is a relevant target, is not one
>> of those cores, and b) that chip designers are not likely to optimize
>> for that particular usage pattern so relying on it in generic code is
>> unwise in general.
>
> That's interesting. I'll bring this up with AndyP. FWIW, if you think
> you have a real and compelling claim here, I'd be much more likely to
> accept a different ChaCha20 implementation than I would be to accept a
> different Poly1305 implementation. (It's a *lot* harder to screw up
> ChaCha20 than it is to screw up Poly1305.)
>

so could we please bring that discussion to a close before we drop the ARM code?

I am fine with dropping the arm64 code btw.

> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Samuel Neves <sneves@dei.uc.pt>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
> Cc: Eric Biggers <ebiggers@google.com>
> ---
>  arch/arm/configs/exynos_defconfig       |   1 -
>  arch/arm/configs/multi_v7_defconfig     |   1 -
>  arch/arm/configs/omap2plus_defconfig    |   1 -
>  arch/arm/crypto/Kconfig                 |   6 -
>  arch/arm/crypto/Makefile                |   2 -
>  arch/arm/crypto/chacha20-neon-core.S    | 521 --------------------
>  arch/arm/crypto/chacha20-neon-glue.c    | 127 -----
>  arch/arm64/configs/defconfig            |   1 -
>  arch/arm64/crypto/Kconfig               |   6 -
>  arch/arm64/crypto/Makefile              |   3 -
>  arch/arm64/crypto/chacha20-neon-core.S  | 450 -----------------
>  arch/arm64/crypto/chacha20-neon-glue.c  | 133 -----
>  arch/x86/crypto/Makefile                |   3 -
>  arch/x86/crypto/chacha20-avx2-x86_64.S  | 448 -----------------
>  arch/x86/crypto/chacha20-ssse3-x86_64.S | 630 ------------------------
>  arch/x86/crypto/chacha20_glue.c         | 146 ------
>  crypto/Kconfig                          |  16 -
>  crypto/Makefile                         |   2 +-
>  crypto/chacha20_generic.c               | 136 -----
>  crypto/chacha20_zinc.c                  | 100 ++++
>  crypto/chacha20poly1305.c               |   2 +-
>  include/crypto/chacha20.h               |  12 -
>  22 files changed, 102 insertions(+), 2645 deletions(-)
>  delete mode 100644 arch/arm/crypto/chacha20-neon-core.S
>  delete mode 100644 arch/arm/crypto/chacha20-neon-glue.c
>  delete mode 100644 arch/arm64/crypto/chacha20-neon-core.S
>  delete mode 100644 arch/arm64/crypto/chacha20-neon-glue.c
>  delete mode 100644 arch/x86/crypto/chacha20-avx2-x86_64.S
>  delete mode 100644 arch/x86/crypto/chacha20-ssse3-x86_64.S
>  delete mode 100644 arch/x86/crypto/chacha20_glue.c
>  delete mode 100644 crypto/chacha20_generic.c
>  create mode 100644 crypto/chacha20_zinc.c
>
> diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig
> index 27ea6dfcf2f2..95929b5e7b10 100644
> --- a/arch/arm/configs/exynos_defconfig
> +++ b/arch/arm/configs/exynos_defconfig
> @@ -350,7 +350,6 @@ CONFIG_CRYPTO_SHA1_ARM_NEON=m
>  CONFIG_CRYPTO_SHA256_ARM=m
>  CONFIG_CRYPTO_SHA512_ARM=m
>  CONFIG_CRYPTO_AES_ARM_BS=m
> -CONFIG_CRYPTO_CHACHA20_NEON=m
>  CONFIG_CRC_CCITT=y
>  CONFIG_FONTS=y
>  CONFIG_FONT_7x14=y
> diff --git a/arch/arm/configs/multi_v7_defconfig b/arch/arm/configs/multi_v7_defconfig
> index fc33444e94f0..63be07724db3 100644
> --- a/arch/arm/configs/multi_v7_defconfig
> +++ b/arch/arm/configs/multi_v7_defconfig
> @@ -1000,4 +1000,3 @@ CONFIG_CRYPTO_AES_ARM_BS=m
>  CONFIG_CRYPTO_AES_ARM_CE=m
>  CONFIG_CRYPTO_GHASH_ARM_CE=m
>  CONFIG_CRYPTO_CRC32_ARM_CE=m
> -CONFIG_CRYPTO_CHACHA20_NEON=m
> diff --git a/arch/arm/configs/omap2plus_defconfig b/arch/arm/configs/omap2plus_defconfig
> index 6491419b1dad..f585a8ecc336 100644
> --- a/arch/arm/configs/omap2plus_defconfig
> +++ b/arch/arm/configs/omap2plus_defconfig
> @@ -547,7 +547,6 @@ CONFIG_CRYPTO_SHA512_ARM=m
>  CONFIG_CRYPTO_AES_ARM=m
>  CONFIG_CRYPTO_AES_ARM_BS=m
>  CONFIG_CRYPTO_GHASH_ARM_CE=m
> -CONFIG_CRYPTO_CHACHA20_NEON=m
>  CONFIG_CRC_CCITT=y
>  CONFIG_CRC_T10DIF=y
>  CONFIG_CRC_ITU_T=y
> diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
> index 925d1364727a..fb80fd89f0e7 100644
> --- a/arch/arm/crypto/Kconfig
> +++ b/arch/arm/crypto/Kconfig
> @@ -115,12 +115,6 @@ config CRYPTO_CRC32_ARM_CE
>         depends on KERNEL_MODE_NEON && CRC32
>         select CRYPTO_HASH
>
> -config CRYPTO_CHACHA20_NEON
> -       tristate "NEON accelerated ChaCha20 symmetric cipher"
> -       depends on KERNEL_MODE_NEON
> -       select CRYPTO_BLKCIPHER
> -       select CRYPTO_CHACHA20
> -
>  config CRYPTO_SPECK_NEON
>         tristate "NEON accelerated Speck cipher algorithms"
>         depends on KERNEL_MODE_NEON
> diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
> index 8de542c48ade..bbfa98447063 100644
> --- a/arch/arm/crypto/Makefile
> +++ b/arch/arm/crypto/Makefile
> @@ -9,7 +9,6 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
>  obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
>  obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
>  obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
> -obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
>  obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>
>  ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
> @@ -53,7 +52,6 @@ aes-arm-ce-y  := aes-ce-core.o aes-ce-glue.o
>  ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
>  crct10dif-arm-ce-y     := crct10dif-ce-core.o crct10dif-ce-glue.o
>  crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
> -chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
>  speck-neon-y := speck-neon-core.o speck-neon-glue.o
>
>  ifdef REGENERATE_ARM_CRYPTO
> diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
> deleted file mode 100644
> index 451a849ad518..000000000000
> --- a/arch/arm/crypto/chacha20-neon-core.S
> +++ /dev/null
> @@ -1,521 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
> - *
> - * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Based on:
> - * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSE3 functions
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <linux/linkage.h>
> -
> -       .text
> -       .fpu            neon
> -       .align          5
> -
> -ENTRY(chacha20_block_xor_neon)
> -       // r0: Input state matrix, s
> -       // r1: 1 data block output, o
> -       // r2: 1 data block input, i
> -
> -       //
> -       // This function encrypts one ChaCha20 block by loading the state matrix
> -       // in four NEON registers. It performs matrix operation on four words in
> -       // parallel, but requireds shuffling to rearrange the words after each
> -       // round.
> -       //
> -
> -       // x0..3 = s0..3
> -       add             ip, r0, #0x20
> -       vld1.32         {q0-q1}, [r0]
> -       vld1.32         {q2-q3}, [ip]
> -
> -       vmov            q8, q0
> -       vmov            q9, q1
> -       vmov            q10, q2
> -       vmov            q11, q3
> -
> -       mov             r3, #10
> -
> -.Ldoubleround:
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       vadd.i32        q0, q0, q1
> -       veor            q3, q3, q0
> -       vrev32.16       q3, q3
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       vadd.i32        q2, q2, q3
> -       veor            q4, q1, q2
> -       vshl.u32        q1, q4, #12
> -       vsri.u32        q1, q4, #20
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       vadd.i32        q0, q0, q1
> -       veor            q4, q3, q0
> -       vshl.u32        q3, q4, #8
> -       vsri.u32        q3, q4, #24
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       vadd.i32        q2, q2, q3
> -       veor            q4, q1, q2
> -       vshl.u32        q1, q4, #7
> -       vsri.u32        q1, q4, #25
> -
> -       // x1 = shuffle32(x1, MASK(0, 3, 2, 1))
> -       vext.8          q1, q1, q1, #4
> -       // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       vext.8          q2, q2, q2, #8
> -       // x3 = shuffle32(x3, MASK(2, 1, 0, 3))
> -       vext.8          q3, q3, q3, #12
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       vadd.i32        q0, q0, q1
> -       veor            q3, q3, q0
> -       vrev32.16       q3, q3
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       vadd.i32        q2, q2, q3
> -       veor            q4, q1, q2
> -       vshl.u32        q1, q4, #12
> -       vsri.u32        q1, q4, #20
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       vadd.i32        q0, q0, q1
> -       veor            q4, q3, q0
> -       vshl.u32        q3, q4, #8
> -       vsri.u32        q3, q4, #24
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       vadd.i32        q2, q2, q3
> -       veor            q4, q1, q2
> -       vshl.u32        q1, q4, #7
> -       vsri.u32        q1, q4, #25
> -
> -       // x1 = shuffle32(x1, MASK(2, 1, 0, 3))
> -       vext.8          q1, q1, q1, #12
> -       // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       vext.8          q2, q2, q2, #8
> -       // x3 = shuffle32(x3, MASK(0, 3, 2, 1))
> -       vext.8          q3, q3, q3, #4
> -
> -       subs            r3, r3, #1
> -       bne             .Ldoubleround
> -
> -       add             ip, r2, #0x20
> -       vld1.8          {q4-q5}, [r2]
> -       vld1.8          {q6-q7}, [ip]
> -
> -       // o0 = i0 ^ (x0 + s0)
> -       vadd.i32        q0, q0, q8
> -       veor            q0, q0, q4
> -
> -       // o1 = i1 ^ (x1 + s1)
> -       vadd.i32        q1, q1, q9
> -       veor            q1, q1, q5
> -
> -       // o2 = i2 ^ (x2 + s2)
> -       vadd.i32        q2, q2, q10
> -       veor            q2, q2, q6
> -
> -       // o3 = i3 ^ (x3 + s3)
> -       vadd.i32        q3, q3, q11
> -       veor            q3, q3, q7
> -
> -       add             ip, r1, #0x20
> -       vst1.8          {q0-q1}, [r1]
> -       vst1.8          {q2-q3}, [ip]
> -
> -       bx              lr
> -ENDPROC(chacha20_block_xor_neon)
> -
> -       .align          5
> -ENTRY(chacha20_4block_xor_neon)
> -       push            {r4-r6, lr}
> -       mov             ip, sp                  // preserve the stack pointer
> -       sub             r3, sp, #0x20           // allocate a 32 byte buffer
> -       bic             r3, r3, #0x1f           // aligned to 32 bytes
> -       mov             sp, r3
> -
> -       // r0: Input state matrix, s
> -       // r1: 4 data blocks output, o
> -       // r2: 4 data blocks input, i
> -
> -       //
> -       // This function encrypts four consecutive ChaCha20 blocks by loading
> -       // the state matrix in NEON registers four times. The algorithm performs
> -       // each operation on the corresponding word of each state matrix, hence
> -       // requires no word shuffling. For final XORing step we transpose the
> -       // matrix by interleaving 32- and then 64-bit words, which allows us to
> -       // do XOR in NEON registers.
> -       //
> -
> -       // x0..15[0-3] = s0..3[0..3]
> -       add             r3, r0, #0x20
> -       vld1.32         {q0-q1}, [r0]
> -       vld1.32         {q2-q3}, [r3]
> -
> -       adr             r3, CTRINC
> -       vdup.32         q15, d7[1]
> -       vdup.32         q14, d7[0]
> -       vld1.32         {q11}, [r3, :128]
> -       vdup.32         q13, d6[1]
> -       vdup.32         q12, d6[0]
> -       vadd.i32        q12, q12, q11           // x12 += counter values 0-3
> -       vdup.32         q11, d5[1]
> -       vdup.32         q10, d5[0]
> -       vdup.32         q9, d4[1]
> -       vdup.32         q8, d4[0]
> -       vdup.32         q7, d3[1]
> -       vdup.32         q6, d3[0]
> -       vdup.32         q5, d2[1]
> -       vdup.32         q4, d2[0]
> -       vdup.32         q3, d1[1]
> -       vdup.32         q2, d1[0]
> -       vdup.32         q1, d0[1]
> -       vdup.32         q0, d0[0]
> -
> -       mov             r3, #10
> -
> -.Ldoubleround4:
> -       // x0 += x4, x12 = rotl32(x12 ^ x0, 16)
> -       // x1 += x5, x13 = rotl32(x13 ^ x1, 16)
> -       // x2 += x6, x14 = rotl32(x14 ^ x2, 16)
> -       // x3 += x7, x15 = rotl32(x15 ^ x3, 16)
> -       vadd.i32        q0, q0, q4
> -       vadd.i32        q1, q1, q5
> -       vadd.i32        q2, q2, q6
> -       vadd.i32        q3, q3, q7
> -
> -       veor            q12, q12, q0
> -       veor            q13, q13, q1
> -       veor            q14, q14, q2
> -       veor            q15, q15, q3
> -
> -       vrev32.16       q12, q12
> -       vrev32.16       q13, q13
> -       vrev32.16       q14, q14
> -       vrev32.16       q15, q15
> -
> -       // x8 += x12, x4 = rotl32(x4 ^ x8, 12)
> -       // x9 += x13, x5 = rotl32(x5 ^ x9, 12)
> -       // x10 += x14, x6 = rotl32(x6 ^ x10, 12)
> -       // x11 += x15, x7 = rotl32(x7 ^ x11, 12)
> -       vadd.i32        q8, q8, q12
> -       vadd.i32        q9, q9, q13
> -       vadd.i32        q10, q10, q14
> -       vadd.i32        q11, q11, q15
> -
> -       vst1.32         {q8-q9}, [sp, :256]
> -
> -       veor            q8, q4, q8
> -       veor            q9, q5, q9
> -       vshl.u32        q4, q8, #12
> -       vshl.u32        q5, q9, #12
> -       vsri.u32        q4, q8, #20
> -       vsri.u32        q5, q9, #20
> -
> -       veor            q8, q6, q10
> -       veor            q9, q7, q11
> -       vshl.u32        q6, q8, #12
> -       vshl.u32        q7, q9, #12
> -       vsri.u32        q6, q8, #20
> -       vsri.u32        q7, q9, #20
> -
> -       // x0 += x4, x12 = rotl32(x12 ^ x0, 8)
> -       // x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> -       // x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> -       // x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> -       vadd.i32        q0, q0, q4
> -       vadd.i32        q1, q1, q5
> -       vadd.i32        q2, q2, q6
> -       vadd.i32        q3, q3, q7
> -
> -       veor            q8, q12, q0
> -       veor            q9, q13, q1
> -       vshl.u32        q12, q8, #8
> -       vshl.u32        q13, q9, #8
> -       vsri.u32        q12, q8, #24
> -       vsri.u32        q13, q9, #24
> -
> -       veor            q8, q14, q2
> -       veor            q9, q15, q3
> -       vshl.u32        q14, q8, #8
> -       vshl.u32        q15, q9, #8
> -       vsri.u32        q14, q8, #24
> -       vsri.u32        q15, q9, #24
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -
> -       // x8 += x12, x4 = rotl32(x4 ^ x8, 7)
> -       // x9 += x13, x5 = rotl32(x5 ^ x9, 7)
> -       // x10 += x14, x6 = rotl32(x6 ^ x10, 7)
> -       // x11 += x15, x7 = rotl32(x7 ^ x11, 7)
> -       vadd.i32        q8, q8, q12
> -       vadd.i32        q9, q9, q13
> -       vadd.i32        q10, q10, q14
> -       vadd.i32        q11, q11, q15
> -
> -       vst1.32         {q8-q9}, [sp, :256]
> -
> -       veor            q8, q4, q8
> -       veor            q9, q5, q9
> -       vshl.u32        q4, q8, #7
> -       vshl.u32        q5, q9, #7
> -       vsri.u32        q4, q8, #25
> -       vsri.u32        q5, q9, #25
> -
> -       veor            q8, q6, q10
> -       veor            q9, q7, q11
> -       vshl.u32        q6, q8, #7
> -       vshl.u32        q7, q9, #7
> -       vsri.u32        q6, q8, #25
> -       vsri.u32        q7, q9, #25
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -
> -       // x0 += x5, x15 = rotl32(x15 ^ x0, 16)
> -       // x1 += x6, x12 = rotl32(x12 ^ x1, 16)
> -       // x2 += x7, x13 = rotl32(x13 ^ x2, 16)
> -       // x3 += x4, x14 = rotl32(x14 ^ x3, 16)
> -       vadd.i32        q0, q0, q5
> -       vadd.i32        q1, q1, q6
> -       vadd.i32        q2, q2, q7
> -       vadd.i32        q3, q3, q4
> -
> -       veor            q15, q15, q0
> -       veor            q12, q12, q1
> -       veor            q13, q13, q2
> -       veor            q14, q14, q3
> -
> -       vrev32.16       q15, q15
> -       vrev32.16       q12, q12
> -       vrev32.16       q13, q13
> -       vrev32.16       q14, q14
> -
> -       // x10 += x15, x5 = rotl32(x5 ^ x10, 12)
> -       // x11 += x12, x6 = rotl32(x6 ^ x11, 12)
> -       // x8 += x13, x7 = rotl32(x7 ^ x8, 12)
> -       // x9 += x14, x4 = rotl32(x4 ^ x9, 12)
> -       vadd.i32        q10, q10, q15
> -       vadd.i32        q11, q11, q12
> -       vadd.i32        q8, q8, q13
> -       vadd.i32        q9, q9, q14
> -
> -       vst1.32         {q8-q9}, [sp, :256]
> -
> -       veor            q8, q7, q8
> -       veor            q9, q4, q9
> -       vshl.u32        q7, q8, #12
> -       vshl.u32        q4, q9, #12
> -       vsri.u32        q7, q8, #20
> -       vsri.u32        q4, q9, #20
> -
> -       veor            q8, q5, q10
> -       veor            q9, q6, q11
> -       vshl.u32        q5, q8, #12
> -       vshl.u32        q6, q9, #12
> -       vsri.u32        q5, q8, #20
> -       vsri.u32        q6, q9, #20
> -
> -       // x0 += x5, x15 = rotl32(x15 ^ x0, 8)
> -       // x1 += x6, x12 = rotl32(x12 ^ x1, 8)
> -       // x2 += x7, x13 = rotl32(x13 ^ x2, 8)
> -       // x3 += x4, x14 = rotl32(x14 ^ x3, 8)
> -       vadd.i32        q0, q0, q5
> -       vadd.i32        q1, q1, q6
> -       vadd.i32        q2, q2, q7
> -       vadd.i32        q3, q3, q4
> -
> -       veor            q8, q15, q0
> -       veor            q9, q12, q1
> -       vshl.u32        q15, q8, #8
> -       vshl.u32        q12, q9, #8
> -       vsri.u32        q15, q8, #24
> -       vsri.u32        q12, q9, #24
> -
> -       veor            q8, q13, q2
> -       veor            q9, q14, q3
> -       vshl.u32        q13, q8, #8
> -       vshl.u32        q14, q9, #8
> -       vsri.u32        q13, q8, #24
> -       vsri.u32        q14, q9, #24
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -
> -       // x10 += x15, x5 = rotl32(x5 ^ x10, 7)
> -       // x11 += x12, x6 = rotl32(x6 ^ x11, 7)
> -       // x8 += x13, x7 = rotl32(x7 ^ x8, 7)
> -       // x9 += x14, x4 = rotl32(x4 ^ x9, 7)
> -       vadd.i32        q10, q10, q15
> -       vadd.i32        q11, q11, q12
> -       vadd.i32        q8, q8, q13
> -       vadd.i32        q9, q9, q14
> -
> -       vst1.32         {q8-q9}, [sp, :256]
> -
> -       veor            q8, q7, q8
> -       veor            q9, q4, q9
> -       vshl.u32        q7, q8, #7
> -       vshl.u32        q4, q9, #7
> -       vsri.u32        q7, q8, #25
> -       vsri.u32        q4, q9, #25
> -
> -       veor            q8, q5, q10
> -       veor            q9, q6, q11
> -       vshl.u32        q5, q8, #7
> -       vshl.u32        q6, q9, #7
> -       vsri.u32        q5, q8, #25
> -       vsri.u32        q6, q9, #25
> -
> -       subs            r3, r3, #1
> -       beq             0f
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -       b               .Ldoubleround4
> -
> -       // x0[0-3] += s0[0]
> -       // x1[0-3] += s0[1]
> -       // x2[0-3] += s0[2]
> -       // x3[0-3] += s0[3]
> -0:     ldmia           r0!, {r3-r6}
> -       vdup.32         q8, r3
> -       vdup.32         q9, r4
> -       vadd.i32        q0, q0, q8
> -       vadd.i32        q1, q1, q9
> -       vdup.32         q8, r5
> -       vdup.32         q9, r6
> -       vadd.i32        q2, q2, q8
> -       vadd.i32        q3, q3, q9
> -
> -       // x4[0-3] += s1[0]
> -       // x5[0-3] += s1[1]
> -       // x6[0-3] += s1[2]
> -       // x7[0-3] += s1[3]
> -       ldmia           r0!, {r3-r6}
> -       vdup.32         q8, r3
> -       vdup.32         q9, r4
> -       vadd.i32        q4, q4, q8
> -       vadd.i32        q5, q5, q9
> -       vdup.32         q8, r5
> -       vdup.32         q9, r6
> -       vadd.i32        q6, q6, q8
> -       vadd.i32        q7, q7, q9
> -
> -       // interleave 32-bit words in state n, n+1
> -       vzip.32         q0, q1
> -       vzip.32         q2, q3
> -       vzip.32         q4, q5
> -       vzip.32         q6, q7
> -
> -       // interleave 64-bit words in state n, n+2
> -       vswp            d1, d4
> -       vswp            d3, d6
> -       vswp            d9, d12
> -       vswp            d11, d14
> -
> -       // xor with corresponding input, write to output
> -       vld1.8          {q8-q9}, [r2]!
> -       veor            q8, q8, q0
> -       veor            q9, q9, q4
> -       vst1.8          {q8-q9}, [r1]!
> -
> -       vld1.32         {q8-q9}, [sp, :256]
> -
> -       // x8[0-3] += s2[0]
> -       // x9[0-3] += s2[1]
> -       // x10[0-3] += s2[2]
> -       // x11[0-3] += s2[3]
> -       ldmia           r0!, {r3-r6}
> -       vdup.32         q0, r3
> -       vdup.32         q4, r4
> -       vadd.i32        q8, q8, q0
> -       vadd.i32        q9, q9, q4
> -       vdup.32         q0, r5
> -       vdup.32         q4, r6
> -       vadd.i32        q10, q10, q0
> -       vadd.i32        q11, q11, q4
> -
> -       // x12[0-3] += s3[0]
> -       // x13[0-3] += s3[1]
> -       // x14[0-3] += s3[2]
> -       // x15[0-3] += s3[3]
> -       ldmia           r0!, {r3-r6}
> -       vdup.32         q0, r3
> -       vdup.32         q4, r4
> -       adr             r3, CTRINC
> -       vadd.i32        q12, q12, q0
> -       vld1.32         {q0}, [r3, :128]
> -       vadd.i32        q13, q13, q4
> -       vadd.i32        q12, q12, q0            // x12 += counter values 0-3
> -
> -       vdup.32         q0, r5
> -       vdup.32         q4, r6
> -       vadd.i32        q14, q14, q0
> -       vadd.i32        q15, q15, q4
> -
> -       // interleave 32-bit words in state n, n+1
> -       vzip.32         q8, q9
> -       vzip.32         q10, q11
> -       vzip.32         q12, q13
> -       vzip.32         q14, q15
> -
> -       // interleave 64-bit words in state n, n+2
> -       vswp            d17, d20
> -       vswp            d19, d22
> -       vswp            d25, d28
> -       vswp            d27, d30
> -
> -       vmov            q4, q1
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q8
> -       veor            q1, q1, q12
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q2
> -       veor            q1, q1, q6
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q10
> -       veor            q1, q1, q14
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q4
> -       veor            q1, q1, q5
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q9
> -       veor            q1, q1, q13
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]!
> -       veor            q0, q0, q3
> -       veor            q1, q1, q7
> -       vst1.8          {q0-q1}, [r1]!
> -
> -       vld1.8          {q0-q1}, [r2]
> -       veor            q0, q0, q11
> -       veor            q1, q1, q15
> -       vst1.8          {q0-q1}, [r1]
> -
> -       mov             sp, ip
> -       pop             {r4-r6, pc}
> -ENDPROC(chacha20_4block_xor_neon)
> -
> -       .align          4
> -CTRINC:        .word           0, 1, 2, 3
> diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
> deleted file mode 100644
> index 59a7be08e80c..000000000000
> --- a/arch/arm/crypto/chacha20-neon-glue.c
> +++ /dev/null
> @@ -1,127 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
> - *
> - * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Based on:
> - * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <crypto/algapi.h>
> -#include <crypto/chacha20.h>
> -#include <crypto/internal/skcipher.h>
> -#include <linux/kernel.h>
> -#include <linux/module.h>
> -
> -#include <asm/hwcap.h>
> -#include <asm/neon.h>
> -#include <asm/simd.h>
> -
> -asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
> -asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
> -
> -static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
> -                           unsigned int bytes)
> -{
> -       u8 buf[CHACHA20_BLOCK_SIZE];
> -
> -       while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
> -               chacha20_4block_xor_neon(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE * 4;
> -               src += CHACHA20_BLOCK_SIZE * 4;
> -               dst += CHACHA20_BLOCK_SIZE * 4;
> -               state[12] += 4;
> -       }
> -       while (bytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_block_xor_neon(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE;
> -               src += CHACHA20_BLOCK_SIZE;
> -               dst += CHACHA20_BLOCK_SIZE;
> -               state[12]++;
> -       }
> -       if (bytes) {
> -               memcpy(buf, src, bytes);
> -               chacha20_block_xor_neon(state, buf, buf);
> -               memcpy(dst, buf, bytes);
> -       }
> -}
> -
> -static int chacha20_neon(struct skcipher_request *req)
> -{
> -       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       struct skcipher_walk walk;
> -       u32 state[16];
> -       int err;
> -
> -       if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
> -               return crypto_chacha20_crypt(req);
> -
> -       err = skcipher_walk_virt(&walk, req, true);
> -
> -       crypto_chacha20_init(state, ctx, walk.iv);
> -
> -       kernel_neon_begin();
> -       while (walk.nbytes > 0) {
> -               unsigned int nbytes = walk.nbytes;
> -
> -               if (nbytes < walk.total)
> -                       nbytes = round_down(nbytes, walk.stride);
> -
> -               chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                               nbytes);
> -               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> -       }
> -       kernel_neon_end();
> -
> -       return err;
> -}
> -
> -static struct skcipher_alg alg = {
> -       .base.cra_name          = "chacha20",
> -       .base.cra_driver_name   = "chacha20-neon",
> -       .base.cra_priority      = 300,
> -       .base.cra_blocksize     = 1,
> -       .base.cra_ctxsize       = sizeof(struct chacha20_ctx),
> -       .base.cra_module        = THIS_MODULE,
> -
> -       .min_keysize            = CHACHA20_KEY_SIZE,
> -       .max_keysize            = CHACHA20_KEY_SIZE,
> -       .ivsize                 = CHACHA20_IV_SIZE,
> -       .chunksize              = CHACHA20_BLOCK_SIZE,
> -       .walksize               = 4 * CHACHA20_BLOCK_SIZE,
> -       .setkey                 = crypto_chacha20_setkey,
> -       .encrypt                = chacha20_neon,
> -       .decrypt                = chacha20_neon,
> -};
> -
> -static int __init chacha20_simd_mod_init(void)
> -{
> -       if (!(elf_hwcap & HWCAP_NEON))
> -               return -ENODEV;
> -
> -       return crypto_register_skcipher(&alg);
> -}
> -
> -static void __exit chacha20_simd_mod_fini(void)
> -{
> -       crypto_unregister_skcipher(&alg);
> -}
> -
> -module_init(chacha20_simd_mod_init);
> -module_exit(chacha20_simd_mod_fini);
> -
> -MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
> -MODULE_LICENSE("GPL v2");
> -MODULE_ALIAS_CRYPTO("chacha20");
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index db8d364f8476..6cc3c8a0ad88 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -709,5 +709,4 @@ CONFIG_CRYPTO_CRCT10DIF_ARM64_CE=m
>  CONFIG_CRYPTO_CRC32_ARM64_CE=m
>  CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
>  CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
> -CONFIG_CRYPTO_CHACHA20_NEON=m
>  CONFIG_CRYPTO_AES_ARM64_BS=m
> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> index e3fdb0fd6f70..9db6d775a880 100644
> --- a/arch/arm64/crypto/Kconfig
> +++ b/arch/arm64/crypto/Kconfig
> @@ -105,12 +105,6 @@ config CRYPTO_AES_ARM64_NEON_BLK
>         select CRYPTO_AES
>         select CRYPTO_SIMD
>
> -config CRYPTO_CHACHA20_NEON
> -       tristate "NEON accelerated ChaCha20 symmetric cipher"
> -       depends on KERNEL_MODE_NEON
> -       select CRYPTO_BLKCIPHER
> -       select CRYPTO_CHACHA20
> -
>  config CRYPTO_AES_ARM64_BS
>         tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
>         depends on KERNEL_MODE_NEON
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index bcafd016618e..507c4bfb86e3 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -53,9 +53,6 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
>  obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
>  sha512-arm64-y := sha512-glue.o sha512-core.o
>
> -obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
> -chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
> -
>  obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
>  speck-neon-y := speck-neon-core.o speck-neon-glue.o
>
> diff --git a/arch/arm64/crypto/chacha20-neon-core.S b/arch/arm64/crypto/chacha20-neon-core.S
> deleted file mode 100644
> index 13c85e272c2a..000000000000
> --- a/arch/arm64/crypto/chacha20-neon-core.S
> +++ /dev/null
> @@ -1,450 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
> - *
> - * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Based on:
> - * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <linux/linkage.h>
> -
> -       .text
> -       .align          6
> -
> -ENTRY(chacha20_block_xor_neon)
> -       // x0: Input state matrix, s
> -       // x1: 1 data block output, o
> -       // x2: 1 data block input, i
> -
> -       //
> -       // This function encrypts one ChaCha20 block by loading the state matrix
> -       // in four NEON registers. It performs matrix operation on four words in
> -       // parallel, but requires shuffling to rearrange the words after each
> -       // round.
> -       //
> -
> -       // x0..3 = s0..3
> -       adr             x3, ROT8
> -       ld1             {v0.4s-v3.4s}, [x0]
> -       ld1             {v8.4s-v11.4s}, [x0]
> -       ld1             {v12.4s}, [x3]
> -
> -       mov             x3, #10
> -
> -.Ldoubleround:
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       add             v0.4s, v0.4s, v1.4s
> -       eor             v3.16b, v3.16b, v0.16b
> -       rev32           v3.8h, v3.8h
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       add             v2.4s, v2.4s, v3.4s
> -       eor             v4.16b, v1.16b, v2.16b
> -       shl             v1.4s, v4.4s, #12
> -       sri             v1.4s, v4.4s, #20
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       add             v0.4s, v0.4s, v1.4s
> -       eor             v3.16b, v3.16b, v0.16b
> -       tbl             v3.16b, {v3.16b}, v12.16b
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       add             v2.4s, v2.4s, v3.4s
> -       eor             v4.16b, v1.16b, v2.16b
> -       shl             v1.4s, v4.4s, #7
> -       sri             v1.4s, v4.4s, #25
> -
> -       // x1 = shuffle32(x1, MASK(0, 3, 2, 1))
> -       ext             v1.16b, v1.16b, v1.16b, #4
> -       // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       ext             v2.16b, v2.16b, v2.16b, #8
> -       // x3 = shuffle32(x3, MASK(2, 1, 0, 3))
> -       ext             v3.16b, v3.16b, v3.16b, #12
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       add             v0.4s, v0.4s, v1.4s
> -       eor             v3.16b, v3.16b, v0.16b
> -       rev32           v3.8h, v3.8h
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       add             v2.4s, v2.4s, v3.4s
> -       eor             v4.16b, v1.16b, v2.16b
> -       shl             v1.4s, v4.4s, #12
> -       sri             v1.4s, v4.4s, #20
> -
> -       // x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       add             v0.4s, v0.4s, v1.4s
> -       eor             v3.16b, v3.16b, v0.16b
> -       tbl             v3.16b, {v3.16b}, v12.16b
> -
> -       // x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       add             v2.4s, v2.4s, v3.4s
> -       eor             v4.16b, v1.16b, v2.16b
> -       shl             v1.4s, v4.4s, #7
> -       sri             v1.4s, v4.4s, #25
> -
> -       // x1 = shuffle32(x1, MASK(2, 1, 0, 3))
> -       ext             v1.16b, v1.16b, v1.16b, #12
> -       // x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       ext             v2.16b, v2.16b, v2.16b, #8
> -       // x3 = shuffle32(x3, MASK(0, 3, 2, 1))
> -       ext             v3.16b, v3.16b, v3.16b, #4
> -
> -       subs            x3, x3, #1
> -       b.ne            .Ldoubleround
> -
> -       ld1             {v4.16b-v7.16b}, [x2]
> -
> -       // o0 = i0 ^ (x0 + s0)
> -       add             v0.4s, v0.4s, v8.4s
> -       eor             v0.16b, v0.16b, v4.16b
> -
> -       // o1 = i1 ^ (x1 + s1)
> -       add             v1.4s, v1.4s, v9.4s
> -       eor             v1.16b, v1.16b, v5.16b
> -
> -       // o2 = i2 ^ (x2 + s2)
> -       add             v2.4s, v2.4s, v10.4s
> -       eor             v2.16b, v2.16b, v6.16b
> -
> -       // o3 = i3 ^ (x3 + s3)
> -       add             v3.4s, v3.4s, v11.4s
> -       eor             v3.16b, v3.16b, v7.16b
> -
> -       st1             {v0.16b-v3.16b}, [x1]
> -
> -       ret
> -ENDPROC(chacha20_block_xor_neon)
> -
> -       .align          6
> -ENTRY(chacha20_4block_xor_neon)
> -       // x0: Input state matrix, s
> -       // x1: 4 data blocks output, o
> -       // x2: 4 data blocks input, i
> -
> -       //
> -       // This function encrypts four consecutive ChaCha20 blocks by loading
> -       // the state matrix in NEON registers four times. The algorithm performs
> -       // each operation on the corresponding word of each state matrix, hence
> -       // requires no word shuffling. For final XORing step we transpose the
> -       // matrix by interleaving 32- and then 64-bit words, which allows us to
> -       // do XOR in NEON registers.
> -       //
> -       adr             x3, CTRINC              // ... and ROT8
> -       ld1             {v30.4s-v31.4s}, [x3]
> -
> -       // x0..15[0-3] = s0..3[0..3]
> -       mov             x4, x0
> -       ld4r            { v0.4s- v3.4s}, [x4], #16
> -       ld4r            { v4.4s- v7.4s}, [x4], #16
> -       ld4r            { v8.4s-v11.4s}, [x4], #16
> -       ld4r            {v12.4s-v15.4s}, [x4]
> -
> -       // x12 += counter values 0-3
> -       add             v12.4s, v12.4s, v30.4s
> -
> -       mov             x3, #10
> -
> -.Ldoubleround4:
> -       // x0 += x4, x12 = rotl32(x12 ^ x0, 16)
> -       // x1 += x5, x13 = rotl32(x13 ^ x1, 16)
> -       // x2 += x6, x14 = rotl32(x14 ^ x2, 16)
> -       // x3 += x7, x15 = rotl32(x15 ^ x3, 16)
> -       add             v0.4s, v0.4s, v4.4s
> -       add             v1.4s, v1.4s, v5.4s
> -       add             v2.4s, v2.4s, v6.4s
> -       add             v3.4s, v3.4s, v7.4s
> -
> -       eor             v12.16b, v12.16b, v0.16b
> -       eor             v13.16b, v13.16b, v1.16b
> -       eor             v14.16b, v14.16b, v2.16b
> -       eor             v15.16b, v15.16b, v3.16b
> -
> -       rev32           v12.8h, v12.8h
> -       rev32           v13.8h, v13.8h
> -       rev32           v14.8h, v14.8h
> -       rev32           v15.8h, v15.8h
> -
> -       // x8 += x12, x4 = rotl32(x4 ^ x8, 12)
> -       // x9 += x13, x5 = rotl32(x5 ^ x9, 12)
> -       // x10 += x14, x6 = rotl32(x6 ^ x10, 12)
> -       // x11 += x15, x7 = rotl32(x7 ^ x11, 12)
> -       add             v8.4s, v8.4s, v12.4s
> -       add             v9.4s, v9.4s, v13.4s
> -       add             v10.4s, v10.4s, v14.4s
> -       add             v11.4s, v11.4s, v15.4s
> -
> -       eor             v16.16b, v4.16b, v8.16b
> -       eor             v17.16b, v5.16b, v9.16b
> -       eor             v18.16b, v6.16b, v10.16b
> -       eor             v19.16b, v7.16b, v11.16b
> -
> -       shl             v4.4s, v16.4s, #12
> -       shl             v5.4s, v17.4s, #12
> -       shl             v6.4s, v18.4s, #12
> -       shl             v7.4s, v19.4s, #12
> -
> -       sri             v4.4s, v16.4s, #20
> -       sri             v5.4s, v17.4s, #20
> -       sri             v6.4s, v18.4s, #20
> -       sri             v7.4s, v19.4s, #20
> -
> -       // x0 += x4, x12 = rotl32(x12 ^ x0, 8)
> -       // x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> -       // x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> -       // x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> -       add             v0.4s, v0.4s, v4.4s
> -       add             v1.4s, v1.4s, v5.4s
> -       add             v2.4s, v2.4s, v6.4s
> -       add             v3.4s, v3.4s, v7.4s
> -
> -       eor             v12.16b, v12.16b, v0.16b
> -       eor             v13.16b, v13.16b, v1.16b
> -       eor             v14.16b, v14.16b, v2.16b
> -       eor             v15.16b, v15.16b, v3.16b
> -
> -       tbl             v12.16b, {v12.16b}, v31.16b
> -       tbl             v13.16b, {v13.16b}, v31.16b
> -       tbl             v14.16b, {v14.16b}, v31.16b
> -       tbl             v15.16b, {v15.16b}, v31.16b
> -
> -       // x8 += x12, x4 = rotl32(x4 ^ x8, 7)
> -       // x9 += x13, x5 = rotl32(x5 ^ x9, 7)
> -       // x10 += x14, x6 = rotl32(x6 ^ x10, 7)
> -       // x11 += x15, x7 = rotl32(x7 ^ x11, 7)
> -       add             v8.4s, v8.4s, v12.4s
> -       add             v9.4s, v9.4s, v13.4s
> -       add             v10.4s, v10.4s, v14.4s
> -       add             v11.4s, v11.4s, v15.4s
> -
> -       eor             v16.16b, v4.16b, v8.16b
> -       eor             v17.16b, v5.16b, v9.16b
> -       eor             v18.16b, v6.16b, v10.16b
> -       eor             v19.16b, v7.16b, v11.16b
> -
> -       shl             v4.4s, v16.4s, #7
> -       shl             v5.4s, v17.4s, #7
> -       shl             v6.4s, v18.4s, #7
> -       shl             v7.4s, v19.4s, #7
> -
> -       sri             v4.4s, v16.4s, #25
> -       sri             v5.4s, v17.4s, #25
> -       sri             v6.4s, v18.4s, #25
> -       sri             v7.4s, v19.4s, #25
> -
> -       // x0 += x5, x15 = rotl32(x15 ^ x0, 16)
> -       // x1 += x6, x12 = rotl32(x12 ^ x1, 16)
> -       // x2 += x7, x13 = rotl32(x13 ^ x2, 16)
> -       // x3 += x4, x14 = rotl32(x14 ^ x3, 16)
> -       add             v0.4s, v0.4s, v5.4s
> -       add             v1.4s, v1.4s, v6.4s
> -       add             v2.4s, v2.4s, v7.4s
> -       add             v3.4s, v3.4s, v4.4s
> -
> -       eor             v15.16b, v15.16b, v0.16b
> -       eor             v12.16b, v12.16b, v1.16b
> -       eor             v13.16b, v13.16b, v2.16b
> -       eor             v14.16b, v14.16b, v3.16b
> -
> -       rev32           v15.8h, v15.8h
> -       rev32           v12.8h, v12.8h
> -       rev32           v13.8h, v13.8h
> -       rev32           v14.8h, v14.8h
> -
> -       // x10 += x15, x5 = rotl32(x5 ^ x10, 12)
> -       // x11 += x12, x6 = rotl32(x6 ^ x11, 12)
> -       // x8 += x13, x7 = rotl32(x7 ^ x8, 12)
> -       // x9 += x14, x4 = rotl32(x4 ^ x9, 12)
> -       add             v10.4s, v10.4s, v15.4s
> -       add             v11.4s, v11.4s, v12.4s
> -       add             v8.4s, v8.4s, v13.4s
> -       add             v9.4s, v9.4s, v14.4s
> -
> -       eor             v16.16b, v5.16b, v10.16b
> -       eor             v17.16b, v6.16b, v11.16b
> -       eor             v18.16b, v7.16b, v8.16b
> -       eor             v19.16b, v4.16b, v9.16b
> -
> -       shl             v5.4s, v16.4s, #12
> -       shl             v6.4s, v17.4s, #12
> -       shl             v7.4s, v18.4s, #12
> -       shl             v4.4s, v19.4s, #12
> -
> -       sri             v5.4s, v16.4s, #20
> -       sri             v6.4s, v17.4s, #20
> -       sri             v7.4s, v18.4s, #20
> -       sri             v4.4s, v19.4s, #20
> -
> -       // x0 += x5, x15 = rotl32(x15 ^ x0, 8)
> -       // x1 += x6, x12 = rotl32(x12 ^ x1, 8)
> -       // x2 += x7, x13 = rotl32(x13 ^ x2, 8)
> -       // x3 += x4, x14 = rotl32(x14 ^ x3, 8)
> -       add             v0.4s, v0.4s, v5.4s
> -       add             v1.4s, v1.4s, v6.4s
> -       add             v2.4s, v2.4s, v7.4s
> -       add             v3.4s, v3.4s, v4.4s
> -
> -       eor             v15.16b, v15.16b, v0.16b
> -       eor             v12.16b, v12.16b, v1.16b
> -       eor             v13.16b, v13.16b, v2.16b
> -       eor             v14.16b, v14.16b, v3.16b
> -
> -       tbl             v15.16b, {v15.16b}, v31.16b
> -       tbl             v12.16b, {v12.16b}, v31.16b
> -       tbl             v13.16b, {v13.16b}, v31.16b
> -       tbl             v14.16b, {v14.16b}, v31.16b
> -
> -       // x10 += x15, x5 = rotl32(x5 ^ x10, 7)
> -       // x11 += x12, x6 = rotl32(x6 ^ x11, 7)
> -       // x8 += x13, x7 = rotl32(x7 ^ x8, 7)
> -       // x9 += x14, x4 = rotl32(x4 ^ x9, 7)
> -       add             v10.4s, v10.4s, v15.4s
> -       add             v11.4s, v11.4s, v12.4s
> -       add             v8.4s, v8.4s, v13.4s
> -       add             v9.4s, v9.4s, v14.4s
> -
> -       eor             v16.16b, v5.16b, v10.16b
> -       eor             v17.16b, v6.16b, v11.16b
> -       eor             v18.16b, v7.16b, v8.16b
> -       eor             v19.16b, v4.16b, v9.16b
> -
> -       shl             v5.4s, v16.4s, #7
> -       shl             v6.4s, v17.4s, #7
> -       shl             v7.4s, v18.4s, #7
> -       shl             v4.4s, v19.4s, #7
> -
> -       sri             v5.4s, v16.4s, #25
> -       sri             v6.4s, v17.4s, #25
> -       sri             v7.4s, v18.4s, #25
> -       sri             v4.4s, v19.4s, #25
> -
> -       subs            x3, x3, #1
> -       b.ne            .Ldoubleround4
> -
> -       ld4r            {v16.4s-v19.4s}, [x0], #16
> -       ld4r            {v20.4s-v23.4s}, [x0], #16
> -
> -       // x12 += counter values 0-3
> -       add             v12.4s, v12.4s, v30.4s
> -
> -       // x0[0-3] += s0[0]
> -       // x1[0-3] += s0[1]
> -       // x2[0-3] += s0[2]
> -       // x3[0-3] += s0[3]
> -       add             v0.4s, v0.4s, v16.4s
> -       add             v1.4s, v1.4s, v17.4s
> -       add             v2.4s, v2.4s, v18.4s
> -       add             v3.4s, v3.4s, v19.4s
> -
> -       ld4r            {v24.4s-v27.4s}, [x0], #16
> -       ld4r            {v28.4s-v31.4s}, [x0]
> -
> -       // x4[0-3] += s1[0]
> -       // x5[0-3] += s1[1]
> -       // x6[0-3] += s1[2]
> -       // x7[0-3] += s1[3]
> -       add             v4.4s, v4.4s, v20.4s
> -       add             v5.4s, v5.4s, v21.4s
> -       add             v6.4s, v6.4s, v22.4s
> -       add             v7.4s, v7.4s, v23.4s
> -
> -       // x8[0-3] += s2[0]
> -       // x9[0-3] += s2[1]
> -       // x10[0-3] += s2[2]
> -       // x11[0-3] += s2[3]
> -       add             v8.4s, v8.4s, v24.4s
> -       add             v9.4s, v9.4s, v25.4s
> -       add             v10.4s, v10.4s, v26.4s
> -       add             v11.4s, v11.4s, v27.4s
> -
> -       // x12[0-3] += s3[0]
> -       // x13[0-3] += s3[1]
> -       // x14[0-3] += s3[2]
> -       // x15[0-3] += s3[3]
> -       add             v12.4s, v12.4s, v28.4s
> -       add             v13.4s, v13.4s, v29.4s
> -       add             v14.4s, v14.4s, v30.4s
> -       add             v15.4s, v15.4s, v31.4s
> -
> -       // interleave 32-bit words in state n, n+1
> -       zip1            v16.4s, v0.4s, v1.4s
> -       zip2            v17.4s, v0.4s, v1.4s
> -       zip1            v18.4s, v2.4s, v3.4s
> -       zip2            v19.4s, v2.4s, v3.4s
> -       zip1            v20.4s, v4.4s, v5.4s
> -       zip2            v21.4s, v4.4s, v5.4s
> -       zip1            v22.4s, v6.4s, v7.4s
> -       zip2            v23.4s, v6.4s, v7.4s
> -       zip1            v24.4s, v8.4s, v9.4s
> -       zip2            v25.4s, v8.4s, v9.4s
> -       zip1            v26.4s, v10.4s, v11.4s
> -       zip2            v27.4s, v10.4s, v11.4s
> -       zip1            v28.4s, v12.4s, v13.4s
> -       zip2            v29.4s, v12.4s, v13.4s
> -       zip1            v30.4s, v14.4s, v15.4s
> -       zip2            v31.4s, v14.4s, v15.4s
> -
> -       // interleave 64-bit words in state n, n+2
> -       zip1            v0.2d, v16.2d, v18.2d
> -       zip2            v4.2d, v16.2d, v18.2d
> -       zip1            v8.2d, v17.2d, v19.2d
> -       zip2            v12.2d, v17.2d, v19.2d
> -       ld1             {v16.16b-v19.16b}, [x2], #64
> -
> -       zip1            v1.2d, v20.2d, v22.2d
> -       zip2            v5.2d, v20.2d, v22.2d
> -       zip1            v9.2d, v21.2d, v23.2d
> -       zip2            v13.2d, v21.2d, v23.2d
> -       ld1             {v20.16b-v23.16b}, [x2], #64
> -
> -       zip1            v2.2d, v24.2d, v26.2d
> -       zip2            v6.2d, v24.2d, v26.2d
> -       zip1            v10.2d, v25.2d, v27.2d
> -       zip2            v14.2d, v25.2d, v27.2d
> -       ld1             {v24.16b-v27.16b}, [x2], #64
> -
> -       zip1            v3.2d, v28.2d, v30.2d
> -       zip2            v7.2d, v28.2d, v30.2d
> -       zip1            v11.2d, v29.2d, v31.2d
> -       zip2            v15.2d, v29.2d, v31.2d
> -       ld1             {v28.16b-v31.16b}, [x2]
> -
> -       // xor with corresponding input, write to output
> -       eor             v16.16b, v16.16b, v0.16b
> -       eor             v17.16b, v17.16b, v1.16b
> -       eor             v18.16b, v18.16b, v2.16b
> -       eor             v19.16b, v19.16b, v3.16b
> -       eor             v20.16b, v20.16b, v4.16b
> -       eor             v21.16b, v21.16b, v5.16b
> -       st1             {v16.16b-v19.16b}, [x1], #64
> -       eor             v22.16b, v22.16b, v6.16b
> -       eor             v23.16b, v23.16b, v7.16b
> -       eor             v24.16b, v24.16b, v8.16b
> -       eor             v25.16b, v25.16b, v9.16b
> -       st1             {v20.16b-v23.16b}, [x1], #64
> -       eor             v26.16b, v26.16b, v10.16b
> -       eor             v27.16b, v27.16b, v11.16b
> -       eor             v28.16b, v28.16b, v12.16b
> -       st1             {v24.16b-v27.16b}, [x1], #64
> -       eor             v29.16b, v29.16b, v13.16b
> -       eor             v30.16b, v30.16b, v14.16b
> -       eor             v31.16b, v31.16b, v15.16b
> -       st1             {v28.16b-v31.16b}, [x1]
> -
> -       ret
> -ENDPROC(chacha20_4block_xor_neon)
> -
> -CTRINC:        .word           0, 1, 2, 3
> -ROT8:  .word           0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f
> diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
> deleted file mode 100644
> index 727579c93ded..000000000000
> --- a/arch/arm64/crypto/chacha20-neon-glue.c
> +++ /dev/null
> @@ -1,133 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
> - *
> - * Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License version 2 as
> - * published by the Free Software Foundation.
> - *
> - * Based on:
> - * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <crypto/algapi.h>
> -#include <crypto/chacha20.h>
> -#include <crypto/internal/skcipher.h>
> -#include <linux/kernel.h>
> -#include <linux/module.h>
> -
> -#include <asm/hwcap.h>
> -#include <asm/neon.h>
> -#include <asm/simd.h>
> -
> -asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
> -asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
> -
> -static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
> -                           unsigned int bytes)
> -{
> -       u8 buf[CHACHA20_BLOCK_SIZE];
> -
> -       while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
> -               kernel_neon_begin();
> -               chacha20_4block_xor_neon(state, dst, src);
> -               kernel_neon_end();
> -               bytes -= CHACHA20_BLOCK_SIZE * 4;
> -               src += CHACHA20_BLOCK_SIZE * 4;
> -               dst += CHACHA20_BLOCK_SIZE * 4;
> -               state[12] += 4;
> -       }
> -
> -       if (!bytes)
> -               return;
> -
> -       kernel_neon_begin();
> -       while (bytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_block_xor_neon(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE;
> -               src += CHACHA20_BLOCK_SIZE;
> -               dst += CHACHA20_BLOCK_SIZE;
> -               state[12]++;
> -       }
> -       if (bytes) {
> -               memcpy(buf, src, bytes);
> -               chacha20_block_xor_neon(state, buf, buf);
> -               memcpy(dst, buf, bytes);
> -       }
> -       kernel_neon_end();
> -}
> -
> -static int chacha20_neon(struct skcipher_request *req)
> -{
> -       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       struct skcipher_walk walk;
> -       u32 state[16];
> -       int err;
> -
> -       if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
> -               return crypto_chacha20_crypt(req);
> -
> -       err = skcipher_walk_virt(&walk, req, false);
> -
> -       crypto_chacha20_init(state, ctx, walk.iv);
> -
> -       while (walk.nbytes > 0) {
> -               unsigned int nbytes = walk.nbytes;
> -
> -               if (nbytes < walk.total)
> -                       nbytes = round_down(nbytes, walk.stride);
> -
> -               chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                               nbytes);
> -               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> -       }
> -
> -       return err;
> -}
> -
> -static struct skcipher_alg alg = {
> -       .base.cra_name          = "chacha20",
> -       .base.cra_driver_name   = "chacha20-neon",
> -       .base.cra_priority      = 300,
> -       .base.cra_blocksize     = 1,
> -       .base.cra_ctxsize       = sizeof(struct chacha20_ctx),
> -       .base.cra_module        = THIS_MODULE,
> -
> -       .min_keysize            = CHACHA20_KEY_SIZE,
> -       .max_keysize            = CHACHA20_KEY_SIZE,
> -       .ivsize                 = CHACHA20_IV_SIZE,
> -       .chunksize              = CHACHA20_BLOCK_SIZE,
> -       .walksize               = 4 * CHACHA20_BLOCK_SIZE,
> -       .setkey                 = crypto_chacha20_setkey,
> -       .encrypt                = chacha20_neon,
> -       .decrypt                = chacha20_neon,
> -};
> -
> -static int __init chacha20_simd_mod_init(void)
> -{
> -       if (!(elf_hwcap & HWCAP_ASIMD))
> -               return -ENODEV;
> -
> -       return crypto_register_skcipher(&alg);
> -}
> -
> -static void __exit chacha20_simd_mod_fini(void)
> -{
> -       crypto_unregister_skcipher(&alg);
> -}
> -
> -module_init(chacha20_simd_mod_init);
> -module_exit(chacha20_simd_mod_fini);
> -
> -MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
> -MODULE_LICENSE("GPL v2");
> -MODULE_ALIAS_CRYPTO("chacha20");
> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
> index cf830219846b..419212c31246 100644
> --- a/arch/x86/crypto/Makefile
> +++ b/arch/x86/crypto/Makefile
> @@ -23,7 +23,6 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
>  obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
>  obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
>  obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
> -obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
>  obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
>  obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
>  obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
> @@ -76,7 +75,6 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
>  blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
>  twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
>  twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
> -chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
>  serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
>
>  aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o
> @@ -99,7 +97,6 @@ endif
>
>  ifeq ($(avx2_supported),yes)
>         camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
> -       chacha20-x86_64-y += chacha20-avx2-x86_64.o
>         serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
>
>         morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
> diff --git a/arch/x86/crypto/chacha20-avx2-x86_64.S b/arch/x86/crypto/chacha20-avx2-x86_64.S
> deleted file mode 100644
> index f3cd26f48332..000000000000
> --- a/arch/x86/crypto/chacha20-avx2-x86_64.S
> +++ /dev/null
> @@ -1,448 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX2 functions
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <linux/linkage.h>
> -
> -.section       .rodata.cst32.ROT8, "aM", @progbits, 32
> -.align 32
> -ROT8:  .octa 0x0e0d0c0f0a09080b0605040702010003
> -       .octa 0x0e0d0c0f0a09080b0605040702010003
> -
> -.section       .rodata.cst32.ROT16, "aM", @progbits, 32
> -.align 32
> -ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
> -       .octa 0x0d0c0f0e09080b0a0504070601000302
> -
> -.section       .rodata.cst32.CTRINC, "aM", @progbits, 32
> -.align 32
> -CTRINC:        .octa 0x00000003000000020000000100000000
> -       .octa 0x00000007000000060000000500000004
> -
> -.text
> -
> -ENTRY(chacha20_8block_xor_avx2)
> -       # %rdi: Input state matrix, s
> -       # %rsi: 8 data blocks output, o
> -       # %rdx: 8 data blocks input, i
> -
> -       # This function encrypts eight consecutive ChaCha20 blocks by loading
> -       # the state matrix in AVX registers eight times. As we need some
> -       # scratch registers, we save the first four registers on the stack. The
> -       # algorithm performs each operation on the corresponding word of each
> -       # state matrix, hence requires no word shuffling. For final XORing step
> -       # we transpose the matrix by interleaving 32-, 64- and then 128-bit
> -       # words, which allows us to do XOR in AVX registers. 8/16-bit word
> -       # rotation is done with the slightly better performing byte shuffling,
> -       # 7/12-bit word rotation uses traditional shift+OR.
> -
> -       vzeroupper
> -       # 4 * 32 byte stack, 32-byte aligned
> -       lea             8(%rsp),%r10
> -       and             $~31, %rsp
> -       sub             $0x80, %rsp
> -
> -       # x0..15[0-7] = s[0..15]
> -       vpbroadcastd    0x00(%rdi),%ymm0
> -       vpbroadcastd    0x04(%rdi),%ymm1
> -       vpbroadcastd    0x08(%rdi),%ymm2
> -       vpbroadcastd    0x0c(%rdi),%ymm3
> -       vpbroadcastd    0x10(%rdi),%ymm4
> -       vpbroadcastd    0x14(%rdi),%ymm5
> -       vpbroadcastd    0x18(%rdi),%ymm6
> -       vpbroadcastd    0x1c(%rdi),%ymm7
> -       vpbroadcastd    0x20(%rdi),%ymm8
> -       vpbroadcastd    0x24(%rdi),%ymm9
> -       vpbroadcastd    0x28(%rdi),%ymm10
> -       vpbroadcastd    0x2c(%rdi),%ymm11
> -       vpbroadcastd    0x30(%rdi),%ymm12
> -       vpbroadcastd    0x34(%rdi),%ymm13
> -       vpbroadcastd    0x38(%rdi),%ymm14
> -       vpbroadcastd    0x3c(%rdi),%ymm15
> -       # x0..3 on stack
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vmovdqa         %ymm1,0x20(%rsp)
> -       vmovdqa         %ymm2,0x40(%rsp)
> -       vmovdqa         %ymm3,0x60(%rsp)
> -
> -       vmovdqa         CTRINC(%rip),%ymm1
> -       vmovdqa         ROT8(%rip),%ymm2
> -       vmovdqa         ROT16(%rip),%ymm3
> -
> -       # x12 += counter values 0-3
> -       vpaddd          %ymm1,%ymm12,%ymm12
> -
> -       mov             $10,%ecx
> -
> -.Ldoubleround8:
> -       # x0 += x4, x12 = rotl32(x12 ^ x0, 16)
> -       vpaddd          0x00(%rsp),%ymm4,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpxor           %ymm0,%ymm12,%ymm12
> -       vpshufb         %ymm3,%ymm12,%ymm12
> -       # x1 += x5, x13 = rotl32(x13 ^ x1, 16)
> -       vpaddd          0x20(%rsp),%ymm5,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpxor           %ymm0,%ymm13,%ymm13
> -       vpshufb         %ymm3,%ymm13,%ymm13
> -       # x2 += x6, x14 = rotl32(x14 ^ x2, 16)
> -       vpaddd          0x40(%rsp),%ymm6,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpxor           %ymm0,%ymm14,%ymm14
> -       vpshufb         %ymm3,%ymm14,%ymm14
> -       # x3 += x7, x15 = rotl32(x15 ^ x3, 16)
> -       vpaddd          0x60(%rsp),%ymm7,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpxor           %ymm0,%ymm15,%ymm15
> -       vpshufb         %ymm3,%ymm15,%ymm15
> -
> -       # x8 += x12, x4 = rotl32(x4 ^ x8, 12)
> -       vpaddd          %ymm12,%ymm8,%ymm8
> -       vpxor           %ymm8,%ymm4,%ymm4
> -       vpslld          $12,%ymm4,%ymm0
> -       vpsrld          $20,%ymm4,%ymm4
> -       vpor            %ymm0,%ymm4,%ymm4
> -       # x9 += x13, x5 = rotl32(x5 ^ x9, 12)
> -       vpaddd          %ymm13,%ymm9,%ymm9
> -       vpxor           %ymm9,%ymm5,%ymm5
> -       vpslld          $12,%ymm5,%ymm0
> -       vpsrld          $20,%ymm5,%ymm5
> -       vpor            %ymm0,%ymm5,%ymm5
> -       # x10 += x14, x6 = rotl32(x6 ^ x10, 12)
> -       vpaddd          %ymm14,%ymm10,%ymm10
> -       vpxor           %ymm10,%ymm6,%ymm6
> -       vpslld          $12,%ymm6,%ymm0
> -       vpsrld          $20,%ymm6,%ymm6
> -       vpor            %ymm0,%ymm6,%ymm6
> -       # x11 += x15, x7 = rotl32(x7 ^ x11, 12)
> -       vpaddd          %ymm15,%ymm11,%ymm11
> -       vpxor           %ymm11,%ymm7,%ymm7
> -       vpslld          $12,%ymm7,%ymm0
> -       vpsrld          $20,%ymm7,%ymm7
> -       vpor            %ymm0,%ymm7,%ymm7
> -
> -       # x0 += x4, x12 = rotl32(x12 ^ x0, 8)
> -       vpaddd          0x00(%rsp),%ymm4,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpxor           %ymm0,%ymm12,%ymm12
> -       vpshufb         %ymm2,%ymm12,%ymm12
> -       # x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> -       vpaddd          0x20(%rsp),%ymm5,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpxor           %ymm0,%ymm13,%ymm13
> -       vpshufb         %ymm2,%ymm13,%ymm13
> -       # x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> -       vpaddd          0x40(%rsp),%ymm6,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpxor           %ymm0,%ymm14,%ymm14
> -       vpshufb         %ymm2,%ymm14,%ymm14
> -       # x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> -       vpaddd          0x60(%rsp),%ymm7,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpxor           %ymm0,%ymm15,%ymm15
> -       vpshufb         %ymm2,%ymm15,%ymm15
> -
> -       # x8 += x12, x4 = rotl32(x4 ^ x8, 7)
> -       vpaddd          %ymm12,%ymm8,%ymm8
> -       vpxor           %ymm8,%ymm4,%ymm4
> -       vpslld          $7,%ymm4,%ymm0
> -       vpsrld          $25,%ymm4,%ymm4
> -       vpor            %ymm0,%ymm4,%ymm4
> -       # x9 += x13, x5 = rotl32(x5 ^ x9, 7)
> -       vpaddd          %ymm13,%ymm9,%ymm9
> -       vpxor           %ymm9,%ymm5,%ymm5
> -       vpslld          $7,%ymm5,%ymm0
> -       vpsrld          $25,%ymm5,%ymm5
> -       vpor            %ymm0,%ymm5,%ymm5
> -       # x10 += x14, x6 = rotl32(x6 ^ x10, 7)
> -       vpaddd          %ymm14,%ymm10,%ymm10
> -       vpxor           %ymm10,%ymm6,%ymm6
> -       vpslld          $7,%ymm6,%ymm0
> -       vpsrld          $25,%ymm6,%ymm6
> -       vpor            %ymm0,%ymm6,%ymm6
> -       # x11 += x15, x7 = rotl32(x7 ^ x11, 7)
> -       vpaddd          %ymm15,%ymm11,%ymm11
> -       vpxor           %ymm11,%ymm7,%ymm7
> -       vpslld          $7,%ymm7,%ymm0
> -       vpsrld          $25,%ymm7,%ymm7
> -       vpor            %ymm0,%ymm7,%ymm7
> -
> -       # x0 += x5, x15 = rotl32(x15 ^ x0, 16)
> -       vpaddd          0x00(%rsp),%ymm5,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpxor           %ymm0,%ymm15,%ymm15
> -       vpshufb         %ymm3,%ymm15,%ymm15
> -       # x1 += x6, x12 = rotl32(x12 ^ x1, 16)%ymm0
> -       vpaddd          0x20(%rsp),%ymm6,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpxor           %ymm0,%ymm12,%ymm12
> -       vpshufb         %ymm3,%ymm12,%ymm12
> -       # x2 += x7, x13 = rotl32(x13 ^ x2, 16)
> -       vpaddd          0x40(%rsp),%ymm7,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpxor           %ymm0,%ymm13,%ymm13
> -       vpshufb         %ymm3,%ymm13,%ymm13
> -       # x3 += x4, x14 = rotl32(x14 ^ x3, 16)
> -       vpaddd          0x60(%rsp),%ymm4,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpxor           %ymm0,%ymm14,%ymm14
> -       vpshufb         %ymm3,%ymm14,%ymm14
> -
> -       # x10 += x15, x5 = rotl32(x5 ^ x10, 12)
> -       vpaddd          %ymm15,%ymm10,%ymm10
> -       vpxor           %ymm10,%ymm5,%ymm5
> -       vpslld          $12,%ymm5,%ymm0
> -       vpsrld          $20,%ymm5,%ymm5
> -       vpor            %ymm0,%ymm5,%ymm5
> -       # x11 += x12, x6 = rotl32(x6 ^ x11, 12)
> -       vpaddd          %ymm12,%ymm11,%ymm11
> -       vpxor           %ymm11,%ymm6,%ymm6
> -       vpslld          $12,%ymm6,%ymm0
> -       vpsrld          $20,%ymm6,%ymm6
> -       vpor            %ymm0,%ymm6,%ymm6
> -       # x8 += x13, x7 = rotl32(x7 ^ x8, 12)
> -       vpaddd          %ymm13,%ymm8,%ymm8
> -       vpxor           %ymm8,%ymm7,%ymm7
> -       vpslld          $12,%ymm7,%ymm0
> -       vpsrld          $20,%ymm7,%ymm7
> -       vpor            %ymm0,%ymm7,%ymm7
> -       # x9 += x14, x4 = rotl32(x4 ^ x9, 12)
> -       vpaddd          %ymm14,%ymm9,%ymm9
> -       vpxor           %ymm9,%ymm4,%ymm4
> -       vpslld          $12,%ymm4,%ymm0
> -       vpsrld          $20,%ymm4,%ymm4
> -       vpor            %ymm0,%ymm4,%ymm4
> -
> -       # x0 += x5, x15 = rotl32(x15 ^ x0, 8)
> -       vpaddd          0x00(%rsp),%ymm5,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpxor           %ymm0,%ymm15,%ymm15
> -       vpshufb         %ymm2,%ymm15,%ymm15
> -       # x1 += x6, x12 = rotl32(x12 ^ x1, 8)
> -       vpaddd          0x20(%rsp),%ymm6,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpxor           %ymm0,%ymm12,%ymm12
> -       vpshufb         %ymm2,%ymm12,%ymm12
> -       # x2 += x7, x13 = rotl32(x13 ^ x2, 8)
> -       vpaddd          0x40(%rsp),%ymm7,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpxor           %ymm0,%ymm13,%ymm13
> -       vpshufb         %ymm2,%ymm13,%ymm13
> -       # x3 += x4, x14 = rotl32(x14 ^ x3, 8)
> -       vpaddd          0x60(%rsp),%ymm4,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpxor           %ymm0,%ymm14,%ymm14
> -       vpshufb         %ymm2,%ymm14,%ymm14
> -
> -       # x10 += x15, x5 = rotl32(x5 ^ x10, 7)
> -       vpaddd          %ymm15,%ymm10,%ymm10
> -       vpxor           %ymm10,%ymm5,%ymm5
> -       vpslld          $7,%ymm5,%ymm0
> -       vpsrld          $25,%ymm5,%ymm5
> -       vpor            %ymm0,%ymm5,%ymm5
> -       # x11 += x12, x6 = rotl32(x6 ^ x11, 7)
> -       vpaddd          %ymm12,%ymm11,%ymm11
> -       vpxor           %ymm11,%ymm6,%ymm6
> -       vpslld          $7,%ymm6,%ymm0
> -       vpsrld          $25,%ymm6,%ymm6
> -       vpor            %ymm0,%ymm6,%ymm6
> -       # x8 += x13, x7 = rotl32(x7 ^ x8, 7)
> -       vpaddd          %ymm13,%ymm8,%ymm8
> -       vpxor           %ymm8,%ymm7,%ymm7
> -       vpslld          $7,%ymm7,%ymm0
> -       vpsrld          $25,%ymm7,%ymm7
> -       vpor            %ymm0,%ymm7,%ymm7
> -       # x9 += x14, x4 = rotl32(x4 ^ x9, 7)
> -       vpaddd          %ymm14,%ymm9,%ymm9
> -       vpxor           %ymm9,%ymm4,%ymm4
> -       vpslld          $7,%ymm4,%ymm0
> -       vpsrld          $25,%ymm4,%ymm4
> -       vpor            %ymm0,%ymm4,%ymm4
> -
> -       dec             %ecx
> -       jnz             .Ldoubleround8
> -
> -       # x0..15[0-3] += s[0..15]
> -       vpbroadcastd    0x00(%rdi),%ymm0
> -       vpaddd          0x00(%rsp),%ymm0,%ymm0
> -       vmovdqa         %ymm0,0x00(%rsp)
> -       vpbroadcastd    0x04(%rdi),%ymm0
> -       vpaddd          0x20(%rsp),%ymm0,%ymm0
> -       vmovdqa         %ymm0,0x20(%rsp)
> -       vpbroadcastd    0x08(%rdi),%ymm0
> -       vpaddd          0x40(%rsp),%ymm0,%ymm0
> -       vmovdqa         %ymm0,0x40(%rsp)
> -       vpbroadcastd    0x0c(%rdi),%ymm0
> -       vpaddd          0x60(%rsp),%ymm0,%ymm0
> -       vmovdqa         %ymm0,0x60(%rsp)
> -       vpbroadcastd    0x10(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm4,%ymm4
> -       vpbroadcastd    0x14(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm5,%ymm5
> -       vpbroadcastd    0x18(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm6,%ymm6
> -       vpbroadcastd    0x1c(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm7,%ymm7
> -       vpbroadcastd    0x20(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm8,%ymm8
> -       vpbroadcastd    0x24(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm9,%ymm9
> -       vpbroadcastd    0x28(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm10,%ymm10
> -       vpbroadcastd    0x2c(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm11,%ymm11
> -       vpbroadcastd    0x30(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm12,%ymm12
> -       vpbroadcastd    0x34(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm13,%ymm13
> -       vpbroadcastd    0x38(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm14,%ymm14
> -       vpbroadcastd    0x3c(%rdi),%ymm0
> -       vpaddd          %ymm0,%ymm15,%ymm15
> -
> -       # x12 += counter values 0-3
> -       vpaddd          %ymm1,%ymm12,%ymm12
> -
> -       # interleave 32-bit words in state n, n+1
> -       vmovdqa         0x00(%rsp),%ymm0
> -       vmovdqa         0x20(%rsp),%ymm1
> -       vpunpckldq      %ymm1,%ymm0,%ymm2
> -       vpunpckhdq      %ymm1,%ymm0,%ymm1
> -       vmovdqa         %ymm2,0x00(%rsp)
> -       vmovdqa         %ymm1,0x20(%rsp)
> -       vmovdqa         0x40(%rsp),%ymm0
> -       vmovdqa         0x60(%rsp),%ymm1
> -       vpunpckldq      %ymm1,%ymm0,%ymm2
> -       vpunpckhdq      %ymm1,%ymm0,%ymm1
> -       vmovdqa         %ymm2,0x40(%rsp)
> -       vmovdqa         %ymm1,0x60(%rsp)
> -       vmovdqa         %ymm4,%ymm0
> -       vpunpckldq      %ymm5,%ymm0,%ymm4
> -       vpunpckhdq      %ymm5,%ymm0,%ymm5
> -       vmovdqa         %ymm6,%ymm0
> -       vpunpckldq      %ymm7,%ymm0,%ymm6
> -       vpunpckhdq      %ymm7,%ymm0,%ymm7
> -       vmovdqa         %ymm8,%ymm0
> -       vpunpckldq      %ymm9,%ymm0,%ymm8
> -       vpunpckhdq      %ymm9,%ymm0,%ymm9
> -       vmovdqa         %ymm10,%ymm0
> -       vpunpckldq      %ymm11,%ymm0,%ymm10
> -       vpunpckhdq      %ymm11,%ymm0,%ymm11
> -       vmovdqa         %ymm12,%ymm0
> -       vpunpckldq      %ymm13,%ymm0,%ymm12
> -       vpunpckhdq      %ymm13,%ymm0,%ymm13
> -       vmovdqa         %ymm14,%ymm0
> -       vpunpckldq      %ymm15,%ymm0,%ymm14
> -       vpunpckhdq      %ymm15,%ymm0,%ymm15
> -
> -       # interleave 64-bit words in state n, n+2
> -       vmovdqa         0x00(%rsp),%ymm0
> -       vmovdqa         0x40(%rsp),%ymm2
> -       vpunpcklqdq     %ymm2,%ymm0,%ymm1
> -       vpunpckhqdq     %ymm2,%ymm0,%ymm2
> -       vmovdqa         %ymm1,0x00(%rsp)
> -       vmovdqa         %ymm2,0x40(%rsp)
> -       vmovdqa         0x20(%rsp),%ymm0
> -       vmovdqa         0x60(%rsp),%ymm2
> -       vpunpcklqdq     %ymm2,%ymm0,%ymm1
> -       vpunpckhqdq     %ymm2,%ymm0,%ymm2
> -       vmovdqa         %ymm1,0x20(%rsp)
> -       vmovdqa         %ymm2,0x60(%rsp)
> -       vmovdqa         %ymm4,%ymm0
> -       vpunpcklqdq     %ymm6,%ymm0,%ymm4
> -       vpunpckhqdq     %ymm6,%ymm0,%ymm6
> -       vmovdqa         %ymm5,%ymm0
> -       vpunpcklqdq     %ymm7,%ymm0,%ymm5
> -       vpunpckhqdq     %ymm7,%ymm0,%ymm7
> -       vmovdqa         %ymm8,%ymm0
> -       vpunpcklqdq     %ymm10,%ymm0,%ymm8
> -       vpunpckhqdq     %ymm10,%ymm0,%ymm10
> -       vmovdqa         %ymm9,%ymm0
> -       vpunpcklqdq     %ymm11,%ymm0,%ymm9
> -       vpunpckhqdq     %ymm11,%ymm0,%ymm11
> -       vmovdqa         %ymm12,%ymm0
> -       vpunpcklqdq     %ymm14,%ymm0,%ymm12
> -       vpunpckhqdq     %ymm14,%ymm0,%ymm14
> -       vmovdqa         %ymm13,%ymm0
> -       vpunpcklqdq     %ymm15,%ymm0,%ymm13
> -       vpunpckhqdq     %ymm15,%ymm0,%ymm15
> -
> -       # interleave 128-bit words in state n, n+4
> -       vmovdqa         0x00(%rsp),%ymm0
> -       vperm2i128      $0x20,%ymm4,%ymm0,%ymm1
> -       vperm2i128      $0x31,%ymm4,%ymm0,%ymm4
> -       vmovdqa         %ymm1,0x00(%rsp)
> -       vmovdqa         0x20(%rsp),%ymm0
> -       vperm2i128      $0x20,%ymm5,%ymm0,%ymm1
> -       vperm2i128      $0x31,%ymm5,%ymm0,%ymm5
> -       vmovdqa         %ymm1,0x20(%rsp)
> -       vmovdqa         0x40(%rsp),%ymm0
> -       vperm2i128      $0x20,%ymm6,%ymm0,%ymm1
> -       vperm2i128      $0x31,%ymm6,%ymm0,%ymm6
> -       vmovdqa         %ymm1,0x40(%rsp)
> -       vmovdqa         0x60(%rsp),%ymm0
> -       vperm2i128      $0x20,%ymm7,%ymm0,%ymm1
> -       vperm2i128      $0x31,%ymm7,%ymm0,%ymm7
> -       vmovdqa         %ymm1,0x60(%rsp)
> -       vperm2i128      $0x20,%ymm12,%ymm8,%ymm0
> -       vperm2i128      $0x31,%ymm12,%ymm8,%ymm12
> -       vmovdqa         %ymm0,%ymm8
> -       vperm2i128      $0x20,%ymm13,%ymm9,%ymm0
> -       vperm2i128      $0x31,%ymm13,%ymm9,%ymm13
> -       vmovdqa         %ymm0,%ymm9
> -       vperm2i128      $0x20,%ymm14,%ymm10,%ymm0
> -       vperm2i128      $0x31,%ymm14,%ymm10,%ymm14
> -       vmovdqa         %ymm0,%ymm10
> -       vperm2i128      $0x20,%ymm15,%ymm11,%ymm0
> -       vperm2i128      $0x31,%ymm15,%ymm11,%ymm15
> -       vmovdqa         %ymm0,%ymm11
> -
> -       # xor with corresponding input, write to output
> -       vmovdqa         0x00(%rsp),%ymm0
> -       vpxor           0x0000(%rdx),%ymm0,%ymm0
> -       vmovdqu         %ymm0,0x0000(%rsi)
> -       vmovdqa         0x20(%rsp),%ymm0
> -       vpxor           0x0080(%rdx),%ymm0,%ymm0
> -       vmovdqu         %ymm0,0x0080(%rsi)
> -       vmovdqa         0x40(%rsp),%ymm0
> -       vpxor           0x0040(%rdx),%ymm0,%ymm0
> -       vmovdqu         %ymm0,0x0040(%rsi)
> -       vmovdqa         0x60(%rsp),%ymm0
> -       vpxor           0x00c0(%rdx),%ymm0,%ymm0
> -       vmovdqu         %ymm0,0x00c0(%rsi)
> -       vpxor           0x0100(%rdx),%ymm4,%ymm4
> -       vmovdqu         %ymm4,0x0100(%rsi)
> -       vpxor           0x0180(%rdx),%ymm5,%ymm5
> -       vmovdqu         %ymm5,0x00180(%rsi)
> -       vpxor           0x0140(%rdx),%ymm6,%ymm6
> -       vmovdqu         %ymm6,0x0140(%rsi)
> -       vpxor           0x01c0(%rdx),%ymm7,%ymm7
> -       vmovdqu         %ymm7,0x01c0(%rsi)
> -       vpxor           0x0020(%rdx),%ymm8,%ymm8
> -       vmovdqu         %ymm8,0x0020(%rsi)
> -       vpxor           0x00a0(%rdx),%ymm9,%ymm9
> -       vmovdqu         %ymm9,0x00a0(%rsi)
> -       vpxor           0x0060(%rdx),%ymm10,%ymm10
> -       vmovdqu         %ymm10,0x0060(%rsi)
> -       vpxor           0x00e0(%rdx),%ymm11,%ymm11
> -       vmovdqu         %ymm11,0x00e0(%rsi)
> -       vpxor           0x0120(%rdx),%ymm12,%ymm12
> -       vmovdqu         %ymm12,0x0120(%rsi)
> -       vpxor           0x01a0(%rdx),%ymm13,%ymm13
> -       vmovdqu         %ymm13,0x01a0(%rsi)
> -       vpxor           0x0160(%rdx),%ymm14,%ymm14
> -       vmovdqu         %ymm14,0x0160(%rsi)
> -       vpxor           0x01e0(%rdx),%ymm15,%ymm15
> -       vmovdqu         %ymm15,0x01e0(%rsi)
> -
> -       vzeroupper
> -       lea             -8(%r10),%rsp
> -       ret
> -ENDPROC(chacha20_8block_xor_avx2)
> diff --git a/arch/x86/crypto/chacha20-ssse3-x86_64.S b/arch/x86/crypto/chacha20-ssse3-x86_64.S
> deleted file mode 100644
> index 512a2b500fd1..000000000000
> --- a/arch/x86/crypto/chacha20-ssse3-x86_64.S
> +++ /dev/null
> @@ -1,630 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <linux/linkage.h>
> -
> -.section       .rodata.cst16.ROT8, "aM", @progbits, 16
> -.align 16
> -ROT8:  .octa 0x0e0d0c0f0a09080b0605040702010003
> -.section       .rodata.cst16.ROT16, "aM", @progbits, 16
> -.align 16
> -ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
> -.section       .rodata.cst16.CTRINC, "aM", @progbits, 16
> -.align 16
> -CTRINC:        .octa 0x00000003000000020000000100000000
> -
> -.text
> -
> -ENTRY(chacha20_block_xor_ssse3)
> -       # %rdi: Input state matrix, s
> -       # %rsi: 1 data block output, o
> -       # %rdx: 1 data block input, i
> -
> -       # This function encrypts one ChaCha20 block by loading the state matrix
> -       # in four SSE registers. It performs matrix operation on four words in
> -       # parallel, but requireds shuffling to rearrange the words after each
> -       # round. 8/16-bit word rotation is done with the slightly better
> -       # performing SSSE3 byte shuffling, 7/12-bit word rotation uses
> -       # traditional shift+OR.
> -
> -       # x0..3 = s0..3
> -       movdqa          0x00(%rdi),%xmm0
> -       movdqa          0x10(%rdi),%xmm1
> -       movdqa          0x20(%rdi),%xmm2
> -       movdqa          0x30(%rdi),%xmm3
> -       movdqa          %xmm0,%xmm8
> -       movdqa          %xmm1,%xmm9
> -       movdqa          %xmm2,%xmm10
> -       movdqa          %xmm3,%xmm11
> -
> -       movdqa          ROT8(%rip),%xmm4
> -       movdqa          ROT16(%rip),%xmm5
> -
> -       mov     $10,%ecx
> -
> -.Ldoubleround:
> -
> -       # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       paddd           %xmm1,%xmm0
> -       pxor            %xmm0,%xmm3
> -       pshufb          %xmm5,%xmm3
> -
> -       # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       paddd           %xmm3,%xmm2
> -       pxor            %xmm2,%xmm1
> -       movdqa          %xmm1,%xmm6
> -       pslld           $12,%xmm6
> -       psrld           $20,%xmm1
> -       por             %xmm6,%xmm1
> -
> -       # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       paddd           %xmm1,%xmm0
> -       pxor            %xmm0,%xmm3
> -       pshufb          %xmm4,%xmm3
> -
> -       # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       paddd           %xmm3,%xmm2
> -       pxor            %xmm2,%xmm1
> -       movdqa          %xmm1,%xmm7
> -       pslld           $7,%xmm7
> -       psrld           $25,%xmm1
> -       por             %xmm7,%xmm1
> -
> -       # x1 = shuffle32(x1, MASK(0, 3, 2, 1))
> -       pshufd          $0x39,%xmm1,%xmm1
> -       # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       pshufd          $0x4e,%xmm2,%xmm2
> -       # x3 = shuffle32(x3, MASK(2, 1, 0, 3))
> -       pshufd          $0x93,%xmm3,%xmm3
> -
> -       # x0 += x1, x3 = rotl32(x3 ^ x0, 16)
> -       paddd           %xmm1,%xmm0
> -       pxor            %xmm0,%xmm3
> -       pshufb          %xmm5,%xmm3
> -
> -       # x2 += x3, x1 = rotl32(x1 ^ x2, 12)
> -       paddd           %xmm3,%xmm2
> -       pxor            %xmm2,%xmm1
> -       movdqa          %xmm1,%xmm6
> -       pslld           $12,%xmm6
> -       psrld           $20,%xmm1
> -       por             %xmm6,%xmm1
> -
> -       # x0 += x1, x3 = rotl32(x3 ^ x0, 8)
> -       paddd           %xmm1,%xmm0
> -       pxor            %xmm0,%xmm3
> -       pshufb          %xmm4,%xmm3
> -
> -       # x2 += x3, x1 = rotl32(x1 ^ x2, 7)
> -       paddd           %xmm3,%xmm2
> -       pxor            %xmm2,%xmm1
> -       movdqa          %xmm1,%xmm7
> -       pslld           $7,%xmm7
> -       psrld           $25,%xmm1
> -       por             %xmm7,%xmm1
> -
> -       # x1 = shuffle32(x1, MASK(2, 1, 0, 3))
> -       pshufd          $0x93,%xmm1,%xmm1
> -       # x2 = shuffle32(x2, MASK(1, 0, 3, 2))
> -       pshufd          $0x4e,%xmm2,%xmm2
> -       # x3 = shuffle32(x3, MASK(0, 3, 2, 1))
> -       pshufd          $0x39,%xmm3,%xmm3
> -
> -       dec             %ecx
> -       jnz             .Ldoubleround
> -
> -       # o0 = i0 ^ (x0 + s0)
> -       movdqu          0x00(%rdx),%xmm4
> -       paddd           %xmm8,%xmm0
> -       pxor            %xmm4,%xmm0
> -       movdqu          %xmm0,0x00(%rsi)
> -       # o1 = i1 ^ (x1 + s1)
> -       movdqu          0x10(%rdx),%xmm5
> -       paddd           %xmm9,%xmm1
> -       pxor            %xmm5,%xmm1
> -       movdqu          %xmm1,0x10(%rsi)
> -       # o2 = i2 ^ (x2 + s2)
> -       movdqu          0x20(%rdx),%xmm6
> -       paddd           %xmm10,%xmm2
> -       pxor            %xmm6,%xmm2
> -       movdqu          %xmm2,0x20(%rsi)
> -       # o3 = i3 ^ (x3 + s3)
> -       movdqu          0x30(%rdx),%xmm7
> -       paddd           %xmm11,%xmm3
> -       pxor            %xmm7,%xmm3
> -       movdqu          %xmm3,0x30(%rsi)
> -
> -       ret
> -ENDPROC(chacha20_block_xor_ssse3)
> -
> -ENTRY(chacha20_4block_xor_ssse3)
> -       # %rdi: Input state matrix, s
> -       # %rsi: 4 data blocks output, o
> -       # %rdx: 4 data blocks input, i
> -
> -       # This function encrypts four consecutive ChaCha20 blocks by loading the
> -       # the state matrix in SSE registers four times. As we need some scratch
> -       # registers, we save the first four registers on the stack. The
> -       # algorithm performs each operation on the corresponding word of each
> -       # state matrix, hence requires no word shuffling. For final XORing step
> -       # we transpose the matrix by interleaving 32- and then 64-bit words,
> -       # which allows us to do XOR in SSE registers. 8/16-bit word rotation is
> -       # done with the slightly better performing SSSE3 byte shuffling,
> -       # 7/12-bit word rotation uses traditional shift+OR.
> -
> -       lea             8(%rsp),%r10
> -       sub             $0x80,%rsp
> -       and             $~63,%rsp
> -
> -       # x0..15[0-3] = s0..3[0..3]
> -       movq            0x00(%rdi),%xmm1
> -       pshufd          $0x00,%xmm1,%xmm0
> -       pshufd          $0x55,%xmm1,%xmm1
> -       movq            0x08(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       movq            0x10(%rdi),%xmm5
> -       pshufd          $0x00,%xmm5,%xmm4
> -       pshufd          $0x55,%xmm5,%xmm5
> -       movq            0x18(%rdi),%xmm7
> -       pshufd          $0x00,%xmm7,%xmm6
> -       pshufd          $0x55,%xmm7,%xmm7
> -       movq            0x20(%rdi),%xmm9
> -       pshufd          $0x00,%xmm9,%xmm8
> -       pshufd          $0x55,%xmm9,%xmm9
> -       movq            0x28(%rdi),%xmm11
> -       pshufd          $0x00,%xmm11,%xmm10
> -       pshufd          $0x55,%xmm11,%xmm11
> -       movq            0x30(%rdi),%xmm13
> -       pshufd          $0x00,%xmm13,%xmm12
> -       pshufd          $0x55,%xmm13,%xmm13
> -       movq            0x38(%rdi),%xmm15
> -       pshufd          $0x00,%xmm15,%xmm14
> -       pshufd          $0x55,%xmm15,%xmm15
> -       # x0..3 on stack
> -       movdqa          %xmm0,0x00(%rsp)
> -       movdqa          %xmm1,0x10(%rsp)
> -       movdqa          %xmm2,0x20(%rsp)
> -       movdqa          %xmm3,0x30(%rsp)
> -
> -       movdqa          CTRINC(%rip),%xmm1
> -       movdqa          ROT8(%rip),%xmm2
> -       movdqa          ROT16(%rip),%xmm3
> -
> -       # x12 += counter values 0-3
> -       paddd           %xmm1,%xmm12
> -
> -       mov             $10,%ecx
> -
> -.Ldoubleround4:
> -       # x0 += x4, x12 = rotl32(x12 ^ x0, 16)
> -       movdqa          0x00(%rsp),%xmm0
> -       paddd           %xmm4,%xmm0
> -       movdqa          %xmm0,0x00(%rsp)
> -       pxor            %xmm0,%xmm12
> -       pshufb          %xmm3,%xmm12
> -       # x1 += x5, x13 = rotl32(x13 ^ x1, 16)
> -       movdqa          0x10(%rsp),%xmm0
> -       paddd           %xmm5,%xmm0
> -       movdqa          %xmm0,0x10(%rsp)
> -       pxor            %xmm0,%xmm13
> -       pshufb          %xmm3,%xmm13
> -       # x2 += x6, x14 = rotl32(x14 ^ x2, 16)
> -       movdqa          0x20(%rsp),%xmm0
> -       paddd           %xmm6,%xmm0
> -       movdqa          %xmm0,0x20(%rsp)
> -       pxor            %xmm0,%xmm14
> -       pshufb          %xmm3,%xmm14
> -       # x3 += x7, x15 = rotl32(x15 ^ x3, 16)
> -       movdqa          0x30(%rsp),%xmm0
> -       paddd           %xmm7,%xmm0
> -       movdqa          %xmm0,0x30(%rsp)
> -       pxor            %xmm0,%xmm15
> -       pshufb          %xmm3,%xmm15
> -
> -       # x8 += x12, x4 = rotl32(x4 ^ x8, 12)
> -       paddd           %xmm12,%xmm8
> -       pxor            %xmm8,%xmm4
> -       movdqa          %xmm4,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm4
> -       por             %xmm0,%xmm4
> -       # x9 += x13, x5 = rotl32(x5 ^ x9, 12)
> -       paddd           %xmm13,%xmm9
> -       pxor            %xmm9,%xmm5
> -       movdqa          %xmm5,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm5
> -       por             %xmm0,%xmm5
> -       # x10 += x14, x6 = rotl32(x6 ^ x10, 12)
> -       paddd           %xmm14,%xmm10
> -       pxor            %xmm10,%xmm6
> -       movdqa          %xmm6,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm6
> -       por             %xmm0,%xmm6
> -       # x11 += x15, x7 = rotl32(x7 ^ x11, 12)
> -       paddd           %xmm15,%xmm11
> -       pxor            %xmm11,%xmm7
> -       movdqa          %xmm7,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm7
> -       por             %xmm0,%xmm7
> -
> -       # x0 += x4, x12 = rotl32(x12 ^ x0, 8)
> -       movdqa          0x00(%rsp),%xmm0
> -       paddd           %xmm4,%xmm0
> -       movdqa          %xmm0,0x00(%rsp)
> -       pxor            %xmm0,%xmm12
> -       pshufb          %xmm2,%xmm12
> -       # x1 += x5, x13 = rotl32(x13 ^ x1, 8)
> -       movdqa          0x10(%rsp),%xmm0
> -       paddd           %xmm5,%xmm0
> -       movdqa          %xmm0,0x10(%rsp)
> -       pxor            %xmm0,%xmm13
> -       pshufb          %xmm2,%xmm13
> -       # x2 += x6, x14 = rotl32(x14 ^ x2, 8)
> -       movdqa          0x20(%rsp),%xmm0
> -       paddd           %xmm6,%xmm0
> -       movdqa          %xmm0,0x20(%rsp)
> -       pxor            %xmm0,%xmm14
> -       pshufb          %xmm2,%xmm14
> -       # x3 += x7, x15 = rotl32(x15 ^ x3, 8)
> -       movdqa          0x30(%rsp),%xmm0
> -       paddd           %xmm7,%xmm0
> -       movdqa          %xmm0,0x30(%rsp)
> -       pxor            %xmm0,%xmm15
> -       pshufb          %xmm2,%xmm15
> -
> -       # x8 += x12, x4 = rotl32(x4 ^ x8, 7)
> -       paddd           %xmm12,%xmm8
> -       pxor            %xmm8,%xmm4
> -       movdqa          %xmm4,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm4
> -       por             %xmm0,%xmm4
> -       # x9 += x13, x5 = rotl32(x5 ^ x9, 7)
> -       paddd           %xmm13,%xmm9
> -       pxor            %xmm9,%xmm5
> -       movdqa          %xmm5,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm5
> -       por             %xmm0,%xmm5
> -       # x10 += x14, x6 = rotl32(x6 ^ x10, 7)
> -       paddd           %xmm14,%xmm10
> -       pxor            %xmm10,%xmm6
> -       movdqa          %xmm6,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm6
> -       por             %xmm0,%xmm6
> -       # x11 += x15, x7 = rotl32(x7 ^ x11, 7)
> -       paddd           %xmm15,%xmm11
> -       pxor            %xmm11,%xmm7
> -       movdqa          %xmm7,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm7
> -       por             %xmm0,%xmm7
> -
> -       # x0 += x5, x15 = rotl32(x15 ^ x0, 16)
> -       movdqa          0x00(%rsp),%xmm0
> -       paddd           %xmm5,%xmm0
> -       movdqa          %xmm0,0x00(%rsp)
> -       pxor            %xmm0,%xmm15
> -       pshufb          %xmm3,%xmm15
> -       # x1 += x6, x12 = rotl32(x12 ^ x1, 16)
> -       movdqa          0x10(%rsp),%xmm0
> -       paddd           %xmm6,%xmm0
> -       movdqa          %xmm0,0x10(%rsp)
> -       pxor            %xmm0,%xmm12
> -       pshufb          %xmm3,%xmm12
> -       # x2 += x7, x13 = rotl32(x13 ^ x2, 16)
> -       movdqa          0x20(%rsp),%xmm0
> -       paddd           %xmm7,%xmm0
> -       movdqa          %xmm0,0x20(%rsp)
> -       pxor            %xmm0,%xmm13
> -       pshufb          %xmm3,%xmm13
> -       # x3 += x4, x14 = rotl32(x14 ^ x3, 16)
> -       movdqa          0x30(%rsp),%xmm0
> -       paddd           %xmm4,%xmm0
> -       movdqa          %xmm0,0x30(%rsp)
> -       pxor            %xmm0,%xmm14
> -       pshufb          %xmm3,%xmm14
> -
> -       # x10 += x15, x5 = rotl32(x5 ^ x10, 12)
> -       paddd           %xmm15,%xmm10
> -       pxor            %xmm10,%xmm5
> -       movdqa          %xmm5,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm5
> -       por             %xmm0,%xmm5
> -       # x11 += x12, x6 = rotl32(x6 ^ x11, 12)
> -       paddd           %xmm12,%xmm11
> -       pxor            %xmm11,%xmm6
> -       movdqa          %xmm6,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm6
> -       por             %xmm0,%xmm6
> -       # x8 += x13, x7 = rotl32(x7 ^ x8, 12)
> -       paddd           %xmm13,%xmm8
> -       pxor            %xmm8,%xmm7
> -       movdqa          %xmm7,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm7
> -       por             %xmm0,%xmm7
> -       # x9 += x14, x4 = rotl32(x4 ^ x9, 12)
> -       paddd           %xmm14,%xmm9
> -       pxor            %xmm9,%xmm4
> -       movdqa          %xmm4,%xmm0
> -       pslld           $12,%xmm0
> -       psrld           $20,%xmm4
> -       por             %xmm0,%xmm4
> -
> -       # x0 += x5, x15 = rotl32(x15 ^ x0, 8)
> -       movdqa          0x00(%rsp),%xmm0
> -       paddd           %xmm5,%xmm0
> -       movdqa          %xmm0,0x00(%rsp)
> -       pxor            %xmm0,%xmm15
> -       pshufb          %xmm2,%xmm15
> -       # x1 += x6, x12 = rotl32(x12 ^ x1, 8)
> -       movdqa          0x10(%rsp),%xmm0
> -       paddd           %xmm6,%xmm0
> -       movdqa          %xmm0,0x10(%rsp)
> -       pxor            %xmm0,%xmm12
> -       pshufb          %xmm2,%xmm12
> -       # x2 += x7, x13 = rotl32(x13 ^ x2, 8)
> -       movdqa          0x20(%rsp),%xmm0
> -       paddd           %xmm7,%xmm0
> -       movdqa          %xmm0,0x20(%rsp)
> -       pxor            %xmm0,%xmm13
> -       pshufb          %xmm2,%xmm13
> -       # x3 += x4, x14 = rotl32(x14 ^ x3, 8)
> -       movdqa          0x30(%rsp),%xmm0
> -       paddd           %xmm4,%xmm0
> -       movdqa          %xmm0,0x30(%rsp)
> -       pxor            %xmm0,%xmm14
> -       pshufb          %xmm2,%xmm14
> -
> -       # x10 += x15, x5 = rotl32(x5 ^ x10, 7)
> -       paddd           %xmm15,%xmm10
> -       pxor            %xmm10,%xmm5
> -       movdqa          %xmm5,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm5
> -       por             %xmm0,%xmm5
> -       # x11 += x12, x6 = rotl32(x6 ^ x11, 7)
> -       paddd           %xmm12,%xmm11
> -       pxor            %xmm11,%xmm6
> -       movdqa          %xmm6,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm6
> -       por             %xmm0,%xmm6
> -       # x8 += x13, x7 = rotl32(x7 ^ x8, 7)
> -       paddd           %xmm13,%xmm8
> -       pxor            %xmm8,%xmm7
> -       movdqa          %xmm7,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm7
> -       por             %xmm0,%xmm7
> -       # x9 += x14, x4 = rotl32(x4 ^ x9, 7)
> -       paddd           %xmm14,%xmm9
> -       pxor            %xmm9,%xmm4
> -       movdqa          %xmm4,%xmm0
> -       pslld           $7,%xmm0
> -       psrld           $25,%xmm4
> -       por             %xmm0,%xmm4
> -
> -       dec             %ecx
> -       jnz             .Ldoubleround4
> -
> -       # x0[0-3] += s0[0]
> -       # x1[0-3] += s0[1]
> -       movq            0x00(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           0x00(%rsp),%xmm2
> -       movdqa          %xmm2,0x00(%rsp)
> -       paddd           0x10(%rsp),%xmm3
> -       movdqa          %xmm3,0x10(%rsp)
> -       # x2[0-3] += s0[2]
> -       # x3[0-3] += s0[3]
> -       movq            0x08(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           0x20(%rsp),%xmm2
> -       movdqa          %xmm2,0x20(%rsp)
> -       paddd           0x30(%rsp),%xmm3
> -       movdqa          %xmm3,0x30(%rsp)
> -
> -       # x4[0-3] += s1[0]
> -       # x5[0-3] += s1[1]
> -       movq            0x10(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm4
> -       paddd           %xmm3,%xmm5
> -       # x6[0-3] += s1[2]
> -       # x7[0-3] += s1[3]
> -       movq            0x18(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm6
> -       paddd           %xmm3,%xmm7
> -
> -       # x8[0-3] += s2[0]
> -       # x9[0-3] += s2[1]
> -       movq            0x20(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm8
> -       paddd           %xmm3,%xmm9
> -       # x10[0-3] += s2[2]
> -       # x11[0-3] += s2[3]
> -       movq            0x28(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm10
> -       paddd           %xmm3,%xmm11
> -
> -       # x12[0-3] += s3[0]
> -       # x13[0-3] += s3[1]
> -       movq            0x30(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm12
> -       paddd           %xmm3,%xmm13
> -       # x14[0-3] += s3[2]
> -       # x15[0-3] += s3[3]
> -       movq            0x38(%rdi),%xmm3
> -       pshufd          $0x00,%xmm3,%xmm2
> -       pshufd          $0x55,%xmm3,%xmm3
> -       paddd           %xmm2,%xmm14
> -       paddd           %xmm3,%xmm15
> -
> -       # x12 += counter values 0-3
> -       paddd           %xmm1,%xmm12
> -
> -       # interleave 32-bit words in state n, n+1
> -       movdqa          0x00(%rsp),%xmm0
> -       movdqa          0x10(%rsp),%xmm1
> -       movdqa          %xmm0,%xmm2
> -       punpckldq       %xmm1,%xmm2
> -       punpckhdq       %xmm1,%xmm0
> -       movdqa          %xmm2,0x00(%rsp)
> -       movdqa          %xmm0,0x10(%rsp)
> -       movdqa          0x20(%rsp),%xmm0
> -       movdqa          0x30(%rsp),%xmm1
> -       movdqa          %xmm0,%xmm2
> -       punpckldq       %xmm1,%xmm2
> -       punpckhdq       %xmm1,%xmm0
> -       movdqa          %xmm2,0x20(%rsp)
> -       movdqa          %xmm0,0x30(%rsp)
> -       movdqa          %xmm4,%xmm0
> -       punpckldq       %xmm5,%xmm4
> -       punpckhdq       %xmm5,%xmm0
> -       movdqa          %xmm0,%xmm5
> -       movdqa          %xmm6,%xmm0
> -       punpckldq       %xmm7,%xmm6
> -       punpckhdq       %xmm7,%xmm0
> -       movdqa          %xmm0,%xmm7
> -       movdqa          %xmm8,%xmm0
> -       punpckldq       %xmm9,%xmm8
> -       punpckhdq       %xmm9,%xmm0
> -       movdqa          %xmm0,%xmm9
> -       movdqa          %xmm10,%xmm0
> -       punpckldq       %xmm11,%xmm10
> -       punpckhdq       %xmm11,%xmm0
> -       movdqa          %xmm0,%xmm11
> -       movdqa          %xmm12,%xmm0
> -       punpckldq       %xmm13,%xmm12
> -       punpckhdq       %xmm13,%xmm0
> -       movdqa          %xmm0,%xmm13
> -       movdqa          %xmm14,%xmm0
> -       punpckldq       %xmm15,%xmm14
> -       punpckhdq       %xmm15,%xmm0
> -       movdqa          %xmm0,%xmm15
> -
> -       # interleave 64-bit words in state n, n+2
> -       movdqa          0x00(%rsp),%xmm0
> -       movdqa          0x20(%rsp),%xmm1
> -       movdqa          %xmm0,%xmm2
> -       punpcklqdq      %xmm1,%xmm2
> -       punpckhqdq      %xmm1,%xmm0
> -       movdqa          %xmm2,0x00(%rsp)
> -       movdqa          %xmm0,0x20(%rsp)
> -       movdqa          0x10(%rsp),%xmm0
> -       movdqa          0x30(%rsp),%xmm1
> -       movdqa          %xmm0,%xmm2
> -       punpcklqdq      %xmm1,%xmm2
> -       punpckhqdq      %xmm1,%xmm0
> -       movdqa          %xmm2,0x10(%rsp)
> -       movdqa          %xmm0,0x30(%rsp)
> -       movdqa          %xmm4,%xmm0
> -       punpcklqdq      %xmm6,%xmm4
> -       punpckhqdq      %xmm6,%xmm0
> -       movdqa          %xmm0,%xmm6
> -       movdqa          %xmm5,%xmm0
> -       punpcklqdq      %xmm7,%xmm5
> -       punpckhqdq      %xmm7,%xmm0
> -       movdqa          %xmm0,%xmm7
> -       movdqa          %xmm8,%xmm0
> -       punpcklqdq      %xmm10,%xmm8
> -       punpckhqdq      %xmm10,%xmm0
> -       movdqa          %xmm0,%xmm10
> -       movdqa          %xmm9,%xmm0
> -       punpcklqdq      %xmm11,%xmm9
> -       punpckhqdq      %xmm11,%xmm0
> -       movdqa          %xmm0,%xmm11
> -       movdqa          %xmm12,%xmm0
> -       punpcklqdq      %xmm14,%xmm12
> -       punpckhqdq      %xmm14,%xmm0
> -       movdqa          %xmm0,%xmm14
> -       movdqa          %xmm13,%xmm0
> -       punpcklqdq      %xmm15,%xmm13
> -       punpckhqdq      %xmm15,%xmm0
> -       movdqa          %xmm0,%xmm15
> -
> -       # xor with corresponding input, write to output
> -       movdqa          0x00(%rsp),%xmm0
> -       movdqu          0x00(%rdx),%xmm1
> -       pxor            %xmm1,%xmm0
> -       movdqu          %xmm0,0x00(%rsi)
> -       movdqa          0x10(%rsp),%xmm0
> -       movdqu          0x80(%rdx),%xmm1
> -       pxor            %xmm1,%xmm0
> -       movdqu          %xmm0,0x80(%rsi)
> -       movdqa          0x20(%rsp),%xmm0
> -       movdqu          0x40(%rdx),%xmm1
> -       pxor            %xmm1,%xmm0
> -       movdqu          %xmm0,0x40(%rsi)
> -       movdqa          0x30(%rsp),%xmm0
> -       movdqu          0xc0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm0
> -       movdqu          %xmm0,0xc0(%rsi)
> -       movdqu          0x10(%rdx),%xmm1
> -       pxor            %xmm1,%xmm4
> -       movdqu          %xmm4,0x10(%rsi)
> -       movdqu          0x90(%rdx),%xmm1
> -       pxor            %xmm1,%xmm5
> -       movdqu          %xmm5,0x90(%rsi)
> -       movdqu          0x50(%rdx),%xmm1
> -       pxor            %xmm1,%xmm6
> -       movdqu          %xmm6,0x50(%rsi)
> -       movdqu          0xd0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm7
> -       movdqu          %xmm7,0xd0(%rsi)
> -       movdqu          0x20(%rdx),%xmm1
> -       pxor            %xmm1,%xmm8
> -       movdqu          %xmm8,0x20(%rsi)
> -       movdqu          0xa0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm9
> -       movdqu          %xmm9,0xa0(%rsi)
> -       movdqu          0x60(%rdx),%xmm1
> -       pxor            %xmm1,%xmm10
> -       movdqu          %xmm10,0x60(%rsi)
> -       movdqu          0xe0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm11
> -       movdqu          %xmm11,0xe0(%rsi)
> -       movdqu          0x30(%rdx),%xmm1
> -       pxor            %xmm1,%xmm12
> -       movdqu          %xmm12,0x30(%rsi)
> -       movdqu          0xb0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm13
> -       movdqu          %xmm13,0xb0(%rsi)
> -       movdqu          0x70(%rdx),%xmm1
> -       pxor            %xmm1,%xmm14
> -       movdqu          %xmm14,0x70(%rsi)
> -       movdqu          0xf0(%rdx),%xmm1
> -       pxor            %xmm1,%xmm15
> -       movdqu          %xmm15,0xf0(%rsi)
> -
> -       lea             -8(%r10),%rsp
> -       ret
> -ENDPROC(chacha20_4block_xor_ssse3)
> diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
> deleted file mode 100644
> index dce7c5d39c2f..000000000000
> --- a/arch/x86/crypto/chacha20_glue.c
> +++ /dev/null
> @@ -1,146 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <crypto/algapi.h>
> -#include <crypto/chacha20.h>
> -#include <crypto/internal/skcipher.h>
> -#include <linux/kernel.h>
> -#include <linux/module.h>
> -#include <asm/fpu/api.h>
> -#include <asm/simd.h>
> -
> -#define CHACHA20_STATE_ALIGN 16
> -
> -asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
> -asmlinkage void chacha20_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
> -#ifdef CONFIG_AS_AVX2
> -asmlinkage void chacha20_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src);
> -static bool chacha20_use_avx2;
> -#endif
> -
> -static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
> -                           unsigned int bytes)
> -{
> -       u8 buf[CHACHA20_BLOCK_SIZE];
> -
> -#ifdef CONFIG_AS_AVX2
> -       if (chacha20_use_avx2) {
> -               while (bytes >= CHACHA20_BLOCK_SIZE * 8) {
> -                       chacha20_8block_xor_avx2(state, dst, src);
> -                       bytes -= CHACHA20_BLOCK_SIZE * 8;
> -                       src += CHACHA20_BLOCK_SIZE * 8;
> -                       dst += CHACHA20_BLOCK_SIZE * 8;
> -                       state[12] += 8;
> -               }
> -       }
> -#endif
> -       while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
> -               chacha20_4block_xor_ssse3(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE * 4;
> -               src += CHACHA20_BLOCK_SIZE * 4;
> -               dst += CHACHA20_BLOCK_SIZE * 4;
> -               state[12] += 4;
> -       }
> -       while (bytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_block_xor_ssse3(state, dst, src);
> -               bytes -= CHACHA20_BLOCK_SIZE;
> -               src += CHACHA20_BLOCK_SIZE;
> -               dst += CHACHA20_BLOCK_SIZE;
> -               state[12]++;
> -       }
> -       if (bytes) {
> -               memcpy(buf, src, bytes);
> -               chacha20_block_xor_ssse3(state, buf, buf);
> -               memcpy(dst, buf, bytes);
> -       }
> -}
> -
> -static int chacha20_simd(struct skcipher_request *req)
> -{
> -       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       u32 *state, state_buf[16 + 2] __aligned(8);
> -       struct skcipher_walk walk;
> -       int err;
> -
> -       BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
> -       state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
> -
> -       if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
> -               return crypto_chacha20_crypt(req);
> -
> -       err = skcipher_walk_virt(&walk, req, true);
> -
> -       crypto_chacha20_init(state, ctx, walk.iv);
> -
> -       kernel_fpu_begin();
> -
> -       while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                               rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
> -               err = skcipher_walk_done(&walk,
> -                                        walk.nbytes % CHACHA20_BLOCK_SIZE);
> -       }
> -
> -       if (walk.nbytes) {
> -               chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                               walk.nbytes);
> -               err = skcipher_walk_done(&walk, 0);
> -       }
> -
> -       kernel_fpu_end();
> -
> -       return err;
> -}
> -
> -static struct skcipher_alg alg = {
> -       .base.cra_name          = "chacha20",
> -       .base.cra_driver_name   = "chacha20-simd",
> -       .base.cra_priority      = 300,
> -       .base.cra_blocksize     = 1,
> -       .base.cra_ctxsize       = sizeof(struct chacha20_ctx),
> -       .base.cra_module        = THIS_MODULE,
> -
> -       .min_keysize            = CHACHA20_KEY_SIZE,
> -       .max_keysize            = CHACHA20_KEY_SIZE,
> -       .ivsize                 = CHACHA20_IV_SIZE,
> -       .chunksize              = CHACHA20_BLOCK_SIZE,
> -       .setkey                 = crypto_chacha20_setkey,
> -       .encrypt                = chacha20_simd,
> -       .decrypt                = chacha20_simd,
> -};
> -
> -static int __init chacha20_simd_mod_init(void)
> -{
> -       if (!boot_cpu_has(X86_FEATURE_SSSE3))
> -               return -ENODEV;
> -
> -#ifdef CONFIG_AS_AVX2
> -       chacha20_use_avx2 = boot_cpu_has(X86_FEATURE_AVX) &&
> -                           boot_cpu_has(X86_FEATURE_AVX2) &&
> -                           cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
> -#endif
> -       return crypto_register_skcipher(&alg);
> -}
> -
> -static void __exit chacha20_simd_mod_fini(void)
> -{
> -       crypto_unregister_skcipher(&alg);
> -}
> -
> -module_init(chacha20_simd_mod_init);
> -module_exit(chacha20_simd_mod_fini);
> -
> -MODULE_LICENSE("GPL");
> -MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
> -MODULE_DESCRIPTION("chacha20 cipher algorithm, SIMD accelerated");
> -MODULE_ALIAS_CRYPTO("chacha20");
> -MODULE_ALIAS_CRYPTO("chacha20-simd");
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index 47859a0f8052..93cd4d199447 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -1433,22 +1433,6 @@ config CRYPTO_CHACHA20
>
>           ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
>           Bernstein and further specified in RFC7539 for use in IETF protocols.
> -         This is the portable C implementation of ChaCha20.
> -
> -         See also:
> -         <http://cr.yp.to/chacha/chacha-20080128.pdf>
> -
> -config CRYPTO_CHACHA20_X86_64
> -       tristate "ChaCha20 cipher algorithm (x86_64/SSSE3/AVX2)"
> -       depends on X86 && 64BIT
> -       select CRYPTO_BLKCIPHER
> -       select CRYPTO_CHACHA20
> -       help
> -         ChaCha20 cipher algorithm, RFC7539.
> -
> -         ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
> -         Bernstein and further specified in RFC7539 for use in IETF protocols.
> -         This is the x86_64 assembler implementation using SIMD instructions.
>
>           See also:
>           <http://cr.yp.to/chacha/chacha-20080128.pdf>
> diff --git a/crypto/Makefile b/crypto/Makefile
> index 5e60348d02e2..587103b87890 100644
> --- a/crypto/Makefile
> +++ b/crypto/Makefile
> @@ -117,7 +117,7 @@ obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
>  obj-$(CONFIG_CRYPTO_SEED) += seed.o
>  obj-$(CONFIG_CRYPTO_SPECK) += speck.o
>  obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
> -obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
> +obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_zinc.o
>  obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_zinc.o
>  obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
>  obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
> diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c
> deleted file mode 100644
> index e451c3cb6a56..000000000000
> --- a/crypto/chacha20_generic.c
> +++ /dev/null
> @@ -1,136 +0,0 @@
> -/*
> - * ChaCha20 256-bit cipher algorithm, RFC7539
> - *
> - * Copyright (C) 2015 Martin Willi
> - *
> - * This program is free software; you can redistribute it and/or modify
> - * it under the terms of the GNU General Public License as published by
> - * the Free Software Foundation; either version 2 of the License, or
> - * (at your option) any later version.
> - */
> -
> -#include <asm/unaligned.h>
> -#include <crypto/algapi.h>
> -#include <crypto/chacha20.h>
> -#include <crypto/internal/skcipher.h>
> -#include <linux/module.h>
> -
> -static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
> -                            unsigned int bytes)
> -{
> -       u32 stream[CHACHA20_BLOCK_WORDS];
> -
> -       if (dst != src)
> -               memcpy(dst, src, bytes);
> -
> -       while (bytes >= CHACHA20_BLOCK_SIZE) {
> -               chacha20_block(state, stream);
> -               crypto_xor(dst, (const u8 *)stream, CHACHA20_BLOCK_SIZE);
> -               bytes -= CHACHA20_BLOCK_SIZE;
> -               dst += CHACHA20_BLOCK_SIZE;
> -       }
> -       if (bytes) {
> -               chacha20_block(state, stream);
> -               crypto_xor(dst, (const u8 *)stream, bytes);
> -       }
> -}
> -
> -void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
> -{
> -       state[0]  = 0x61707865; /* "expa" */
> -       state[1]  = 0x3320646e; /* "nd 3" */
> -       state[2]  = 0x79622d32; /* "2-by" */
> -       state[3]  = 0x6b206574; /* "te k" */
> -       state[4]  = ctx->key[0];
> -       state[5]  = ctx->key[1];
> -       state[6]  = ctx->key[2];
> -       state[7]  = ctx->key[3];
> -       state[8]  = ctx->key[4];
> -       state[9]  = ctx->key[5];
> -       state[10] = ctx->key[6];
> -       state[11] = ctx->key[7];
> -       state[12] = get_unaligned_le32(iv +  0);
> -       state[13] = get_unaligned_le32(iv +  4);
> -       state[14] = get_unaligned_le32(iv +  8);
> -       state[15] = get_unaligned_le32(iv + 12);
> -}
> -EXPORT_SYMBOL_GPL(crypto_chacha20_init);
> -
> -int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
> -                          unsigned int keysize)
> -{
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       int i;
> -
> -       if (keysize != CHACHA20_KEY_SIZE)
> -               return -EINVAL;
> -
> -       for (i = 0; i < ARRAY_SIZE(ctx->key); i++)
> -               ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));
> -
> -       return 0;
> -}
> -EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);
> -
> -int crypto_chacha20_crypt(struct skcipher_request *req)
> -{
> -       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> -       struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
> -       struct skcipher_walk walk;
> -       u32 state[16];
> -       int err;
> -
> -       err = skcipher_walk_virt(&walk, req, true);
> -
> -       crypto_chacha20_init(state, ctx, walk.iv);
> -
> -       while (walk.nbytes > 0) {
> -               unsigned int nbytes = walk.nbytes;
> -
> -               if (nbytes < walk.total)
> -                       nbytes = round_down(nbytes, walk.stride);
> -
> -               chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
> -                                nbytes);
> -               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> -       }
> -
> -       return err;
> -}
> -EXPORT_SYMBOL_GPL(crypto_chacha20_crypt);
> -
> -static struct skcipher_alg alg = {
> -       .base.cra_name          = "chacha20",
> -       .base.cra_driver_name   = "chacha20-generic",
> -       .base.cra_priority      = 100,
> -       .base.cra_blocksize     = 1,
> -       .base.cra_ctxsize       = sizeof(struct chacha20_ctx),
> -       .base.cra_module        = THIS_MODULE,
> -
> -       .min_keysize            = CHACHA20_KEY_SIZE,
> -       .max_keysize            = CHACHA20_KEY_SIZE,
> -       .ivsize                 = CHACHA20_IV_SIZE,
> -       .chunksize              = CHACHA20_BLOCK_SIZE,
> -       .setkey                 = crypto_chacha20_setkey,
> -       .encrypt                = crypto_chacha20_crypt,
> -       .decrypt                = crypto_chacha20_crypt,
> -};
> -
> -static int __init chacha20_generic_mod_init(void)
> -{
> -       return crypto_register_skcipher(&alg);
> -}
> -
> -static void __exit chacha20_generic_mod_fini(void)
> -{
> -       crypto_unregister_skcipher(&alg);
> -}
> -
> -module_init(chacha20_generic_mod_init);
> -module_exit(chacha20_generic_mod_fini);
> -
> -MODULE_LICENSE("GPL");
> -MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
> -MODULE_DESCRIPTION("chacha20 cipher algorithm");
> -MODULE_ALIAS_CRYPTO("chacha20");
> -MODULE_ALIAS_CRYPTO("chacha20-generic");
> diff --git a/crypto/chacha20_zinc.c b/crypto/chacha20_zinc.c
> new file mode 100644
> index 000000000000..5df88fdee066
> --- /dev/null
> +++ b/crypto/chacha20_zinc.c
> @@ -0,0 +1,100 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * Copyright (C) 2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + */
> +
> +#include <asm/unaligned.h>
> +#include <crypto/algapi.h>
> +#include <crypto/internal/skcipher.h>
> +#include <zinc/chacha20.h>
> +#include <linux/module.h>
> +
> +struct chacha20_key_ctx {
> +       u32 key[8];
> +};
> +
> +static int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
> +                                 unsigned int keysize)
> +{
> +       struct chacha20_key_ctx *key_ctx = crypto_skcipher_ctx(tfm);
> +       int i;
> +
> +       if (keysize != CHACHA20_KEY_SIZE)
> +               return -EINVAL;
> +
> +       for (i = 0; i < ARRAY_SIZE(key_ctx->key); ++i)
> +               key_ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));
> +
> +       return 0;
> +}
> +
> +static int crypto_chacha20_crypt(struct skcipher_request *req)
> +{
> +       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
> +       struct chacha20_key_ctx *key_ctx = crypto_skcipher_ctx(tfm);
> +       struct chacha20_ctx ctx;
> +       struct skcipher_walk walk;
> +       simd_context_t simd_context;
> +       int err, i;
> +
> +       err = skcipher_walk_virt(&walk, req, true);
> +       if (unlikely(err))
> +               return err;
> +
> +       memcpy(ctx.key, key_ctx->key, sizeof(ctx.key));
> +       for (i = 0; i < ARRAY_SIZE(ctx.counter); ++i)
> +               ctx.counter[i] = get_unaligned_le32(walk.iv + i * sizeof(u32));
> +
> +       simd_context = simd_get();
> +       while (walk.nbytes > 0) {
> +               unsigned int nbytes = walk.nbytes;
> +
> +               if (nbytes < walk.total)
> +                       nbytes = round_down(nbytes, walk.stride);
> +
> +               chacha20(&ctx, walk.dst.virt.addr, walk.src.virt.addr, nbytes,
> +                        simd_context);
> +
> +               err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
> +               simd_context = simd_relax(simd_context);
> +       }
> +       simd_put(simd_context);
> +
> +       return err;
> +}
> +
> +static struct skcipher_alg alg = {
> +       .base.cra_name          = "chacha20",
> +       .base.cra_driver_name   = "chacha20-software",
> +       .base.cra_priority      = 100,
> +       .base.cra_blocksize     = 1,
> +       .base.cra_ctxsize       = sizeof(struct chacha20_key_ctx),
> +       .base.cra_module        = THIS_MODULE,
> +
> +       .min_keysize            = CHACHA20_KEY_SIZE,
> +       .max_keysize            = CHACHA20_KEY_SIZE,
> +       .ivsize                 = CHACHA20_IV_SIZE,
> +       .chunksize              = CHACHA20_BLOCK_SIZE,
> +       .setkey                 = crypto_chacha20_setkey,
> +       .encrypt                = crypto_chacha20_crypt,
> +       .decrypt                = crypto_chacha20_crypt,
> +};
> +
> +static int __init chacha20_mod_init(void)
> +{
> +       return crypto_register_skcipher(&alg);
> +}
> +
> +static void __exit chacha20_mod_exit(void)
> +{
> +       crypto_unregister_skcipher(&alg);
> +}
> +
> +module_init(chacha20_mod_init);
> +module_exit(chacha20_mod_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
> +MODULE_DESCRIPTION("ChaCha20 stream cipher");
> +MODULE_ALIAS_CRYPTO("chacha20");
> +MODULE_ALIAS_CRYPTO("chacha20-software");
> diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c
> index bf523797bef3..b26adb9ed898 100644
> --- a/crypto/chacha20poly1305.c
> +++ b/crypto/chacha20poly1305.c
> @@ -13,7 +13,7 @@
>  #include <crypto/internal/hash.h>
>  #include <crypto/internal/skcipher.h>
>  #include <crypto/scatterwalk.h>
> -#include <crypto/chacha20.h>
> +#include <zinc/chacha20.h>
>  #include <zinc/poly1305.h>
>  #include <linux/err.h>
>  #include <linux/init.h>
> diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h
> index b83d66073db0..3b92f58f3891 100644
> --- a/include/crypto/chacha20.h
> +++ b/include/crypto/chacha20.h
> @@ -6,23 +6,11 @@
>  #ifndef _CRYPTO_CHACHA20_H
>  #define _CRYPTO_CHACHA20_H
>
> -#include <crypto/skcipher.h>
> -#include <linux/types.h>
> -#include <linux/crypto.h>
> -
>  #define CHACHA20_IV_SIZE       16
>  #define CHACHA20_KEY_SIZE      32
>  #define CHACHA20_BLOCK_SIZE    64
>  #define CHACHA20_BLOCK_WORDS   (CHACHA20_BLOCK_SIZE / sizeof(u32))
>
> -struct chacha20_ctx {
> -       u32 key[8];
> -};
> -
>  void chacha20_block(u32 *state, u32 *stream);
> -void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv);
> -int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
> -                          unsigned int keysize);
> -int crypto_chacha20_crypt(struct skcipher_request *req);
>
>  #endif
> --
> 2.19.0
>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations
  2018-09-14 17:27   ` Ard Biesheuvel
@ 2018-09-14 17:45     ` Jason A. Donenfeld
  0 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 17:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
	Greg Kroah-Hartman, Samuel Neves, Andrew Lutomirski,
	Jean-Philippe Aumasson, Andy Polyakov, Russell King - ARM Linux,
	linux-arm-kernel

Hi Ard,

On Fri, Sep 14, 2018 at 7:27 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> As I asked in response to v3, could we please have this as a separate
> patch on top? The diff below is corrupted.

I had played with that originally, but thought it made things actually
harder to review, whereas here you have the changes presented pretty
straight forwardly, and I'd appreciate your review of them. If you and
Eric both prefer I split this into two commits, with the first one
just plopping down the CRYPTOGAMS code as is and the second one
bringing it up to kernel-snuff, I can do that.

> Also, both Andy and Eric have offered to get involved in upstreaming
> these changes to OpenSSL, so there is no delta to begin with.

Yes, I think this is probably a good long-term plan, which we can act
on sometime after Zinc is merged.

> I still don't like the GCC -includes, especially because these .h
> files contain function and variable definitions so they are not
> actually header files to begin with.

I very very strongly disagree with you here. I think doing it via
-include is significantly cleaner than any of the alternatives, and
allows the code to be cleanly expressed as conditionals that the
optimizer trivially compiles out in the case of stub functions
returning false and branch optimizes when the stub functions return
true. It is extremely important that these compile together as one
compilation unit. Yes, this is a different design than the crypto
API's approach, but I believe the approach presented here poses
significant improvements and is a lot cleaner.

> Also, you mentioned in the commit log that you got rid of defines and
> made the code more modular, but as far as I can tell, libzinc is still
> a single monolithic binary that is essentially always builtin once we
> move random.c to it.

Yes, it's still monolithic, but it's now trivial to split up when the
time comes to do that. If you and AndyL think that it should be split
into multiple modules _now_, then I can go ahead and do that for v5.
But if it's not essential, it seems simpler to keep it as is. I'll
wait for word from you two on this.

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
  2018-09-14 17:38   ` Ard Biesheuvel
@ 2018-09-14 17:49     ` Jason A. Donenfeld
  0 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-14 17:49 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
	Greg Kroah-Hartman, Samuel Neves, Andrew Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers

On Fri, Sep 14, 2018 at 7:38 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> so could we please bring that discussion to a close before we drop the ARM code?

My understanding is that either these will find their way up to AndyP
and then back down here, or Eric or you will augment the .S in this
patch at a later date with an improvement commit that includes some
benchmarks.

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 20/20] net: WireGuard secure network tunnel
  2018-09-14 16:22 ` [PATCH net-next v4 20/20] net: WireGuard secure network tunnel Jason A. Donenfeld
@ 2018-09-15 22:17   ` kbuild test robot
  2018-09-15 23:14     ` Jason A. Donenfeld
  2018-09-15 23:01   ` kbuild test robot
  1 sibling, 1 reply; 38+ messages in thread
From: kbuild test robot @ 2018-09-15 22:17 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kbuild-all, linux-kernel, netdev, linux-crypto, davem, gregkh,
	Jason A. Donenfeld

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]

Hi Jason,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/WireGuard-Secure-Network-Tunnel/20180916-043623
config: powerpc-canyonlands_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

   drivers/net/wireguard/send.c: In function 'packet_encrypt_worker':
>> drivers/net/wireguard/send.c:320:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    }
    ^
--
   drivers/net/wireguard/receive.c: In function 'packet_decrypt_worker':
>> drivers/net/wireguard/receive.c:520:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    }
    ^

vim +320 drivers/net/wireguard/send.c

   294	
   295	void packet_encrypt_worker(struct work_struct *work)
   296	{
   297		struct crypt_queue *queue =
   298			container_of(work, struct multicore_worker, work)->ptr;
   299		struct sk_buff *first, *skb, *next;
   300		simd_context_t simd_context = simd_get();
   301	
   302		while ((first = ptr_ring_consume_bh(&queue->ring)) != NULL) {
   303			enum packet_state state = PACKET_STATE_CRYPTED;
   304	
   305			skb_walk_null_queue_safe (first, skb, next) {
   306				if (likely(skb_encrypt(skb, PACKET_CB(first)->keypair,
   307						       simd_context)))
   308					skb_reset(skb);
   309				else {
   310					state = PACKET_STATE_DEAD;
   311					break;
   312				}
   313			}
   314			queue_enqueue_per_peer(&PACKET_PEER(first)->tx_queue, first,
   315					       state);
   316	
   317			simd_context = simd_relax(simd_context);
   318		}
   319		simd_put(simd_context);
 > 320	}
   321	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14899 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 20/20] net: WireGuard secure network tunnel
  2018-09-14 16:22 ` [PATCH net-next v4 20/20] net: WireGuard secure network tunnel Jason A. Donenfeld
  2018-09-15 22:17   ` kbuild test robot
@ 2018-09-15 23:01   ` kbuild test robot
  2018-09-15 23:48     ` Jason A. Donenfeld
  1 sibling, 1 reply; 38+ messages in thread
From: kbuild test robot @ 2018-09-15 23:01 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kbuild-all, linux-kernel, netdev, linux-crypto, davem, gregkh,
	Jason A. Donenfeld

[-- Attachment #1: Type: text/plain, Size: 3298 bytes --]

Hi Jason,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/WireGuard-Secure-Network-Tunnel/20180916-043623
config: x86_64-randconfig-b0-09160521 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from lib/zinc/curve25519/curve25519.c:45:0:
   lib/zinc/curve25519/curve25519-hacl64.h: In function 'curve25519_generic':
>> lib/zinc/curve25519/curve25519-hacl64.h:785:1: warning: the frame size of 13216 bytes is larger than 8192 bytes [-Wframe-larger-than=]
    }
    ^

vim +785 lib/zinc/curve25519/curve25519-hacl64.h

cec5aa7c Jason A. Donenfeld 2018-09-14  755  
cec5aa7c Jason A. Donenfeld 2018-09-14  756  static void curve25519_generic(u8 mypublic[CURVE25519_POINT_SIZE],
cec5aa7c Jason A. Donenfeld 2018-09-14  757  			       const u8 secret[CURVE25519_POINT_SIZE],
cec5aa7c Jason A. Donenfeld 2018-09-14  758  			       const u8 basepoint[CURVE25519_POINT_SIZE])
cec5aa7c Jason A. Donenfeld 2018-09-14  759  {
cec5aa7c Jason A. Donenfeld 2018-09-14  760  	u64 buf0[10] __aligned(32) = { 0 };
cec5aa7c Jason A. Donenfeld 2018-09-14  761  	u64 *x0 = buf0;
cec5aa7c Jason A. Donenfeld 2018-09-14  762  	u64 *z = buf0 + 5;
cec5aa7c Jason A. Donenfeld 2018-09-14  763  	u64 *q;
cec5aa7c Jason A. Donenfeld 2018-09-14  764  	format_fexpand(x0, basepoint);
cec5aa7c Jason A. Donenfeld 2018-09-14  765  	z[0] = 1;
cec5aa7c Jason A. Donenfeld 2018-09-14  766  	q = buf0;
cec5aa7c Jason A. Donenfeld 2018-09-14  767  	{
cec5aa7c Jason A. Donenfeld 2018-09-14  768  		u8 e[32] __aligned(32) = { 0 };
cec5aa7c Jason A. Donenfeld 2018-09-14  769  		u8 *scalar;
cec5aa7c Jason A. Donenfeld 2018-09-14  770  		memcpy(e, secret, 32);
cec5aa7c Jason A. Donenfeld 2018-09-14  771  		normalize_secret(e);
cec5aa7c Jason A. Donenfeld 2018-09-14  772  		scalar = e;
cec5aa7c Jason A. Donenfeld 2018-09-14  773  		{
cec5aa7c Jason A. Donenfeld 2018-09-14  774  			u64 buf[15] = { 0 };
cec5aa7c Jason A. Donenfeld 2018-09-14  775  			u64 *nq = buf;
cec5aa7c Jason A. Donenfeld 2018-09-14  776  			u64 *x = nq;
cec5aa7c Jason A. Donenfeld 2018-09-14  777  			x[0] = 1;
cec5aa7c Jason A. Donenfeld 2018-09-14  778  			ladder_cmult(nq, scalar, q);
cec5aa7c Jason A. Donenfeld 2018-09-14  779  			format_scalar_of_point(mypublic, nq);
cec5aa7c Jason A. Donenfeld 2018-09-14  780  			memzero_explicit(buf, sizeof(buf));
cec5aa7c Jason A. Donenfeld 2018-09-14  781  		}
cec5aa7c Jason A. Donenfeld 2018-09-14  782  		memzero_explicit(e, sizeof(e));
cec5aa7c Jason A. Donenfeld 2018-09-14  783  	}
cec5aa7c Jason A. Donenfeld 2018-09-14  784  	memzero_explicit(buf0, sizeof(buf0));
cec5aa7c Jason A. Donenfeld 2018-09-14 @785  }

:::::: The code at line 785 was first introduced by commit
:::::: cec5aa7cd396d909f8555d2df80d3e1a0f94688b zinc: Curve25519 generic C implementations and selftest

:::::: TO: Jason A. Donenfeld <Jason@zx2c4.com>
:::::: CC: 0day robot <lkp@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30997 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc
  2018-09-14 16:22 ` [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc Jason A. Donenfeld
@ 2018-09-15 23:02   ` kbuild test robot
  2018-09-16  0:31     ` Jason A. Donenfeld
  2018-09-15 23:39   ` kbuild test robot
  1 sibling, 1 reply; 38+ messages in thread
From: kbuild test robot @ 2018-09-15 23:02 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kbuild-all, linux-kernel, netdev, linux-crypto, davem, gregkh,
	Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/WireGuard-Secure-Network-Tunnel/20180916-043623
config: arm64-defconfig
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        GCC_VERSION=7.2.0 make.cross ARCH=arm64  defconfig
        GCC_VERSION=7.2.0 make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

   fs/sysfs/Kconfig:1:error: recursive dependency detected!
   fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
   fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by ACPI_CONFIGFS
   drivers/acpi/Kconfig:542: symbol ACPI_CONFIGFS depends on ACPI
>> drivers/acpi/Kconfig:9: symbol ACPI depends on ARCH_SUPPORTS_ACPI
>> drivers/acpi/Kconfig:6: symbol ARCH_SUPPORTS_ACPI is selected by EFI
>> arch/arm64/Kconfig:1253: symbol EFI depends on KERNEL_MODE_NEON
>> arch/arm64/Kconfig:262: symbol KERNEL_MODE_NEON is implied by ZINC_ARCH_ARM
>> lib/zinc/Kconfig:42: symbol ZINC_ARCH_ARM depends on ZINC
   lib/zinc/Kconfig:1: symbol ZINC is selected by ZINC_POLY1305
   lib/zinc/Kconfig:9: symbol ZINC_POLY1305 is selected by CRYPTO_POLY1305
   crypto/Kconfig:656: symbol CRYPTO_POLY1305 depends on CRYPTO
   crypto/Kconfig:16: symbol CRYPTO is selected by IP_SCTP
   net/sctp/Kconfig:5: symbol IP_SCTP is selected by DLM
   fs/dlm/Kconfig:1: symbol DLM depends on SYSFS
   For a resolution refer to Documentation/kbuild/kconfig-language.txt
   subsection "Kconfig recursive dependency limitations"

vim +9 drivers/acpi/Kconfig

^1da177e4 Linus Torvalds    2005-04-16   5  
f5d707ede Arnd Bergmann     2018-08-21  @6  config ARCH_SUPPORTS_ACPI
f5d707ede Arnd Bergmann     2018-08-21   7  	bool
f5d707ede Arnd Bergmann     2018-08-21   8  
3f2c48c9b Jan Engelhardt    2007-07-03  @9  menuconfig ACPI
355ee5eb6 Frans Pop         2007-10-29  10  	bool "ACPI (Advanced Configuration and Power Interface) Support"
2c870e611 Arnd Bergmann     2018-07-24  11  	depends on ARCH_SUPPORTS_ACPI
1300124f6 Adrian Bunk       2006-03-28  12  	depends on PCI
243b66e76 Len Brown         2007-02-15  13  	select PNP
2c870e611 Arnd Bergmann     2018-07-24  14  	default y if X86
1c48aa36e Bjorn Helgaas     2009-02-19  15  	help
^1da177e4 Linus Torvalds    2005-04-16  16  	  Advanced Configuration and Power Interface (ACPI) support for 
1c48aa36e Bjorn Helgaas     2009-02-19  17  	  Linux requires an ACPI-compliant platform (hardware/firmware),
^1da177e4 Linus Torvalds    2005-04-16  18  	  and assumes the presence of OS-directed configuration and power
^1da177e4 Linus Torvalds    2005-04-16  19  	  management (OSPM) software.  This option will enlarge your 
^1da177e4 Linus Torvalds    2005-04-16  20  	  kernel by about 70K.
^1da177e4 Linus Torvalds    2005-04-16  21  
^1da177e4 Linus Torvalds    2005-04-16  22  	  Linux ACPI provides a robust functional replacement for several 
^1da177e4 Linus Torvalds    2005-04-16  23  	  legacy configuration and power management interfaces, including
^1da177e4 Linus Torvalds    2005-04-16  24  	  the Plug-and-Play BIOS specification (PnP BIOS), the 
^1da177e4 Linus Torvalds    2005-04-16  25  	  MultiProcessor Specification (MPS), and the Advanced Power 
^1da177e4 Linus Torvalds    2005-04-16  26  	  Management (APM) specification.  If both ACPI and APM support 
1c48aa36e Bjorn Helgaas     2009-02-19  27  	  are configured, ACPI is used.
^1da177e4 Linus Torvalds    2005-04-16  28  
1c48aa36e Bjorn Helgaas     2009-02-19  29  	  The project home page for the Linux ACPI subsystem is here:
aaf3d29fe Rafael J. Wysocki 2013-10-10  30  	  <https://01.org/linux-acpi>
^1da177e4 Linus Torvalds    2005-04-16  31  
^1da177e4 Linus Torvalds    2005-04-16  32  	  Linux support for ACPI is based on Intel Corporation's ACPI
1c48aa36e Bjorn Helgaas     2009-02-19  33  	  Component Architecture (ACPI CA).  For more information on the
1c48aa36e Bjorn Helgaas     2009-02-19  34  	  ACPI CA, see:
1c48aa36e Bjorn Helgaas     2009-02-19  35  	  <http://acpica.org/>
^1da177e4 Linus Torvalds    2005-04-16  36  
c7f5220d0 Hanjun Guo        2014-04-08  37  	  ACPI is an open industry specification originally co-developed by
c7f5220d0 Hanjun Guo        2014-04-08  38  	  Hewlett-Packard, Intel, Microsoft, Phoenix, and Toshiba. Currently,
c7f5220d0 Hanjun Guo        2014-04-08  39  	  it is developed by the ACPI Specification Working Group (ASWG) under
c7f5220d0 Hanjun Guo        2014-04-08  40  	  the UEFI Forum and any UEFI member can join the ASWG and contribute
c7f5220d0 Hanjun Guo        2014-04-08  41  	  to the ACPI specification.
1c48aa36e Bjorn Helgaas     2009-02-19  42  	  The specification is available at:
^1da177e4 Linus Torvalds    2005-04-16  43  	  <http://www.acpi.info>
c7f5220d0 Hanjun Guo        2014-04-08  44  	  <http://www.uefi.org/acpi/specs>
^1da177e4 Linus Torvalds    2005-04-16  45  

:::::: The code at line 9 was first introduced by commit
:::::: 3f2c48c9b48423d1411695da066d525cca2a27db ACPI: Use menuconfig objects

:::::: TO: Jan Engelhardt <jengelh@linux01.gwdg.de>
:::::: CC: Len Brown <len.brown@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 20/20] net: WireGuard secure network tunnel
  2018-09-15 22:17   ` kbuild test robot
@ 2018-09-15 23:14     ` Jason A. Donenfeld
  0 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-15 23:14 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, LKML, Netdev, Linux Crypto Mailing List,
	David Miller, Greg Kroah-Hartman

Hi Mr. Ro Bot,

On Sun, Sep 16, 2018 at 12:19 AM kbuild test robot <lkp@intel.com> wrote:
>    drivers/net/wireguard/send.c: In function 'packet_encrypt_worker':
> >> drivers/net/wireguard/send.c:320:1: warning: the frame size of 1136 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>     }

Thanks, fixed for v5.

Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
       [not found]   ` <201809160711.HzjdJedZ%fengguang.wu@intel.com>
@ 2018-09-15 23:21     ` Jason A. Donenfeld
  0 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-15 23:21 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, LKML, Netdev, Linux Crypto Mailing List,
	David Miller, Greg Kroah-Hartman, Samuel Neves,
	Andrew Lutomirski, Jean-Philippe Aumasson, Eric Biggers

Hi Mr. Ro Bot,

On Sun, Sep 16, 2018 at 1:14 AM kbuild test robot <lkp@intel.com> wrote:
>    crypto/chacha20_zinc.o: In function `crypto_chacha20_crypt':
> >> crypto/chacha20_zinc.c:55: undefined reference to `chacha20'

Looks like my Kconfig change didn't get squashed in as intended. Fixed for v5.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc
  2018-09-14 16:22 ` [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc Jason A. Donenfeld
  2018-09-15 23:02   ` kbuild test robot
@ 2018-09-15 23:39   ` kbuild test robot
  1 sibling, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2018-09-15 23:39 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kbuild-all, linux-kernel, netdev, linux-crypto, davem, gregkh,
	Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/WireGuard-Secure-Network-Tunnel/20180916-043623
config: arm64-defconfig
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        GCC_VERSION=7.2.0 make.cross ARCH=arm64  defconfig
        GCC_VERSION=7.2.0 make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

>> fs/sysfs/Kconfig:1:error: recursive dependency detected!
>> fs/sysfs/Kconfig:1: symbol SYSFS is selected by CONFIGFS_FS
>> fs/configfs/Kconfig:1: symbol CONFIGFS_FS is selected by ACPI_CONFIGFS
>> drivers/acpi/Kconfig:542: symbol ACPI_CONFIGFS depends on ACPI
   drivers/acpi/Kconfig:9: symbol ACPI depends on ARCH_SUPPORTS_ACPI
   drivers/acpi/Kconfig:6: symbol ARCH_SUPPORTS_ACPI is selected by EFI
   arch/arm64/Kconfig:1253: symbol EFI depends on KERNEL_MODE_NEON
   arch/arm64/Kconfig:262: symbol KERNEL_MODE_NEON is implied by ZINC_ARCH_ARM
   lib/zinc/Kconfig:42: symbol ZINC_ARCH_ARM depends on ZINC
>> lib/zinc/Kconfig:1: symbol ZINC is selected by ZINC_POLY1305
>> lib/zinc/Kconfig:9: symbol ZINC_POLY1305 is selected by CRYPTO_POLY1305
>> crypto/Kconfig:656: symbol CRYPTO_POLY1305 depends on CRYPTO
>> crypto/Kconfig:16: symbol CRYPTO is selected by IP_SCTP
>> net/sctp/Kconfig:5: symbol IP_SCTP is selected by DLM
>> fs/dlm/Kconfig:1: symbol DLM depends on SYSFS
   For a resolution refer to Documentation/kbuild/kconfig-language.txt
   subsection "Kconfig recursive dependency limitations"

vim +1 lib/zinc/Kconfig

32bbe22e Jason A. Donenfeld 2018-09-14  @1  config ZINC
32bbe22e Jason A. Donenfeld 2018-09-14   2  	tristate
32bbe22e Jason A. Donenfeld 2018-09-14   3  
35f45248 Jason A. Donenfeld 2018-09-14   4  config ZINC_CHACHA20
35f45248 Jason A. Donenfeld 2018-09-14   5  	bool
35f45248 Jason A. Donenfeld 2018-09-14   6  	select ZINC
35f45248 Jason A. Donenfeld 2018-09-14   7  	select CRYPTO_ALGAPI
35f45248 Jason A. Donenfeld 2018-09-14   8  
0a36c146 Jason A. Donenfeld 2018-09-14  @9  config ZINC_POLY1305
0a36c146 Jason A. Donenfeld 2018-09-14  10  	bool
0a36c146 Jason A. Donenfeld 2018-09-14  11  	select ZINC
0a36c146 Jason A. Donenfeld 2018-09-14  12  
1b5dbb86 Jason A. Donenfeld 2018-09-14  13  config ZINC_CHACHA20POLY1305
1b5dbb86 Jason A. Donenfeld 2018-09-14  14  	bool
1b5dbb86 Jason A. Donenfeld 2018-09-14  15  	select ZINC
1b5dbb86 Jason A. Donenfeld 2018-09-14  16  	select ZINC_CHACHA20
1b5dbb86 Jason A. Donenfeld 2018-09-14  17  	select ZINC_POLY1305
1b5dbb86 Jason A. Donenfeld 2018-09-14  18  	select CRYPTO_BLKCIPHER
1b5dbb86 Jason A. Donenfeld 2018-09-14  19  
a740374c Jason A. Donenfeld 2018-09-14  20  config ZINC_BLAKE2S
a740374c Jason A. Donenfeld 2018-09-14  21  	bool
a740374c Jason A. Donenfeld 2018-09-14  22  	select ZINC
a740374c Jason A. Donenfeld 2018-09-14  23  
cec5aa7c Jason A. Donenfeld 2018-09-14  24  config ZINC_CURVE25519
cec5aa7c Jason A. Donenfeld 2018-09-14  25  	bool
cec5aa7c Jason A. Donenfeld 2018-09-14  26  	select ZINC
cec5aa7c Jason A. Donenfeld 2018-09-14  27  	select CONFIG_CRYPTO
cec5aa7c Jason A. Donenfeld 2018-09-14  28  
32bbe22e Jason A. Donenfeld 2018-09-14  29  config ZINC_DEBUG
32bbe22e Jason A. Donenfeld 2018-09-14  30  	bool "Zinc cryptography library debugging and self-tests"
32bbe22e Jason A. Donenfeld 2018-09-14  31  	depends on ZINC
32bbe22e Jason A. Donenfeld 2018-09-14  32  	help
32bbe22e Jason A. Donenfeld 2018-09-14  33  	  This builds a series of self-tests for the Zinc crypto library, which
32bbe22e Jason A. Donenfeld 2018-09-14  34  	  help diagnose any cryptographic algorithm implementation issues that
32bbe22e Jason A. Donenfeld 2018-09-14  35  	  might be at the root cause of potential bugs. It also adds various
32bbe22e Jason A. Donenfeld 2018-09-14  36  	  debugging traps.
32bbe22e Jason A. Donenfeld 2018-09-14  37  
32bbe22e Jason A. Donenfeld 2018-09-14  38  	  Unless you're developing and testing cryptographic routines, or are
32bbe22e Jason A. Donenfeld 2018-09-14  39  	  especially paranoid about correctness on your hardware, you may say
32bbe22e Jason A. Donenfeld 2018-09-14  40  	  N here.
32bbe22e Jason A. Donenfeld 2018-09-14  41  
32bbe22e Jason A. Donenfeld 2018-09-14 @42  config ZINC_ARCH_ARM
32bbe22e Jason A. Donenfeld 2018-09-14  43  	def_bool y
32bbe22e Jason A. Donenfeld 2018-09-14  44  	depends on ARM
32bbe22e Jason A. Donenfeld 2018-09-14  45  	depends on ZINC
32bbe22e Jason A. Donenfeld 2018-09-14  46  	imply VFP
32bbe22e Jason A. Donenfeld 2018-09-14  47  	imply VFPv3 if CPU_V7
32bbe22e Jason A. Donenfeld 2018-09-14  48  	imply NEON if CPU_V7
32bbe22e Jason A. Donenfeld 2018-09-14  49  	imply KERNEL_MODE_NEON if CPU_V7
32bbe22e Jason A. Donenfeld 2018-09-14  50  

:::::: The code at line 1 was first introduced by commit
:::::: 32bbe22ec6fddc21b70274aab7d2830d69d51578 zinc: introduce minimal cryptography library

:::::: TO: Jason A. Donenfeld <Jason@zx2c4.com>
:::::: CC: 0day robot <lkp@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 20/20] net: WireGuard secure network tunnel
  2018-09-15 23:01   ` kbuild test robot
@ 2018-09-15 23:48     ` Jason A. Donenfeld
  0 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-15 23:48 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, LKML, Netdev, Linux Crypto Mailing List,
	David Miller, Greg Kroah-Hartman

Hi Mr. Ro Bot,

Looks like this is related to stack frames with KASAN. I've fixed this for v5.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 19/20] security/keys: rewrite big_key crypto to use Zinc
  2018-09-14 16:22 ` [PATCH net-next v4 19/20] security/keys: rewrite big_key crypto to use Zinc Jason A. Donenfeld
@ 2018-09-15 23:52   ` kbuild test robot
  2018-09-16  0:29     ` Jason A. Donenfeld
  0 siblings, 1 reply; 38+ messages in thread
From: kbuild test robot @ 2018-09-15 23:52 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kbuild-all, linux-kernel, netdev, linux-crypto, davem, gregkh,
	Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers, David Howells

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/WireGuard-Secure-Network-Tunnel/20180916-043623
config: arm64-defconfig
compiler: aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        GCC_VERSION=7.2.0 make.cross ARCH=arm64  defconfig
        GCC_VERSION=7.2.0 make.cross ARCH=arm64 

All errors (new ones prefixed by >>):

>> drivers/acpi/Kconfig:9:error: recursive dependency detected!
   drivers/acpi/Kconfig:9: symbol ACPI depends on ARCH_SUPPORTS_ACPI
   drivers/acpi/Kconfig:6: symbol ARCH_SUPPORTS_ACPI is selected by EFI
   arch/arm64/Kconfig:1253: symbol EFI depends on KERNEL_MODE_NEON
   arch/arm64/Kconfig:262: symbol KERNEL_MODE_NEON is implied by ZINC_ARCH_ARM
   lib/zinc/Kconfig:42: symbol ZINC_ARCH_ARM depends on ZINC
>> lib/zinc/Kconfig:1: symbol ZINC is selected by ZINC_CHACHA20
>> lib/zinc/Kconfig:4: symbol ZINC_CHACHA20 is selected by ZINC_CHACHA20POLY1305
>> lib/zinc/Kconfig:13: symbol ZINC_CHACHA20POLY1305 is selected by BIG_KEYS
>> security/keys/Kconfig:44: symbol BIG_KEYS depends on KEYS
>> security/keys/Kconfig:5: symbol KEYS is selected by FS_ENCRYPTION
>> fs/crypto/Kconfig:1: symbol FS_ENCRYPTION is selected by UBIFS_FS_ENCRYPTION
>> fs/ubifs/Kconfig:65: symbol UBIFS_FS_ENCRYPTION depends on MISC_FILESYSTEMS
>> fs/Kconfig:218: symbol MISC_FILESYSTEMS is selected by ACPI_APEI
>> drivers/acpi/apei/Kconfig:8: symbol ACPI_APEI depends on ACPI
   For a resolution refer to Documentation/kbuild/kconfig-language.txt
   subsection "Kconfig recursive dependency limitations"

vim +4 lib/zinc/Kconfig

32bbe22e Jason A. Donenfeld 2018-09-14  @1  config ZINC
32bbe22e Jason A. Donenfeld 2018-09-14   2  	tristate
32bbe22e Jason A. Donenfeld 2018-09-14   3  
35f45248 Jason A. Donenfeld 2018-09-14  @4  config ZINC_CHACHA20
35f45248 Jason A. Donenfeld 2018-09-14   5  	bool
35f45248 Jason A. Donenfeld 2018-09-14   6  	select ZINC
35f45248 Jason A. Donenfeld 2018-09-14   7  	select CRYPTO_ALGAPI
35f45248 Jason A. Donenfeld 2018-09-14   8  
0a36c146 Jason A. Donenfeld 2018-09-14   9  config ZINC_POLY1305
0a36c146 Jason A. Donenfeld 2018-09-14  10  	bool
0a36c146 Jason A. Donenfeld 2018-09-14  11  	select ZINC
0a36c146 Jason A. Donenfeld 2018-09-14  12  
1b5dbb86 Jason A. Donenfeld 2018-09-14 @13  config ZINC_CHACHA20POLY1305
1b5dbb86 Jason A. Donenfeld 2018-09-14  14  	bool
1b5dbb86 Jason A. Donenfeld 2018-09-14  15  	select ZINC
1b5dbb86 Jason A. Donenfeld 2018-09-14  16  	select ZINC_CHACHA20
1b5dbb86 Jason A. Donenfeld 2018-09-14  17  	select ZINC_POLY1305
1b5dbb86 Jason A. Donenfeld 2018-09-14  18  	select CRYPTO_BLKCIPHER
1b5dbb86 Jason A. Donenfeld 2018-09-14  19  
a740374c Jason A. Donenfeld 2018-09-14  20  config ZINC_BLAKE2S
a740374c Jason A. Donenfeld 2018-09-14  21  	bool
a740374c Jason A. Donenfeld 2018-09-14  22  	select ZINC
a740374c Jason A. Donenfeld 2018-09-14  23  
cec5aa7c Jason A. Donenfeld 2018-09-14  24  config ZINC_CURVE25519
cec5aa7c Jason A. Donenfeld 2018-09-14  25  	bool
cec5aa7c Jason A. Donenfeld 2018-09-14  26  	select ZINC
cec5aa7c Jason A. Donenfeld 2018-09-14  27  	select CONFIG_CRYPTO
cec5aa7c Jason A. Donenfeld 2018-09-14  28  
32bbe22e Jason A. Donenfeld 2018-09-14  29  config ZINC_DEBUG
32bbe22e Jason A. Donenfeld 2018-09-14  30  	bool "Zinc cryptography library debugging and self-tests"
32bbe22e Jason A. Donenfeld 2018-09-14  31  	depends on ZINC
32bbe22e Jason A. Donenfeld 2018-09-14  32  	help
32bbe22e Jason A. Donenfeld 2018-09-14  33  	  This builds a series of self-tests for the Zinc crypto library, which
32bbe22e Jason A. Donenfeld 2018-09-14  34  	  help diagnose any cryptographic algorithm implementation issues that
32bbe22e Jason A. Donenfeld 2018-09-14  35  	  might be at the root cause of potential bugs. It also adds various
32bbe22e Jason A. Donenfeld 2018-09-14  36  	  debugging traps.
32bbe22e Jason A. Donenfeld 2018-09-14  37  
32bbe22e Jason A. Donenfeld 2018-09-14  38  	  Unless you're developing and testing cryptographic routines, or are
32bbe22e Jason A. Donenfeld 2018-09-14  39  	  especially paranoid about correctness on your hardware, you may say
32bbe22e Jason A. Donenfeld 2018-09-14  40  	  N here.
32bbe22e Jason A. Donenfeld 2018-09-14  41  
32bbe22e Jason A. Donenfeld 2018-09-14 @42  config ZINC_ARCH_ARM
32bbe22e Jason A. Donenfeld 2018-09-14  43  	def_bool y
32bbe22e Jason A. Donenfeld 2018-09-14  44  	depends on ARM
32bbe22e Jason A. Donenfeld 2018-09-14  45  	depends on ZINC
32bbe22e Jason A. Donenfeld 2018-09-14  46  	imply VFP
32bbe22e Jason A. Donenfeld 2018-09-14  47  	imply VFPv3 if CPU_V7
32bbe22e Jason A. Donenfeld 2018-09-14  48  	imply NEON if CPU_V7
32bbe22e Jason A. Donenfeld 2018-09-14  49  	imply KERNEL_MODE_NEON if CPU_V7
32bbe22e Jason A. Donenfeld 2018-09-14  50  

:::::: The code at line 4 was first introduced by commit
:::::: 35f45248597b5a2c80f0f4a680344c22c86efe7d zinc: ChaCha20 generic C implementation

:::::: TO: Jason A. Donenfeld <Jason@zx2c4.com>
:::::: CC: 0day robot <lkp@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 19/20] security/keys: rewrite big_key crypto to use Zinc
  2018-09-15 23:52   ` kbuild test robot
@ 2018-09-16  0:29     ` Jason A. Donenfeld
  0 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-16  0:29 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, LKML, Netdev, Linux Crypto Mailing List,
	David Miller, Greg Kroah-Hartman, Samuel Neves,
	Andrew Lutomirski, Jean-Philippe Aumasson, Eric Biggers,
	David Howells

Hey again Ro,

> 32bbe22e Jason A. Donenfeld 2018-09-14  49      imply KERNEL_MODE_NEON if CPU_V7

This shouldn't have ever been there anyway. Fixed now for v5.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc
  2018-09-15 23:02   ` kbuild test robot
@ 2018-09-16  0:31     ` Jason A. Donenfeld
  0 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-16  0:31 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, LKML, Netdev, Linux Crypto Mailing List,
	David Miller, Greg Kroah-Hartman, Samuel Neves,
	Andrew Lutomirski, Jean-Philippe Aumasson, Eric Biggers

Greetings Mr. Ro Bot,

Another one of your robot friends also caught this, and the offending
code has been removed for v5.

Thanks for botting,
Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
  2018-09-14 16:22 ` [PATCH net-next v4 18/20] crypto: port ChaCha20 " Jason A. Donenfeld
  2018-09-14 17:38   ` Ard Biesheuvel
       [not found]   ` <201809160711.HzjdJedZ%fengguang.wu@intel.com>
@ 2018-09-16 19:51   ` Martin Willi
  2018-09-17  4:54     ` Jason A. Donenfeld
  2018-09-17 11:35   ` kbuild test robot
  3 siblings, 1 reply; 38+ messages in thread
From: Martin Willi @ 2018-09-16 19:51 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, netdev, linux-crypto, davem, gregkh, Samuel Neves,
	Andy Lutomirski, Jean-Philippe Aumasson, Eric Biggers

Hi Jason,

> Now that ChaCha20 is in Zinc, we can have the crypto API code simply
> call into it.

>  delete mode 100644 arch/x86/crypto/chacha20-avx2-x86_64.S
>  delete mode 100644 arch/x86/crypto/chacha20-ssse3-x86_64.S

I did some trivial benchmarking with tcrypt for the ChaCha20Poly1305
AEAD as used by IPsec. This is on a box with AVX2, which is probably
the configuration mostly used these days. With Zinc I get:

> testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-software,poly1305-software)) decryption
> test 0 (288 bit key, 16 byte blocks): 743510 operations in 1 seconds (11896160 bytes)
> test 1 (288 bit key, 64 byte blocks): 743190 operations in 1 seconds (47564160 bytes)
> test 2 (288 bit key, 256 byte blocks): 701461 operations in 1 seconds (179574016 bytes)
> test 3 (288 bit key, 512 byte blocks): 681567 operations in 1 seconds (348962304 bytes)
> test 4 (288 bit key, 1024 byte blocks): 572854 operations in 1 seconds (586602496 bytes)
> test 5 (288 bit key, 2048 byte blocks): 434477 operations in 1 seconds (889808896 bytes)
> test 6 (288 bit key, 4096 byte blocks): 293553 operations in 1 seconds (1202393088 bytes)
> test 7 (288 bit key, 8192 byte blocks): 173351 operations in 1 seconds (1420091392 bytes)

Using the existing implementation, this was:

> testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) decryption
> test 0 (288 bit key, 16 byte blocks): 1064524 operations in 1 seconds (17032384 bytes)
> test 1 (288 bit key, 64 byte blocks): 1016046 operations in 1 seconds (65026944 bytes)
> test 2 (288 bit key, 256 byte blocks): 829566 operations in 1 seconds (212368896 bytes)
> test 3 (288 bit key, 512 byte blocks): 778912 operations in 1 seconds (398802944 bytes)
> test 4 (288 bit key, 1024 byte blocks): 622331 operations in 1 seconds (637266944 bytes)
> test 5 (288 bit key, 2048 byte blocks): 441790 operations in 1 seconds (904785920 bytes)
> test 6 (288 bit key, 4096 byte blocks): 280616 operations in 1 seconds (1149403136 bytes)
> test 7 (288 bit key, 8192 byte blocks): 158800 operations in 1 seconds (1300889600 bytes)

I've also experimented with the SIMD context save/restore amortization
from patch one on the existing implementation:

> testing speed of rfc7539esp(chacha20,poly1305) (rfc7539esp(chacha20-simd,poly1305-simd)) decryption
> test 0 (288 bit key, 16 byte blocks): 1088215 operations in 1 seconds (17411440 bytes)
> test 1 (288 bit key, 64 byte blocks): 1001788 operations in 1 seconds (64114432 bytes)
> test 2 (288 bit key, 256 byte blocks): 870193 operations in 1 seconds (222769408 bytes)
> test 3 (288 bit key, 512 byte blocks): 822149 operations in 1 seconds (420940288 bytes)
> test 4 (288 bit key, 1024 byte blocks): 647447 operations in 1 seconds (662985728 bytes)
> test 5 (288 bit key, 2048 byte blocks): 454734 operations in 1 seconds (931295232 bytes)
> test 6 (288 bit key, 4096 byte blocks): 286995 operations in 1 seconds (1175531520 bytes)
> test 7 (288 bit key, 8192 byte blocks): 162028 operations in 1 seconds (1327333376 bytes)

For large blocks your implementation is faster; for typical IPsec MTUs
this degrades performance by ~10% and more.

Martin

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
  2018-09-16 19:51   ` Martin Willi
@ 2018-09-17  4:54     ` Jason A. Donenfeld
  0 siblings, 0 replies; 38+ messages in thread
From: Jason A. Donenfeld @ 2018-09-17  4:54 UTC (permalink / raw)
  To: Martin Willi
  Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
	Greg Kroah-Hartman, Samuel Neves, Andrew Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers

Hey Martin,

Thanks for running these and pointing this out. I've replicated the
results with tcrypt and fixed some issues, and the next patch series
should be a lot closer to what you'd expect, instead of the regression
you noticed. Most of the slowdown happened as a result of over-eager
XSAVEs, which I've now rectified. I'm still working on a few other
facets of it, but I believe v5 will be more satisfactory when posted.

Regards,
Jason

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v4 18/20] crypto: port ChaCha20 to Zinc
  2018-09-14 16:22 ` [PATCH net-next v4 18/20] crypto: port ChaCha20 " Jason A. Donenfeld
                     ` (2 preceding siblings ...)
  2018-09-16 19:51   ` Martin Willi
@ 2018-09-17 11:35   ` kbuild test robot
  3 siblings, 0 replies; 38+ messages in thread
From: kbuild test robot @ 2018-09-17 11:35 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: kbuild-all, linux-kernel, netdev, linux-crypto, davem, gregkh,
	Jason A. Donenfeld, Samuel Neves, Andy Lutomirski,
	Jean-Philippe Aumasson, Eric Biggers

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jason-A-Donenfeld/WireGuard-Secure-Network-Tunnel/20180916-043623
config: i386-randconfig-s3-09171149 (attached as .config)
compiler: gcc-6 (Debian 6.4.0-9) 6.4.0 20171026
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

>> ERROR: "chacha20" [crypto/chacha20_zinc.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29188 bytes --]

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-09-17 11:36 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-14 16:22 [PATCH net-next v4 00/20] WireGuard: Secure Network Tunnel Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 01/20] asm: simd context helper API Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 02/20] zinc: introduce minimal cryptography library Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 03/20] zinc: ChaCha20 generic C implementation Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 04/20] zinc: ChaCha20 ARM and ARM64 implementations Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 05/20] zinc: ChaCha20 x86_64 implementation Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 06/20] zinc: ChaCha20 MIPS32r2 implementation Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 07/20] zinc: Poly1305 generic C implementations and selftest Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 08/20] zinc: Poly1305 ARM and ARM64 implementations Jason A. Donenfeld
2018-09-14 17:27   ` Ard Biesheuvel
2018-09-14 17:45     ` Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 09/20] zinc: Poly1305 x86_64 implementation Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 10/20] zinc: Poly1305 MIPS32r2 and MIPS64 implementations Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 11/20] zinc: ChaCha20Poly1305 construction and selftest Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 12/20] zinc: BLAKE2s generic C implementation " Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 13/20] zinc: BLAKE2s x86_64 implementation Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 14/20] zinc: Curve25519 generic C implementations and selftest Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 15/20] zinc: Curve25519 ARM implementation Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 16/20] zinc: Curve25519 x86_64 implementation Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 17/20] crypto: port Poly1305 to Zinc Jason A. Donenfeld
2018-09-15 23:02   ` kbuild test robot
2018-09-16  0:31     ` Jason A. Donenfeld
2018-09-15 23:39   ` kbuild test robot
2018-09-14 16:22 ` [PATCH net-next v4 18/20] crypto: port ChaCha20 " Jason A. Donenfeld
2018-09-14 17:38   ` Ard Biesheuvel
2018-09-14 17:49     ` Jason A. Donenfeld
     [not found]   ` <201809160711.HzjdJedZ%fengguang.wu@intel.com>
2018-09-15 23:21     ` Jason A. Donenfeld
2018-09-16 19:51   ` Martin Willi
2018-09-17  4:54     ` Jason A. Donenfeld
2018-09-17 11:35   ` kbuild test robot
2018-09-14 16:22 ` [PATCH net-next v4 19/20] security/keys: rewrite big_key crypto to use Zinc Jason A. Donenfeld
2018-09-15 23:52   ` kbuild test robot
2018-09-16  0:29     ` Jason A. Donenfeld
2018-09-14 16:22 ` [PATCH net-next v4 20/20] net: WireGuard secure network tunnel Jason A. Donenfeld
2018-09-15 22:17   ` kbuild test robot
2018-09-15 23:14     ` Jason A. Donenfeld
2018-09-15 23:01   ` kbuild test robot
2018-09-15 23:48     ` Jason A. Donenfeld

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).