* [RFC PATCH 00/22] staging: add skein/threefish crypto algos
@ 2014-03-11 21:32 Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 01/22] scripts: objdiff: detect object code changes between two commits Jason Cooper
                   ` (23 more replies)
  0 siblings, 24 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Greg, all,

Attached is a series I've sat on for the past month and a half.

I'm hoping that by posting it in its incomplete state, it will either a)
motivate me to finish hooking into the crypto API, or b) motivate someone else
to pitch in. ;-)

From patch 3 onward, all commits build successfully.  In addition, using the
script I added in patch 1, I can confirm that no object code was harmed in this
process.

I'm under no time crunch with this, and I add to it as I find time.  If Greg
wants to take it for v3.15, great.  If not, that's fine as well.  It's been a
while since I watched the inner workings of the staging tree, so I'm not
familiar with how strict it has been recently.

Barring a few false positives, this series makes the code checkpatch-clean, but
it is not yet ready for mainline.  In particular, I really don't like the
ad hoc macro definitions, nor the camelCase.

The plan is to get skein and threefish registered into the crypto API, build
them as modules, and then move the code to crypto/.

To facilitate tinkering with this, one can pull from the following:

  git://git.infradead.org/users/jcooper/linux.git tags/staging-skein-3.14-rc1

This is based on v3.14-rc1, and is prone to rebasing based on comments.

thx,

Jason.

Jason Cooper (22):
  scripts: objdiff: detect object code changes between two commits
  staging: crypto: skein: import code from Skein3Fish.git
  staging: crypto: skein: allow building statically
  staging: crypto: skein: remove brg_*.h includes
  staging: crypto: skein: remove skein_port.h
  staging: crypto: skein: remove __cplusplus and an unneeded stddef.h
  staging: crypto: skein: remove unneeded typedefs
  staging: crypto: skein: remove all typedef {struct,enum}
  staging: crypto: skein: use u8, u64 vice uint*_t
  staging: crypto: skein: fixup pointer whitespace
  staging: crypto: skein: cleanup whitespace around operators/punc.
  staging: crypto: skein: dos2unix, remove executable perms
  staging: crypto: skein: fix leading whitespace
  staging: crypto: skein: remove trailing whitespace
  staging: crypto: skein: cleanup >80 character lines
  staging: crypto: skein: fix do/while brace formatting
  staging: crypto: skein: fix brace placement errors
  staging: crypto: skein: wrap multi-line macros in do-while loops
  staging: crypto: skein: remove externs from .c files
  staging: crypto: skein: remove braces from single-statement block
  staging: crypto: skein: remove unnecessary line continuation
  staging: crypto: skein: add TODO file

 drivers/staging/Kconfig                      |    2 +
 drivers/staging/Makefile                     |    1 +
 drivers/staging/skein/Kconfig                |   32 +
 drivers/staging/skein/Makefile               |   13 +
 drivers/staging/skein/TODO                   |   11 +
 drivers/staging/skein/include/skein.h        |  344 ++
 drivers/staging/skein/include/skeinApi.h     |  230 ++
 drivers/staging/skein/include/skein_block.h  |   22 +
 drivers/staging/skein/include/skein_iv.h     |  186 +
 drivers/staging/skein/include/threefishApi.h |  164 +
 drivers/staging/skein/skein.c                |  880 +++++
 drivers/staging/skein/skeinApi.c             |  237 ++
 drivers/staging/skein/skeinBlockNo3F.c       |  175 +
 drivers/staging/skein/skein_block.c          |  770 ++++
 drivers/staging/skein/threefish1024Block.c   | 4900 ++++++++++++++++++++++++++
 drivers/staging/skein/threefish256Block.c    | 1137 ++++++
 drivers/staging/skein/threefish512Block.c    | 2223 ++++++++++++
 drivers/staging/skein/threefishApi.c         |   79 +
 scripts/objdiff                              |  126 +
 19 files changed, 11532 insertions(+)
 create mode 100644 drivers/staging/skein/Kconfig
 create mode 100644 drivers/staging/skein/Makefile
 create mode 100644 drivers/staging/skein/TODO
 create mode 100644 drivers/staging/skein/include/skein.h
 create mode 100644 drivers/staging/skein/include/skeinApi.h
 create mode 100644 drivers/staging/skein/include/skein_block.h
 create mode 100644 drivers/staging/skein/include/skein_iv.h
 create mode 100644 drivers/staging/skein/include/threefishApi.h
 create mode 100644 drivers/staging/skein/skein.c
 create mode 100644 drivers/staging/skein/skeinApi.c
 create mode 100644 drivers/staging/skein/skeinBlockNo3F.c
 create mode 100644 drivers/staging/skein/skein_block.c
 create mode 100644 drivers/staging/skein/threefish1024Block.c
 create mode 100644 drivers/staging/skein/threefish256Block.c
 create mode 100644 drivers/staging/skein/threefish512Block.c
 create mode 100644 drivers/staging/skein/threefishApi.c
 create mode 100755 scripts/objdiff

-- 
1.9.0


* [RFC PATCH 01/22] scripts: objdiff: detect object code changes between two commits
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 02/22] staging: crypto: skein: import code from Skein3Fish.git Jason Cooper
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

objdiff is useful when doing large code cleanups, for example when
removing checkpatch warnings and errors from new drivers in the staging
tree.

objdiff can be used in conjunction with a git rebase to confirm that
each commit made no changes to the resulting object code.  It has the
same return values as diff(1).

This was written specifically to support adding the skein and threefish
crypto drivers to the staging tree.  I needed a programmatic way to
confirm that commits changing >90% of the lines didn't inadvertently
change the code.
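
The core idea, stripping the address column from a disassembly listing so
that only the instructions are compared, can be sketched in isolation.
This is an illustration only, not part of the patch: normalize_listing is
a hypothetical helper name, the two listings are made-up stand-ins for
objdump -D output, and the sed expression assumes GNU sed (for "\+").

```shell
#!/bin/bash
# Hypothetical helper (not in scripts/objdiff): drop the leading
# whitespace and hex address from each line of a disassembly listing
# read on stdin.  Uses the same sed expression as the script; "\+" is
# a GNU sed extension.
normalize_listing() {
	sed 's/^[[:space:]]\+[0-9a-f]\+//'
}

# Two made-up listings: same instructions, different load addresses.
a="$(printf '   0:\t55\tpush %%rbp\n   1:\tc3\tret\n' | normalize_listing)"
b="$(printf '  10:\t55\tpush %%rbp\n  11:\tc3\tret\n' | normalize_listing)"

# Once the addresses are gone, the listings compare equal.
if [ "$a" = "$b" ]; then
	echo "identical"
else
	echo "differs"
fi
```

In the real script the listing comes from "${CROSS_COMPILE}objdump -D" and
the comparison is done by diff(1) over whole directories, so a non-zero
exit status means some commit changed the generated code.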

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 scripts/objdiff | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)
 create mode 100755 scripts/objdiff

diff --git a/scripts/objdiff b/scripts/objdiff
new file mode 100755
index 000000000000..9e1ad8df2d07
--- /dev/null
+++ b/scripts/objdiff
@@ -0,0 +1,126 @@
+#!/bin/bash
+
+# objdiff - a small script for validating that a commit or series of commits
+# didn't change object code.
+#
+# Copyright 2014, Jason Cooper <jason@lakedaemon.net>
+#
+# Licensed under the terms of the GNU GPL version 2
+
+# usage example:
+#
+# $ git checkout COMMIT_A
+# $ <your fancy build command here>
+# $ ./scripts/objdiff record path/to/*.o
+#
+# $ git checkout COMMIT_B
+# $ <your fancy build command here>
+# $ ./scripts/objdiff record path/to/*.o
+#
+# $ ./scripts/objdiff diff COMMIT_A COMMIT_B
+# $
+
+# And to clean up (everything is in /tmp/objdiff-*)
+# $ ./scripts/objdiff clean all
+
+usage() {
+	echo "Usage: $0 <command> <args>"
+	echo "  record    <list of object files>"
+	echo "  diff      <commitA> <commitB>"
+	echo "  clean     all | <commit>"
+	exit 1
+}
+
+dorecord() {
+	[ $# -eq 0 ] && usage
+
+	FILES="$*"
+
+	CMT="`git rev-parse --short HEAD`"
+
+	OBJDUMP="${CROSS_COMPILE}objdump"
+	OBJDIFFD="/tmp/objdiff-$CMT"
+
+	[ ! -d "$OBJDIFFD" ] && mkdir -p "$OBJDIFFD"
+
+	for f in $FILES; do
+		dn="${f%/*}"
+		bn="${f##*/}"
+
+		[ ! -d "$OBJDIFFD/$dn" ] && mkdir -p "$OBJDIFFD/$dn"
+
+		# remove addresses for a more clear diff
+		# http://dummdida.tumblr.com/post/60924060451/binary-diff-between-libc-from-scientificlinux-and
+		$OBJDUMP -D "$f" | sed "s/^[[:space:]]\+[0-9a-f]\+//" \
+			>"$OBJDIFFD/$dn/$bn"
+
+		# force rebuild
+		rm -f "$f"
+	done
+}
+
+dodiff() {
+	[ $# -ne 2 ] && usage
+
+	SRC="`git rev-parse --short $1`"
+	DST="`git rev-parse --short $2`"
+
+	DIFF="`which colordiff`"
+
+	if [ ${#DIFF} -eq 0 ] || [ ! -x "$DIFF" ]; then
+		DIFF="`which diff`"
+	fi
+
+	SRCD="/tmp/objdiff-$SRC"
+	DSTD="/tmp/objdiff-$DST"
+
+	if [ ! -d "$SRCD" ]; then
+		echo "ERROR: $SRCD doesn't exist"
+		exit 1
+	fi
+
+	if [ ! -d "$DSTD" ]; then
+		echo "ERROR: $DSTD doesn't exist"
+		exit 1
+	fi
+
+	$DIFF -Nurd $SRCD $DSTD
+}
+
+doclean() {
+	[ $# -eq 0 ] && usage
+	[ $# -gt 1 ] && usage
+
+	if [ "x$1" = "xall" ]; then
+		rm -rf /tmp/objdiff-*
+	else
+		CMT="`git rev-parse --short $1`"
+
+		if [ -d "/tmp/objdiff-$CMT" ]; then
+			rm -rf /tmp/objdiff-$CMT
+		else
+			echo "$CMT not found"
+		fi
+	fi
+}
+
+[ $# -eq 0 ] &&	usage
+
+case "$1" in
+	record)
+		shift
+		dorecord $*
+		;;
+	diff)
+		shift
+		dodiff $*
+		;;
+	clean)
+		shift
+		doclean $*
+		;;
+	*)
+		echo "Unrecognized command '$1'"
+		exit 1
+		;;
+esac
-- 
1.9.0


* [RFC PATCH 02/22] staging: crypto: skein: import code from Skein3Fish.git
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 01/22] scripts: objdiff: detect object code changes between two commits Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 03/22] staging: crypto: skein: allow building statically Jason Cooper
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

This is a byte-for-byte copy of the skein implementation found at:

  https://github.com/wernerd/Skein3Fish.git

Specifically, from the master branch at commit:

  00e925444c2c Merge pull request #4 from csm/master

The next commit will do the minimum necessary to build this code as a
module.

I've generated the sha256 sums of the files by:

$ (cd drivers/staging/skein; find . -type f | sort | xargs sha256sum)
bcd73168e5805b1b157dbf08863e6a8c217a7b270b6be1a361540591b00624e3  ./CMakeLists.txt
e1adb97dd9e87bc7c05892ed7863a66d1d9fde6728a97a8b7b092709da664d29  ./include/brg_endian.h
240329b4ca4d829ac4d1490e96e83118e161e719e448c7e8dbf15735ab8a8e87  ./include/brg_types.h
0d8f16438f641fa365844a5991220eb04969f0a19c60dff08e10f521e74db5c3  ./include/skein.h
8f7362796e9e43f7619d51020d6faeedce786492b65bebd2ff6a833b621051cb  ./include/skeinApi.h
90510d8a9f686c3bfbf6cf7737237e3fa263c1ed5046b0f19727ba55b9bffeb9  ./include/skein_iv.h
42c6c8eff8f364ee2f0de3177d468dbceba9c6a73222fea473fe6d603213806a  ./include/skein_port.h
0154a4b8d54f5aa424b39a7ee668b31f2522b907bf3a8536fe46440b584531a1  ./include/threefishApi.h
ac0fc0f95a48a716d30cf02e5adad77af17725a938f939cf94f6dfba42badeca  ./skein.c
7af70b177bc63690f68eebceca2dbfef8a4473dcc847ae3525508c65c7d7bcc1  ./skeinApi.c
d7ef7330be8253f7f061de3c36880dbc83b0f5d90c8f2b72d3478766f54fbff0  ./skeinBlockNo3F.c
8bb3d7864afc9eab5569949fb2799cb6f14e583ba00641313cf877a5aea1c763  ./skein_block.c
438e6cb59a0090166e8f1e39418c0a2d0036737a32c5e2822c2ed8b803e2132f  ./threefish1024Block.c
e812ec6f2881300e90c803cfd9d044e954f1ca64faa2fc17a709f56a2f122ff8  ./threefish256Block.c
926f680057e128cdd1feba4a8544c177a74420137af480267b949ae79f3d02b8  ./threefish512Block.c
19357f5d47e7183bc8558a8d0949a3f5a80a931848917d26f36eebb7d205f003  ./threefishApi.c
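
To check a tree against the list above, one possible approach is to save
the sums to a file and let sha256sum verify them.  The following is a
sketch only: verify_sums and the file name skein.sha256 are hypothetical,
and it assumes GNU coreutils sha256sum.

```shell
#!/bin/bash
# verify_sums DIR SUMFILE (hypothetical helper, not part of this patch):
# check every file listed in SUMFILE, which holds sha256sum output with
# paths relative to DIR.  Returns non-zero if any file is missing or
# does not match its recorded sum.
verify_sums() {
	local dir="$1" sums
	# Resolve SUMFILE to an absolute path before changing directory.
	sums="$(readlink -f "$2")" || return 1
	(cd "$dir" && sha256sum --check --quiet "$sums")
}
```

With this patch applied and the list saved as skein.sha256, the call
would be: verify_sums drivers/staging/skein skein.sha256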

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/CMakeLists.txt         |   27 +
 drivers/staging/skein/include/brg_endian.h   |  148 +++
 drivers/staging/skein/include/brg_types.h    |  188 ++++
 drivers/staging/skein/include/skein.h        |  327 ++++++
 drivers/staging/skein/include/skeinApi.h     |  239 +++++
 drivers/staging/skein/include/skein_iv.h     |  199 ++++
 drivers/staging/skein/include/skein_port.h   |  124 +++
 drivers/staging/skein/include/threefishApi.h |  167 ++++
 drivers/staging/skein/skein.c                |  742 ++++++++++++++
 drivers/staging/skein/skeinApi.c             |  221 ++++
 drivers/staging/skein/skeinBlockNo3F.c       |  172 ++++
 drivers/staging/skein/skein_block.c          |  689 +++++++++++++
 drivers/staging/skein/threefish1024Block.c   | 1385 ++++++++++++++++++++++++++
 drivers/staging/skein/threefish256Block.c    |  349 +++++++
 drivers/staging/skein/threefish512Block.c    |  643 ++++++++++++
 drivers/staging/skein/threefishApi.c         |   79 ++
 16 files changed, 5699 insertions(+)
 create mode 100755 drivers/staging/skein/CMakeLists.txt
 create mode 100644 drivers/staging/skein/include/brg_endian.h
 create mode 100644 drivers/staging/skein/include/brg_types.h
 create mode 100644 drivers/staging/skein/include/skein.h
 create mode 100755 drivers/staging/skein/include/skeinApi.h
 create mode 100644 drivers/staging/skein/include/skein_iv.h
 create mode 100644 drivers/staging/skein/include/skein_port.h
 create mode 100644 drivers/staging/skein/include/threefishApi.h
 create mode 100644 drivers/staging/skein/skein.c
 create mode 100755 drivers/staging/skein/skeinApi.c
 create mode 100644 drivers/staging/skein/skeinBlockNo3F.c
 create mode 100644 drivers/staging/skein/skein_block.c
 create mode 100644 drivers/staging/skein/threefish1024Block.c
 create mode 100644 drivers/staging/skein/threefish256Block.c
 create mode 100644 drivers/staging/skein/threefish512Block.c
 create mode 100644 drivers/staging/skein/threefishApi.c

diff --git a/drivers/staging/skein/CMakeLists.txt b/drivers/staging/skein/CMakeLists.txt
new file mode 100755
index 000000000000..604aaa394cb1
--- /dev/null
+++ b/drivers/staging/skein/CMakeLists.txt
@@ -0,0 +1,27 @@
+cmake_minimum_required (VERSION 2.6)
+
+include_directories (${CMAKE_CURRENT_SOURCE_DIR}/include)
+
+# set(skeinBlock_src skein_block.c)
+set(skeinBlock_src skeinBlockNo3F.c)
+
+set(skein_src 
+    ${skeinBlock_src}
+    skein.c
+    skeinApi.c
+    )
+
+set(threefish_src
+    threefishApi.c
+    threefish256Block.c
+    threefish512Block.c
+    threefish1024Block.c
+    )
+set(s3f_src ${skein_src} ${threefish_src})
+
+add_library(skein3fish SHARED ${s3f_src})
+set_target_properties(skein3fish PROPERTIES VERSION ${VERSION} SOVERSION ${SOVERSION})
+target_link_libraries(skein3fish ${LIBS})
+
+install(TARGETS skein3fish DESTINATION ${LIBDIRNAME})
+
diff --git a/drivers/staging/skein/include/brg_endian.h b/drivers/staging/skein/include/brg_endian.h
new file mode 100644
index 000000000000..c03c7c5d1eb4
--- /dev/null
+++ b/drivers/staging/skein/include/brg_endian.h
@@ -0,0 +1,148 @@
+/*
+ ---------------------------------------------------------------------------
+ Copyright (c) 2003, Dr Brian Gladman, Worcester, UK.   All rights reserved.
+
+ LICENSE TERMS
+
+ The free distribution and use of this software in both source and binary
+ form is allowed (with or without changes) provided that:
+
+   1. distributions of this source code include the above copyright
+      notice, this list of conditions and the following disclaimer;
+
+   2. distributions in binary form include the above copyright
+      notice, this list of conditions and the following disclaimer
+      in the documentation and/or other associated materials;
+
+   3. the copyright holder's name is not used to endorse products
+      built using this software without specific written permission.
+
+ ALTERNATIVELY, provided that this notice is retained in full, this product
+ may be distributed under the terms of the GNU General Public License (GPL),
+ in which case the provisions of the GPL apply INSTEAD OF those given above.
+
+ DISCLAIMER
+
+ This software is provided 'as is' with no explicit or implied warranties
+ in respect of its properties, including, but not limited to, correctness
+ and/or fitness for purpose.
+ ---------------------------------------------------------------------------
+ Issue 20/10/2006
+*/
+
+#ifndef BRG_ENDIAN_H
+#define BRG_ENDIAN_H
+
+#define IS_BIG_ENDIAN      4321 /* byte 0 is most significant (mc68k) */
+#define IS_LITTLE_ENDIAN   1234 /* byte 0 is least significant (i386) */
+
+/* Include files where endian defines and byteswap functions may reside */
+#if defined( __FreeBSD__ ) || defined( __OpenBSD__ ) || defined( __NetBSD__ )
+#  include <sys/endian.h>
+#elif defined( BSD ) && ( BSD >= 199103 ) || defined( __APPLE__ ) || \
+      defined( __CYGWIN32__ ) || defined( __DJGPP__ ) || defined( __osf__ )
+#  include <machine/endian.h>
+#elif defined( __linux__ ) || defined( __GNUC__ ) || defined( __GNU_LIBRARY__ )
+#  if !defined( __MINGW32__ ) && !defined(AVR)
+#    include <endian.h>
+#    if !defined( __BEOS__ )
+#      include <byteswap.h>
+#    endif
+#  endif
+#endif
+
+/* Now attempt to set the define for platform byte order using any  */
+/* of the four forms SYMBOL, _SYMBOL, __SYMBOL & __SYMBOL__, which  */
+/* seem to encompass most endian symbol definitions                 */
+
+#if defined( BIG_ENDIAN ) && defined( LITTLE_ENDIAN )
+#  if defined( BYTE_ORDER ) && BYTE_ORDER == BIG_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#  elif defined( BYTE_ORDER ) && BYTE_ORDER == LITTLE_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#  endif
+#elif defined( BIG_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#elif defined( LITTLE_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+
+#if defined( _BIG_ENDIAN ) && defined( _LITTLE_ENDIAN )
+#  if defined( _BYTE_ORDER ) && _BYTE_ORDER == _BIG_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#  elif defined( _BYTE_ORDER ) && _BYTE_ORDER == _LITTLE_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#  endif
+#elif defined( _BIG_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#elif defined( _LITTLE_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+
+#if defined( __BIG_ENDIAN ) && defined( __LITTLE_ENDIAN )
+#  if defined( __BYTE_ORDER ) && __BYTE_ORDER == __BIG_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#  elif defined( __BYTE_ORDER ) && __BYTE_ORDER == __LITTLE_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#  endif
+#elif defined( __BIG_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#elif defined( __LITTLE_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+
+#if defined( __BIG_ENDIAN__ ) && defined( __LITTLE_ENDIAN__ )
+#  if defined( __BYTE_ORDER__ ) && __BYTE_ORDER__ == __BIG_ENDIAN__
+#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#  elif defined( __BYTE_ORDER__ ) && __BYTE_ORDER__ == __LITTLE_ENDIAN__
+#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#  endif
+#elif defined( __BIG_ENDIAN__ )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#elif defined( __LITTLE_ENDIAN__ )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+
+/*  if the platform byte order could not be determined, then try to */
+/*  set this define using common machine defines                    */
+#if !defined(PLATFORM_BYTE_ORDER)
+
+#if   defined( __alpha__ ) || defined( __alpha ) || defined( i386 )       || \
+      defined( __i386__ )  || defined( _M_I86 )  || defined( _M_IX86 )    || \
+      defined( __OS2__ )   || defined( sun386 )  || defined( __TURBOC__ ) || \
+      defined( vax )       || defined( vms )     || defined( VMS )        || \
+      defined( __VMS )     || defined( _M_X64 )  || defined( AVR )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+
+#elif defined( AMIGA )   || defined( applec )    || defined( __AS400__ )  || \
+      defined( _CRAY )   || defined( __hppa )    || defined( __hp9000 )   || \
+      defined( ibm370 )  || defined( mc68000 )   || defined( m68k )       || \
+      defined( __MRC__ ) || defined( __MVS__ )   || defined( __MWERKS__ ) || \
+      defined( sparc )   || defined( __sparc)    || defined( SYMANTEC_C ) || \
+      defined( __VOS__ ) || defined( __TIGCC__ ) || defined( __TANDEM )   || \
+      defined( THINK_C ) || defined( __VMCMS__ )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+
+#elif 0     /* **** EDIT HERE IF NECESSARY **** */
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#elif 0     /* **** EDIT HERE IF NECESSARY **** */
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#else
+#  error Please edit lines 126 or 128 in brg_endian.h to set the platform byte order
+#endif
+#endif
+
+/* special handler for IA64, which may be either endianness (?)  */
+/* here we assume little-endian, but this may need to be changed */
+#if defined(__ia64) || defined(__ia64__) || defined(_M_IA64)
+#  define PLATFORM_MUST_ALIGN (1)
+#ifndef PLATFORM_BYTE_ORDER
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+#endif
+
+#ifndef   PLATFORM_MUST_ALIGN
+#  define PLATFORM_MUST_ALIGN (0)
+#endif
+
+#endif  /* ifndef BRG_ENDIAN_H */
diff --git a/drivers/staging/skein/include/brg_types.h b/drivers/staging/skein/include/brg_types.h
new file mode 100644
index 000000000000..6db737d71b9e
--- /dev/null
+++ b/drivers/staging/skein/include/brg_types.h
@@ -0,0 +1,188 @@
+/*
+ ---------------------------------------------------------------------------
+ Copyright (c) 1998-2006, Brian Gladman, Worcester, UK. All rights reserved.
+
+ LICENSE TERMS
+
+ The free distribution and use of this software in both source and binary
+ form is allowed (with or without changes) provided that:
+
+   1. distributions of this source code include the above copyright
+      notice, this list of conditions and the following disclaimer;
+
+   2. distributions in binary form include the above copyright
+      notice, this list of conditions and the following disclaimer
+      in the documentation and/or other associated materials;
+
+   3. the copyright holder's name is not used to endorse products
+      built using this software without specific written permission.
+
+ ALTERNATIVELY, provided that this notice is retained in full, this product
+ may be distributed under the terms of the GNU General Public License (GPL),
+ in which case the provisions of the GPL apply INSTEAD OF those given above.
+
+ DISCLAIMER
+
+ This software is provided 'as is' with no explicit or implied warranties
+ in respect of its properties, including, but not limited to, correctness
+ and/or fitness for purpose.
+ ---------------------------------------------------------------------------
+ Issue 09/09/2006
+
+ The unsigned integer types defined here are of the form uint_<nn>t where
+ <nn> is the length of the type; for example, the unsigned 32-bit type is
+ 'uint_32t'.  These are NOT the same as the 'C99 integer types' that are
+ defined in the inttypes.h and stdint.h headers since attempts to use these
+ types have shown that support for them is still highly variable.  However,
+ since the latter are of the form uint<nn>_t, a regular expression search
+ and replace (in VC++ search on 'uint_{:z}t' and replace with 'uint\1_t')
+ can be used to convert the types used here to the C99 standard types.
+*/
+
+#ifndef BRG_TYPES_H
+#define BRG_TYPES_H
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#include <limits.h>
+
+#ifndef BRG_UI8
+#  define BRG_UI8
+#  if UCHAR_MAX == 255u
+     typedef unsigned char uint_8t;
+#  else
+#    error Please define uint_8t as an 8-bit unsigned integer type in brg_types.h
+#  endif
+#endif
+
+#ifndef BRG_UI16
+#  define BRG_UI16
+#  if USHRT_MAX == 65535u
+     typedef unsigned short uint_16t;
+#  else
+#    error Please define uint_16t as a 16-bit unsigned short type in brg_types.h
+#  endif
+#endif
+
+#ifndef BRG_UI32
+#  define BRG_UI32
+#  if UINT_MAX == 4294967295u
+#    define li_32(h) 0x##h##u
+     typedef unsigned int uint_32t;
+#  elif ULONG_MAX == 4294967295u
+#    define li_32(h) 0x##h##ul
+     typedef unsigned long uint_32t;
+#  elif defined( _CRAY )
+#    error This code needs 32-bit data types, which Cray machines do not provide
+#  else
+#    error Please define uint_32t as a 32-bit unsigned integer type in brg_types.h
+#  endif
+#endif
+
+#ifndef BRG_UI64
+#  if defined( __BORLANDC__ ) && !defined( __MSDOS__ )
+#    define BRG_UI64
+#    define li_64(h) 0x##h##ui64
+     typedef unsigned __int64 uint_64t;
+#  elif defined( _MSC_VER ) && ( _MSC_VER < 1300 )    /* 1300 == VC++ 7.0 */
+#    define BRG_UI64
+#    define li_64(h) 0x##h##ui64
+     typedef unsigned __int64 uint_64t;
+#  elif defined( __sun ) && defined(ULONG_MAX) && ULONG_MAX == 0xfffffffful
+#    define BRG_UI64
+#    define li_64(h) 0x##h##ull
+     typedef unsigned long long uint_64t;
+#  elif defined( UINT_MAX ) && UINT_MAX > 4294967295u
+#    if UINT_MAX == 18446744073709551615u
+#      define BRG_UI64
+#      define li_64(h) 0x##h##u
+       typedef unsigned int uint_64t;
+#    endif
+#  elif defined( ULONG_MAX ) && ULONG_MAX > 4294967295u
+#    if ULONG_MAX == 18446744073709551615ul
+#      define BRG_UI64
+#      define li_64(h) 0x##h##ul
+       typedef unsigned long uint_64t;
+#    endif
+#  elif defined( ULLONG_MAX ) && ULLONG_MAX > 4294967295u
+#    if ULLONG_MAX == 18446744073709551615ull
+#      define BRG_UI64
+#      define li_64(h) 0x##h##ull
+       typedef unsigned long long uint_64t;
+#    endif
+#  elif defined( ULONG_LONG_MAX ) && ULONG_LONG_MAX > 4294967295u
+#    if ULONG_LONG_MAX == 18446744073709551615ull
+#      define BRG_UI64
+#      define li_64(h) 0x##h##ull
+       typedef unsigned long long uint_64t;
+#    endif
+#  elif defined(__GNUC__)  /* DLW: avoid mingw problem with -ansi */
+#      define BRG_UI64
+#      define li_64(h) 0x##h##ull
+       typedef unsigned long long uint_64t;
+#  endif
+#endif
+
+#if defined( NEED_UINT_64T ) && !defined( BRG_UI64 )
+#  error Please define uint_64t as an unsigned 64 bit type in brg_types.h
+#endif
+
+#ifndef RETURN_VALUES
+#  define RETURN_VALUES
+#  if defined( DLL_EXPORT )
+#    if defined( _MSC_VER ) || defined ( __INTEL_COMPILER )
+#      define VOID_RETURN    __declspec( dllexport ) void __stdcall
+#      define INT_RETURN     __declspec( dllexport ) int  __stdcall
+#    elif defined( __GNUC__ )
+#      define VOID_RETURN    __declspec( __dllexport__ ) void
+#      define INT_RETURN     __declspec( __dllexport__ ) int
+#    else
+#      error Use of the DLL is only available on the Microsoft, Intel and GCC compilers
+#    endif
+#  elif defined( DLL_IMPORT )
+#    if defined( _MSC_VER ) || defined ( __INTEL_COMPILER )
+#      define VOID_RETURN    __declspec( dllimport ) void __stdcall
+#      define INT_RETURN     __declspec( dllimport ) int  __stdcall
+#    elif defined( __GNUC__ )
+#      define VOID_RETURN    __declspec( __dllimport__ ) void
+#      define INT_RETURN     __declspec( __dllimport__ ) int
+#    else
+#      error Use of the DLL is only available on the Microsoft, Intel and GCC compilers
+#    endif
+#  elif defined( __WATCOMC__ )
+#    define VOID_RETURN  void __cdecl
+#    define INT_RETURN   int  __cdecl
+#  else
+#    define VOID_RETURN  void
+#    define INT_RETURN   int
+#  endif
+#endif
+
+/*  These defines are used to declare buffers in a way that allows
+    faster operations on longer variables to be used.  In all these
+    defines 'size' must be a power of 2 and >= 8
+
+    dec_unit_type(size,x)       declares a variable 'x' of length 
+                                'size' bits
+
+    dec_bufr_type(size,bsize,x) declares a buffer 'x' of length 'bsize' 
+                                bytes defined as an array of variables
+                                each of 'size' bits (bsize must be a 
+                                multiple of size / 8)
+
+    ptr_cast(x,size)            casts a pointer to a pointer to a 
+                                varaiable of length 'size' bits
+*/
+
+#define ui_type(size)               uint_##size##t
+#define dec_unit_type(size,x)       typedef ui_type(size) x
+#define dec_bufr_type(size,bsize,x) typedef ui_type(size) x[bsize / (size >> 3)]
+#define ptr_cast(x,size)            ((ui_type(size)*)(x))
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif
diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
new file mode 100644
index 000000000000..cb613fa09d9e
--- /dev/null
+++ b/drivers/staging/skein/include/skein.h
@@ -0,0 +1,327 @@
+#ifndef _SKEIN_H_
+#define _SKEIN_H_     1
+/**************************************************************************
+**
+** Interface declarations and internal definitions for Skein hashing.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+***************************************************************************
+** 
+** The following compile-time switches may be defined to control some
+** tradeoffs between speed, code size, error checking, and security.
+**
+** The "default" note explains what happens when the switch is not defined.
+**
+**  SKEIN_DEBUG            -- make callouts from inside Skein code
+**                            to examine/display intermediate values.
+**                            [default: no callouts (no overhead)]
+**
+**  SKEIN_ERR_CHECK        -- how error checking is handled inside Skein
+**                            code. If not defined, most error checking 
+**                            is disabled (for performance). Otherwise, 
+**                            the switch value is interpreted as:
+**                                0: use assert()      to flag errors
+**                                1: return SKEIN_FAIL to flag errors
+**
+***************************************************************************/
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+#include <stddef.h>                          /* get size_t definition */
+#include <skein_port.h>               /* get platform-specific definitions */
+
+enum
+    {
+    SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
+    SKEIN_FAIL            =      1,
+    SKEIN_BAD_HASHLEN     =      2
+    };
+
+#define  SKEIN_MODIFIER_WORDS  ( 2)          /* number of modifier (tweak) words */
+
+#define  SKEIN_256_STATE_WORDS ( 4)
+#define  SKEIN_512_STATE_WORDS ( 8)
+#define  SKEIN1024_STATE_WORDS (16)
+#define  SKEIN_MAX_STATE_WORDS (16)
+
+#define  SKEIN_256_STATE_BYTES ( 8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BYTES ( 8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BYTES ( 8*SKEIN1024_STATE_WORDS)
+
+#define  SKEIN_256_STATE_BITS  (64*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BITS  (64*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BITS  (64*SKEIN1024_STATE_WORDS)
+
+#define  SKEIN_256_BLOCK_BYTES ( 8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_BLOCK_BYTES ( 8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_BLOCK_BYTES ( 8*SKEIN1024_STATE_WORDS)
+
+typedef struct
+    {
+    size_t  hashBitLen;                      /* size of hash result, in bits */
+    size_t  bCnt;                            /* current byte count in buffer b[] */
+    u64b_t  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+    } Skein_Ctxt_Hdr_t;
+
+typedef struct                               /*  256-bit Skein hash context structure */
+    {
+    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    u64b_t  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
+    u08b_t  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    } Skein_256_Ctxt_t;
+
+typedef struct                               /*  512-bit Skein hash context structure */
+    {
+    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    u64b_t  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
+    u08b_t  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    } Skein_512_Ctxt_t;
+
+typedef struct                               /* 1024-bit Skein hash context structure */
+    {
+    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    u64b_t  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
+    u08b_t  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    } Skein1024_Ctxt_t;
+
+/*   Skein APIs for (incremental) "straight hashing" */
+int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
+int  Skein_512_Init  (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
+int  Skein1024_Init  (Skein1024_Ctxt_t *ctx, size_t hashBitLen);
+
+int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
+int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
+int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
+
+int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+
+/*
+**   Skein APIs for "extended" initialization: MAC keys, tree hashing.
+**   After an InitExt() call, just use Update/Final calls as with Init().
+**
+**   Notes: Same parameters as _Init() calls, plus treeInfo/key/keyBytes.
+**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL, 
+**              the results of InitExt() are identical to calling Init().
+**          The function Init() may be called once to "precompute" the IV for
+**              a given hashBitLen value, then by saving a copy of the context
+**              the IV computation may be avoided in later calls.
+**          Similarly, the function InitExt() may be called once per MAC key 
+**              to precompute the MAC IV, then a copy of the context saved and
+**              reused for each new MAC computation.
+**/
+int  Skein_256_InitExt(Skein_256_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
+int  Skein_512_InitExt(Skein_512_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
+int  Skein1024_InitExt(Skein1024_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
+
+/*
+**   Skein APIs for MAC and tree hash:
+**      Final_Pad:  pad, do final block, but no OUTPUT type
+**      Output:     do just the output stage
+*/
+int  Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+
+#ifndef SKEIN_TREE_HASH
+#define SKEIN_TREE_HASH (1)
+#endif
+#if  SKEIN_TREE_HASH
+int  Skein_256_Output   (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_512_Output   (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+#endif
+
+/*****************************************************************
+** "Internal" Skein definitions
+**    -- not needed for sequential hashing API, but will be 
+**           helpful for other uses of Skein (e.g., tree hash mode).
+**    -- included here so that they can be shared between
+**           reference and optimized code.
+******************************************************************/
+
+/* tweak word T[1]: bit field starting positions */
+#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
+                                
+#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
+#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
+#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
+#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bit  126     : first block flag         */
+#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
+                                
+/* tweak word T[1]: flag bit definition(s) */
+#define SKEIN_T1_FLAG_FIRST     (((u64b_t)  1 ) << SKEIN_T1_POS_FIRST)
+#define SKEIN_T1_FLAG_FINAL     (((u64b_t)  1 ) << SKEIN_T1_POS_FINAL)
+#define SKEIN_T1_FLAG_BIT_PAD   (((u64b_t)  1 ) << SKEIN_T1_POS_BIT_PAD)
+                                
+/* tweak word T[1]: tree level bit field mask */
+#define SKEIN_T1_TREE_LVL_MASK  (((u64b_t)0x7F) << SKEIN_T1_POS_TREE_LVL)
+#define SKEIN_T1_TREE_LEVEL(n)  (((u64b_t) (n)) << SKEIN_T1_POS_TREE_LVL)
+
+/* tweak word T[1]: block type field */
+#define SKEIN_BLK_TYPE_KEY      ( 0)                    /* key, for MAC and KDF */
+#define SKEIN_BLK_TYPE_CFG      ( 4)                    /* configuration block */
+#define SKEIN_BLK_TYPE_PERS     ( 8)                    /* personalization string */
+#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
+#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
+#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
+#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
+#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
+#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
+
+#define SKEIN_T1_BLK_TYPE(T)   (((u64b_t) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
+#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
+#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
+#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
+#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
+#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
+#define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
+#define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
+#define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
+#define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */
+
+#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
+#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)
+
+#define SKEIN_VERSION           (1)
+
+#ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
+#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
+#endif
+
+#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64b_t) (hi32)) << 32))
+#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION,SKEIN_ID_STRING_LE)
+#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)
+
+#define SKEIN_CFG_STR_LEN       (4*8)
+
+/* bit field definitions in config block treeInfo word */
+#define SKEIN_CFG_TREE_LEAF_SIZE_POS  ( 0)
+#define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
+#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
+
+#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
+#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
+#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
+
+#define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
+    ( (((u64b_t)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+      (((u64b_t)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+      (((u64b_t)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )
+
+#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0,0,0) /* use as treeInfo in InitExt() call for sequential processing */
+
+/*
+**   Skein macros for getting/setting tweak words, etc.
+**   These are useful for partial input bytes, hash tree init/update, etc.
+**/
+#define Skein_Get_Tweak(ctxPtr,TWK_NUM)         ((ctxPtr)->h.T[TWK_NUM])
+#define Skein_Set_Tweak(ctxPtr,TWK_NUM,tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal);}
+
+#define Skein_Get_T0(ctxPtr)    Skein_Get_Tweak(ctxPtr,0)
+#define Skein_Get_T1(ctxPtr)    Skein_Get_Tweak(ctxPtr,1)
+#define Skein_Set_T0(ctxPtr,T0) Skein_Set_Tweak(ctxPtr,0,T0)
+#define Skein_Set_T1(ctxPtr,T1) Skein_Set_Tweak(ctxPtr,1,T1)
+
+/* set both tweak words at once */
+#define Skein_Set_T0_T1(ctxPtr,T0,T1)           \
+    {                                           \
+    Skein_Set_T0(ctxPtr,(T0));                  \
+    Skein_Set_T1(ctxPtr,(T1));                  \
+    }
+
+#define Skein_Set_Type(ctxPtr,BLK_TYPE)         \
+    Skein_Set_T1(ctxPtr,SKEIN_T1_BLK_TYPE_##BLK_TYPE)
+
+/* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
+#define Skein_Start_New_Type(ctxPtr,BLK_TYPE)   \
+    { Skein_Set_T0_T1(ctxPtr,0,SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt=0; }
+
+#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
+#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
+
+#define Skein_Set_Tree_Level(hdr,height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height);}
+
+/*****************************************************************
+** "Internal" Skein definitions for debugging and error checking
+******************************************************************/
+#ifdef  SKEIN_DEBUG             /* examine/display intermediate values? */
+#include "skein_debug.h"
+#else                           /* default is no callouts */
+#define Skein_Show_Block(bits,ctx,X,blkPtr,wPtr,ksEvenPtr,ksOddPtr)
+#define Skein_Show_Round(bits,ctx,r,X)
+#define Skein_Show_R_Ptr(bits,ctx,r,X_ptr)
+#define Skein_Show_Final(bits,ctx,cnt,outPtr)
+#define Skein_Show_Key(bits,ctx,key,keyBytes)
+#endif
+
+#ifndef SKEIN_ERR_CHECK        /* run-time checks (e.g., bad params, uninitialized context)? */
+#define Skein_Assert(x,retCode)/* default: ignore all Asserts, for performance */
+#define Skein_assert(x)
+#elif   defined(SKEIN_ASSERT)
+#include <assert.h>     
+#define Skein_Assert(x,retCode) assert(x) 
+#define Skein_assert(x)         assert(x) 
+#else
+#include <assert.h>     
+#define Skein_Assert(x,retCode) { if (!(x)) return retCode; } /*  caller  error */
+#define Skein_assert(x)         assert(x)                     /* internal error */
+#endif
+
+/*****************************************************************
+** Skein block function constants (shared across Ref and Opt code)
+******************************************************************/
+enum    
+    {   
+        /* Skein_256 round rotation constants */
+    R_256_0_0=14, R_256_0_1=16,
+    R_256_1_0=52, R_256_1_1=57,
+    R_256_2_0=23, R_256_2_1=40,
+    R_256_3_0= 5, R_256_3_1=37,
+    R_256_4_0=25, R_256_4_1=33,
+    R_256_5_0=46, R_256_5_1=12,
+    R_256_6_0=58, R_256_6_1=22,
+    R_256_7_0=32, R_256_7_1=32,
+
+        /* Skein_512 round rotation constants */
+    R_512_0_0=46, R_512_0_1=36, R_512_0_2=19, R_512_0_3=37,
+    R_512_1_0=33, R_512_1_1=27, R_512_1_2=14, R_512_1_3=42,
+    R_512_2_0=17, R_512_2_1=49, R_512_2_2=36, R_512_2_3=39,
+    R_512_3_0=44, R_512_3_1= 9, R_512_3_2=54, R_512_3_3=56,
+    R_512_4_0=39, R_512_4_1=30, R_512_4_2=34, R_512_4_3=24,
+    R_512_5_0=13, R_512_5_1=50, R_512_5_2=10, R_512_5_3=17,
+    R_512_6_0=25, R_512_6_1=29, R_512_6_2=39, R_512_6_3=43,
+    R_512_7_0= 8, R_512_7_1=35, R_512_7_2=56, R_512_7_3=22,
+
+        /* Skein1024 round rotation constants */
+    R1024_0_0=24, R1024_0_1=13, R1024_0_2= 8, R1024_0_3=47, R1024_0_4= 8, R1024_0_5=17, R1024_0_6=22, R1024_0_7=37,
+    R1024_1_0=38, R1024_1_1=19, R1024_1_2=10, R1024_1_3=55, R1024_1_4=49, R1024_1_5=18, R1024_1_6=23, R1024_1_7=52,
+    R1024_2_0=33, R1024_2_1= 4, R1024_2_2=51, R1024_2_3=13, R1024_2_4=34, R1024_2_5=41, R1024_2_6=59, R1024_2_7=17,
+    R1024_3_0= 5, R1024_3_1=20, R1024_3_2=48, R1024_3_3=41, R1024_3_4=47, R1024_3_5=28, R1024_3_6=16, R1024_3_7=25,
+    R1024_4_0=41, R1024_4_1= 9, R1024_4_2=37, R1024_4_3=31, R1024_4_4=12, R1024_4_5=47, R1024_4_6=44, R1024_4_7=30,
+    R1024_5_0=16, R1024_5_1=34, R1024_5_2=56, R1024_5_3=51, R1024_5_4= 4, R1024_5_5=53, R1024_5_6=42, R1024_5_7=41,
+    R1024_6_0=31, R1024_6_1=44, R1024_6_2=47, R1024_6_3=46, R1024_6_4=19, R1024_6_5=42, R1024_6_6=44, R1024_6_7=25,
+    R1024_7_0= 9, R1024_7_1=48, R1024_7_2=35, R1024_7_3=52, R1024_7_4=23, R1024_7_5=31, R1024_7_6=37, R1024_7_7=20
+    };
+
+#ifndef SKEIN_ROUNDS
+#define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
+#define SKEIN_512_ROUNDS_TOTAL (72)
+#define SKEIN1024_ROUNDS_TOTAL (80)
+#else                                        /* allow command-line define in range 8*(5..14)   */
+#define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
+#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/ 10) + 5) % 10) + 5))
+#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS    ) + 5) % 10) + 5))
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* ifndef _SKEIN_H_ */
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
new file mode 100755
index 000000000000..19c3225460fc
--- /dev/null
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -0,0 +1,239 @@
+/*
+Copyright (c) 2010 Werner Dittmann
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+*/
+
+#ifndef SKEINAPI_H
+#define SKEINAPI_H
+
+/**
+ * @file skeinApi.h
+ * @brief A Skein API and its functions.
+ * @{
+ *
+ * This API and the functions that implement this API simplify the usage
+ * of Skein. The design and the way to use the functions follow the OpenSSL
+ * design but at the same time take care of some Skein specific behaviour
+ * and possibilities.
+ * 
+ * The functions enable applications to create normal Skein hashes and
+ * message authentication codes (MAC).
+ * 
+ * Using these functions is simple and straightforward:
+ * 
+ * @code
+ * 
+ * #include <skeinApi.h>
+ * 
+ * ...
+ * SkeinCtx_t ctx;             // a Skein hash or MAC context
+ * 
+ * // prepare context, here for a Skein with a state size of 512 bits.
+ * skeinCtxPrepare(&ctx, Skein512);
+ * 
+ * // Initialize the context to set the requested hash length in bits.
+ * // Here we request an output hash size of 31 bits (Skein supports
+ * // variable output sizes, even very unusual ones).
+ * skeinInit(&ctx, 31);
+ * 
+ * // Now update Skein with any number of message bits. A function that
+ * // takes a number of bytes is also available.
+ * skeinUpdateBits(&ctx, message, msgLength);
+ * 
+ * // Now get the result of the Skein hash. The output buffer must be
+ * // large enough to hold the requested number of output bits. The
+ * // application may now extract the bits.
+ * skeinFinal(&ctx, result);
+ * ...
+ * @endcode
+ * 
+ * An application may use @c skeinReset to reset a Skein context and use
+ * it for creation of another hash with the same Skein state size and output
+ * bit length. In this case the API implementation restores some
+ * internal state data and avoids a full Skein initialization round.
+ * 
+ * To create a MAC the application just uses @c skeinMacInit instead of 
+ * @c skeinInit. All other functions calls remain the same.
+ * 
+ */
+
+#include <skein.h>
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+    /**
+     * Which Skein size to use
+     */
+    typedef enum SkeinSize {
+        Skein256 = 256,     /*!< Skein with 256 bit state */
+        Skein512 = 512,     /*!< Skein with 512 bit state */
+        Skein1024 = 1024    /*!< Skein with 1024 bit state */
+    } SkeinSize_t;
+
+    /**
+     * Context for Skein.
+     *
+     * This structure was set up with some know-how of the internal
+     * Skein structures, in particular ordering of header and size dependent
+     * variables. If the Skein implementation changes this, then adapt these
+     * structures as well.
+     */
+    typedef struct SkeinCtx {
+        u64b_t skeinSize;
+        u64b_t  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
+        union {
+            Skein_Ctxt_Hdr_t h;
+            Skein_256_Ctxt_t s256;
+            Skein_512_Ctxt_t s512;
+            Skein1024_Ctxt_t s1024;
+        } m;
+    } SkeinCtx_t;
+
+    /**
+     * Prepare a Skein context.
+     * 
+     * An application must call this function before it can use the Skein
+     * context. The function clears memory and initializes size dependent
+     * variables.
+     *
+     * @param ctx
+     *     Pointer to a Skein context.
+     * @param size
+     *     Which Skein size to use.
+     * @return
+     *     SKEIN_SUCCESS or SKEIN_FAIL
+     */
+    int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size);
+
+    /**
+     * Initialize a Skein context.
+     *
+     * Initializes the context with this data and saves the resulting Skein 
+     * state variables for further use.
+     *
+     * @param ctx
+     *     Pointer to a Skein context.
+     * @param hashBitLen
+     *     Number of hash bits to compute
+     * @return
+     *     SKEIN_SUCCESS or SKEIN_FAIL
+     * @see skeinReset
+     */
+    int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen);
+
+    /**
+     * Resets a Skein context for further use.
+     * 
+     * Restores the saved chaining variables to reset the Skein context. 
+     * Thus applications can reuse the same setup to process several
+     * messages. This saves a complete Skein initialization cycle.
+     * 
+     * @param ctx
+     *     Pointer to a pre-initialized Skein MAC context
+     */
+    void skeinReset(SkeinCtx_t* ctx);
+    
+    /**
+     * Initializes a Skein context for MAC usage.
+     * 
+     * Initializes the context with this data and saves the resulting Skein 
+     * state variables for further use.
+     *
+     * Applications call the normal Skein functions to update the MAC and
+     * get the final result.
+     *
+     * @param ctx
+     *     Pointer to an empty or preinitialized Skein MAC context
+     * @param key
+     *     Pointer to key bytes or NULL
+     * @param keyLen
+     *     Length of the key in bytes or zero
+     * @param hashBitLen
+     *     Number of MAC hash bits to compute
+     * @return
+     *     SKEIN_SUCCESS or SKEIN_FAIL
+     */
+    int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
+                     size_t hashBitLen);
+
+    /**
+     * Update Skein with the next part of the message.
+     *
+     * @param ctx
+     *     Pointer to initialized Skein context
+     * @param msg
+     *     Pointer to the message.
+     * @param msgByteCnt
+     *     Length of the message in @b bytes
+     * @return
+     *     Success or error code.
+     */
+    int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
+                    size_t msgByteCnt);
+
+    /**
+     * Update the hash with a message bit string.
+     *
+     * Skein can handle data not only as bytes but also as bit strings of
+     * arbitrary length (up to its maximum design size).
+     *
+     * @param ctx
+     *     Pointer to initialized Skein context
+     * @param msg
+     *     Pointer to the message.
+     * @param msgBitCnt
+     *     Length of the message in @b bits.
+     */
+    int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
+                        size_t msgBitCnt);
+
+    /**
+     * Finalize Skein and return the hash.
+     * 
+     * Before an application can reuse a Skein setup the application must
+     * reset the Skein context.
+     *
+     * @param ctx
+     *     Pointer to initialized Skein context
+     * @param hash
+     *     Pointer to buffer that receives the hash. The buffer must be large
+     *     enough to store @c hashBitLen bits.
+     * @return
+     *     Success or error code.
+     * @see skeinReset
+     */
+    int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash);
+
+#ifdef __cplusplus
+}
+#endif
+
+/**
+ * @}
+ */
+#endif
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
new file mode 100644
index 000000000000..555ea619500b
--- /dev/null
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -0,0 +1,199 @@
+#ifndef _SKEIN_IV_H_
+#define _SKEIN_IV_H_
+
+#include <skein.h>    /* get Skein macros and types */
+
+/*
+***************** Pre-computed Skein IVs *******************
+**
+** NOTE: these values are not "magic" constants, but
+** are generated using the Threefish block function.
+** They are pre-computed here only for speed; i.e., to
+** avoid the need for a Threefish call during Init().
+**
+** The IV for any fixed hash length may be pre-computed.
+** Only the most common values are included here.
+**
+************************************************************
+**/
+
+#define MK_64 SKEIN_MK_64
+
+/* blkSize =  256 bits. hashSize =  128 bits */
+const u64b_t SKEIN_256_IV_128[] =
+    {
+    MK_64(0xE1111906,0x964D7260),
+    MK_64(0x883DAAA7,0x7C8D811C),
+    MK_64(0x10080DF4,0x91960F7A),
+    MK_64(0xCCF7DDE5,0xB45BC1C2)
+    };
+
+/* blkSize =  256 bits. hashSize =  160 bits */
+const u64b_t SKEIN_256_IV_160[] =
+    {
+    MK_64(0x14202314,0x72825E98),
+    MK_64(0x2AC4E9A2,0x5A77E590),
+    MK_64(0xD47A5856,0x8838D63E),
+    MK_64(0x2DD2E496,0x8586AB7D)
+    };
+
+/* blkSize =  256 bits. hashSize =  224 bits */
+const u64b_t SKEIN_256_IV_224[] =
+    {
+    MK_64(0xC6098A8C,0x9AE5EA0B),
+    MK_64(0x876D5686,0x08C5191C),
+    MK_64(0x99CB88D7,0xD7F53884),
+    MK_64(0x384BDDB1,0xAEDDB5DE)
+    };
+
+/* blkSize =  256 bits. hashSize =  256 bits */
+const u64b_t SKEIN_256_IV_256[] =
+    {
+    MK_64(0xFC9DA860,0xD048B449),
+    MK_64(0x2FCA6647,0x9FA7D833),
+    MK_64(0xB33BC389,0x6656840F),
+    MK_64(0x6A54E920,0xFDE8DA69)
+    };
+
+/* blkSize =  512 bits. hashSize =  128 bits */
+const u64b_t SKEIN_512_IV_128[] =
+    {
+    MK_64(0xA8BC7BF3,0x6FBF9F52),
+    MK_64(0x1E9872CE,0xBD1AF0AA),
+    MK_64(0x309B1790,0xB32190D3),
+    MK_64(0xBCFBB854,0x3F94805C),
+    MK_64(0x0DA61BCD,0x6E31B11B),
+    MK_64(0x1A18EBEA,0xD46A32E3),
+    MK_64(0xA2CC5B18,0xCE84AA82),
+    MK_64(0x6982AB28,0x9D46982D)
+    };
+
+/* blkSize =  512 bits. hashSize =  160 bits */
+const u64b_t SKEIN_512_IV_160[] =
+    {
+    MK_64(0x28B81A2A,0xE013BD91),
+    MK_64(0xC2F11668,0xB5BDF78F),
+    MK_64(0x1760D8F3,0xF6A56F12),
+    MK_64(0x4FB74758,0x8239904F),
+    MK_64(0x21EDE07F,0x7EAF5056),
+    MK_64(0xD908922E,0x63ED70B8),
+    MK_64(0xB8EC76FF,0xECCB52FA),
+    MK_64(0x01A47BB8,0xA3F27A6E)
+    };
+
+/* blkSize =  512 bits. hashSize =  224 bits */
+const u64b_t SKEIN_512_IV_224[] =
+    {
+    MK_64(0xCCD06162,0x48677224),
+    MK_64(0xCBA65CF3,0xA92339EF),
+    MK_64(0x8CCD69D6,0x52FF4B64),
+    MK_64(0x398AED7B,0x3AB890B4),
+    MK_64(0x0F59D1B1,0x457D2BD0),
+    MK_64(0x6776FE65,0x75D4EB3D),
+    MK_64(0x99FBC70E,0x997413E9),
+    MK_64(0x9E2CFCCF,0xE1C41EF7)
+    };
+
+/* blkSize =  512 bits. hashSize =  256 bits */
+const u64b_t SKEIN_512_IV_256[] =
+    {
+    MK_64(0xCCD044A1,0x2FDB3E13),
+    MK_64(0xE8359030,0x1A79A9EB),
+    MK_64(0x55AEA061,0x4F816E6F),
+    MK_64(0x2A2767A4,0xAE9B94DB),
+    MK_64(0xEC06025E,0x74DD7683),
+    MK_64(0xE7A436CD,0xC4746251),
+    MK_64(0xC36FBAF9,0x393AD185),
+    MK_64(0x3EEDBA18,0x33EDFC13)
+    };
+
+/* blkSize =  512 bits. hashSize =  384 bits */
+const u64b_t SKEIN_512_IV_384[] =
+    {
+    MK_64(0xA3F6C6BF,0x3A75EF5F),
+    MK_64(0xB0FEF9CC,0xFD84FAA4),
+    MK_64(0x9D77DD66,0x3D770CFE),
+    MK_64(0xD798CBF3,0xB468FDDA),
+    MK_64(0x1BC4A666,0x8A0E4465),
+    MK_64(0x7ED7D434,0xE5807407),
+    MK_64(0x548FC1AC,0xD4EC44D6),
+    MK_64(0x266E1754,0x6AA18FF8)
+    };
+
+/* blkSize =  512 bits. hashSize =  512 bits */
+const u64b_t SKEIN_512_IV_512[] =
+    {
+    MK_64(0x4903ADFF,0x749C51CE),
+    MK_64(0x0D95DE39,0x9746DF03),
+    MK_64(0x8FD19341,0x27C79BCE),
+    MK_64(0x9A255629,0xFF352CB1),
+    MK_64(0x5DB62599,0xDF6CA7B0),
+    MK_64(0xEABE394C,0xA9D5C3F4),
+    MK_64(0x991112C7,0x1A75B523),
+    MK_64(0xAE18A40B,0x660FCC33)
+    };
+
+/* blkSize = 1024 bits. hashSize =  384 bits */
+const u64b_t SKEIN1024_IV_384[] =
+    {
+    MK_64(0x5102B6B8,0xC1894A35),
+    MK_64(0xFEEBC9E3,0xFE8AF11A),
+    MK_64(0x0C807F06,0xE32BED71),
+    MK_64(0x60C13A52,0xB41A91F6),
+    MK_64(0x9716D35D,0xD4917C38),
+    MK_64(0xE780DF12,0x6FD31D3A),
+    MK_64(0x797846B6,0xC898303A),
+    MK_64(0xB172C2A8,0xB3572A3B),
+    MK_64(0xC9BC8203,0xA6104A6C),
+    MK_64(0x65909338,0xD75624F4),
+    MK_64(0x94BCC568,0x4B3F81A0),
+    MK_64(0x3EBBF51E,0x10ECFD46),
+    MK_64(0x2DF50F0B,0xEEB08542),
+    MK_64(0x3B5A6530,0x0DBC6516),
+    MK_64(0x484B9CD2,0x167BBCE1),
+    MK_64(0x2D136947,0xD4CBAFEA)
+    };
+
+/* blkSize = 1024 bits. hashSize =  512 bits */
+const u64b_t SKEIN1024_IV_512[] =
+    {
+    MK_64(0xCAEC0E5D,0x7C1B1B18),
+    MK_64(0xA01B0E04,0x5F03E802),
+    MK_64(0x33840451,0xED912885),
+    MK_64(0x374AFB04,0xEAEC2E1C),
+    MK_64(0xDF25A0E2,0x813581F7),
+    MK_64(0xE4004093,0x8B12F9D2),
+    MK_64(0xA662D539,0xC2ED39B6),
+    MK_64(0xFA8B85CF,0x45D8C75A),
+    MK_64(0x8316ED8E,0x29EDE796),
+    MK_64(0x053289C0,0x2E9F91B8),
+    MK_64(0xC3F8EF1D,0x6D518B73),
+    MK_64(0xBDCEC3C4,0xD5EF332E),
+    MK_64(0x549A7E52,0x22974487),
+    MK_64(0x67070872,0x5B749816),
+    MK_64(0xB9CD28FB,0xF0581BD1),
+    MK_64(0x0E2940B8,0x15804974)
+    };
+
+/* blkSize = 1024 bits. hashSize = 1024 bits */
+const u64b_t SKEIN1024_IV_1024[] =
+    {
+    MK_64(0xD593DA07,0x41E72355),
+    MK_64(0x15B5E511,0xAC73E00C),
+    MK_64(0x5180E5AE,0xBAF2C4F0),
+    MK_64(0x03BD41D3,0xFCBCAFAF),
+    MK_64(0x1CAEC6FD,0x1983A898),
+    MK_64(0x6E510B8B,0xCDD0589F),
+    MK_64(0x77E2BDFD,0xC6394ADA),
+    MK_64(0xC11E1DB5,0x24DCB0A3),
+    MK_64(0xD6D14AF9,0xC6329AB5),
+    MK_64(0x6A9B0BFC,0x6EB67E0D),
+    MK_64(0x9243C60D,0xCCFF1332),
+    MK_64(0x1A1F1DDE,0x743F02D4),
+    MK_64(0x0996753C,0x10ED0BB8),
+    MK_64(0x6572DD22,0xF2B4969A),
+    MK_64(0x61FD3062,0xD00A579A),
+    MK_64(0x1DE0536E,0x8682E539)
+    };
+
+#endif /* _SKEIN_IV_H_ */
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
new file mode 100644
index 000000000000..18d892553c8d
--- /dev/null
+++ b/drivers/staging/skein/include/skein_port.h
@@ -0,0 +1,124 @@
+#ifndef _SKEIN_PORT_H_
+#define _SKEIN_PORT_H_
+/*******************************************************************
+**
+** Platform-specific definitions for Skein hash function.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+** Many thanks to Brian Gladman for his portable header files.
+**
+** To port Skein to an "unsupported" platform, change the definitions
+** in this file appropriately.
+** 
+********************************************************************/
+
+#include <brg_types.h>                      /* get integer type definitions */
+
+typedef unsigned int    uint_t;             /* native unsigned integer */
+typedef uint_8t         u08b_t;             /*  8-bit unsigned integer */
+typedef uint_64t        u64b_t;             /* 64-bit unsigned integer */
+
+#ifndef RotL_64
+#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
+#endif
+
+/*
+ * Skein is "natively" little-endian (unlike SHA-xxx), for optimal
+ * performance on x86 CPUs.  The Skein code requires the following
+ * definitions for dealing with endianness:
+ *
+ *    SKEIN_NEED_SWAP:  0 for little-endian, 1 for big-endian
+ *    Skein_Put64_LSB_First
+ *    Skein_Get64_LSB_First
+ *    Skein_Swap64
+ *
+ * If SKEIN_NEED_SWAP is defined at compile time, it is used here
+ * along with the portable versions of Put64/Get64/Swap64, which 
+ * are slow in general.
+ *
+ * Otherwise, an "auto-detect" of endianness is attempted below.
+ * If the default handling doesn't work well, the user may insert
+ * platform-specific code instead (e.g., for big-endian CPUs).
+ *
+ */
+#ifndef SKEIN_NEED_SWAP /* compile-time "override" for endianness? */
+
+#include <brg_endian.h>              /* get endianness selection */
+#if   PLATFORM_BYTE_ORDER == IS_BIG_ENDIAN
+    /* here for big-endian CPUs */
+#define SKEIN_NEED_SWAP   (1)
+#elif PLATFORM_BYTE_ORDER == IS_LITTLE_ENDIAN
+    /* here for x86 and x86-64 CPUs (and other detected little-endian CPUs) */
+#define SKEIN_NEED_SWAP   (0)
+#if   PLATFORM_MUST_ALIGN == 0              /* ok to use "fast" versions? */
+#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
+#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
+#endif
+#else
+#error "Skein needs endianness setting!"
+#endif
+
+#endif /* ifndef SKEIN_NEED_SWAP */
+
+/*
+ ******************************************************************
+ *      Provide any definitions still needed.
+ ******************************************************************
+ */
+#ifndef Skein_Swap64  /* swap for big-endian, nop for little-endian */
+#if     SKEIN_NEED_SWAP
+#define Skein_Swap64(w64)                       \
+  ( (( ((u64b_t)(w64))       & 0xFF) << 56) |   \
+    (((((u64b_t)(w64)) >> 8) & 0xFF) << 48) |   \
+    (((((u64b_t)(w64)) >>16) & 0xFF) << 40) |   \
+    (((((u64b_t)(w64)) >>24) & 0xFF) << 32) |   \
+    (((((u64b_t)(w64)) >>32) & 0xFF) << 24) |   \
+    (((((u64b_t)(w64)) >>40) & 0xFF) << 16) |   \
+    (((((u64b_t)(w64)) >>48) & 0xFF) <<  8) |   \
+    (((((u64b_t)(w64)) >>56) & 0xFF)      ) )
+#else
+#define Skein_Swap64(w64)  (w64)
+#endif
+#endif  /* ifndef Skein_Swap64 */
+
+
+#ifndef Skein_Put64_LSB_First
+void    Skein_Put64_LSB_First(u08b_t *dst,const u64b_t *src,size_t bCnt)
+#ifdef  SKEIN_PORT_CODE /* instantiate the function code here? */
+    { /* this version is fully portable (big-endian or little-endian), but slow */
+    size_t n;
+
+    for (n=0;n<bCnt;n++)
+        dst[n] = (u08b_t) (src[n>>3] >> (8*(n&7)));
+    }
+#else
+    ;    /* output only the function prototype */
+#endif
+#endif   /* ifndef Skein_Put64_LSB_First */
+
+
+#ifndef Skein_Get64_LSB_First
+void    Skein_Get64_LSB_First(u64b_t *dst,const u08b_t *src,size_t wCnt)
+#ifdef  SKEIN_PORT_CODE /* instantiate the function code here? */
+    { /* this version is fully portable (big-endian or little-endian), but slow */
+    size_t n;
+
+    for (n=0;n<8*wCnt;n+=8)
+        dst[n/8] = (((u64b_t) src[n  ])      ) +
+                   (((u64b_t) src[n+1]) <<  8) +
+                   (((u64b_t) src[n+2]) << 16) +
+                   (((u64b_t) src[n+3]) << 24) +
+                   (((u64b_t) src[n+4]) << 32) +
+                   (((u64b_t) src[n+5]) << 40) +
+                   (((u64b_t) src[n+6]) << 48) +
+                   (((u64b_t) src[n+7]) << 56) ;
+    }
+#else
+    ;    /* output only the function prototype */
+#endif
+#endif   /* ifndef Skein_Get64_LSB_First */
+
+#endif   /* ifndef _SKEIN_PORT_H_ */
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
new file mode 100644
index 000000000000..85afd72fe987
--- /dev/null
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -0,0 +1,167 @@
+
+#ifndef THREEFISHAPI_H
+#define THREEFISHAPI_H
+
+/**
+ * @file threefishApi.h
+ * @brief A Threefish cipher API and its functions.
+ * @{
+ *
+ * This API and the functions that implement this API simplify the usage
+ * of the Threefish cipher. The design and the way to use the functions 
+ * follow the OpenSSL design but at the same time take care of some Threefish
+ * specific behaviour and possibilities.
+ *
+ * These are the low level functions that deal with Threefish blocks only.
+ * Implementations for cipher modes such as ECB, CFB, or CBC may use these 
+ * functions.
+ * 
+@code
+    // Threefish cipher context data
+    ThreefishKey_t keyCtx;
+
+    // Initialize the context
+    threefishSetKey(&keyCtx, Threefish512, key, tweak);
+
+    // Encrypt
+    threefishEncryptBlockBytes(&keyCtx, input, cipher);
+@endcode
+ */
+
+#include <skein.h>
+#include <stdint.h>
+
+#define KeyScheduleConst 0x1BD11BDAA9FC1A22L
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+    /**
+     * Which Threefish size to use
+     */
+    typedef enum ThreefishSize {
+        Threefish256 = 256,     /*!< Threefish with 256 bit state */
+        Threefish512 = 512,     /*!< Threefish with 512 bit state */
+        Threefish1024 = 1024    /*!< Threefish with 1024 bit state */
+    } ThreefishSize_t;
+    
+    /**
+     * Context for Threefish key and tweak words.
+     * 
+     * This structure was set up with some know-how of the internal
+     * Skein structures, in particular ordering of header and size dependent
+     * variables. If the Skein implementation changes this, then adapt these
+     * structures as well.
+     */
+    typedef struct ThreefishKey {
+        u64b_t stateSize;
+        u64b_t key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
+        u64b_t tweak[3];
+    } ThreefishKey_t;
+
+    /**
+     * Set Threefish key and tweak data.
+     * 
+     * This function sets the key and tweak data for the Threefish cipher of
+     * the given size. The key data must have the same length (number of bits)
+     * as the state size.
+     *
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param stateSize
+     *     Which Threefish size to use.
+     * @param keyData
+     *     Pointer to the key words (word has 64 bits).
+     * @param tweak
+     *     Pointer to the two tweak words (word has 64 bits).
+     */
+    void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize, uint64_t* keyData, uint64_t* tweak);
+    
+    /**
+     * Encrypt Threefish block (bytes).
+     *
+     * The buffer must have at least the same length (number of bits) as the
+     * state size for this key. The function uses the first @c stateSize bits
+     * of the input buffer, encrypts them and stores the result in the output
+     * buffer.
+     *
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param in
+     *     Pointer to plaintext data buffer.
+     * @param out
+     *     Pointer to cipher buffer.
+     */
+    void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in, uint8_t* out);
+    
+    /**
+     * Encrypt Threefish block (words).
+     *
+     * The buffer must have at least the same length (number of bits) as the
+     * state size for this key. The function uses the first @c stateSize bits
+     * of the input buffer, encrypts them and stores the result in the output
+     * buffer.
+     *
+     * The word size is set to 64 bits.
+     *
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param in
+     *     Pointer to plaintext data buffer.
+     * @param out
+     *     Pointer to cipher buffer.
+     */
+    void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in, uint64_t* out);
+
+    /**
+     * Decrypt Threefish block (bytes).
+     *
+     * The buffer must have at least the same length (number of bits) as the
+     * state size for this key. The function uses the first @c stateSize bits
+     * of the input buffer, decrypts them and stores the result in the output
+     * buffer.
+     *
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param in
+     *     Pointer to cipher data buffer.
+     * @param out
+     *     Pointer to plaintext buffer.
+     */
+    void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in, uint8_t* out);
+
+    /**
+     * Decrypt Threefish block (words).
+     *
+     * The buffer must have at least the same length (number of bits) as the
+     * state size for this key. The function uses the first @c stateSize bits
+     * of the input buffer, decrypts them and stores the result in the output
+     * buffer.
+     *
+     * The word size is set to 64 bits.
+     *
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param in
+     *     Pointer to cipher data buffer.
+     * @param out
+     *     Pointer to plaintext buffer.
+     */
+    void threefishDecryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in, uint64_t* out);
+
+    void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+#ifdef __cplusplus
+}
+#endif
+
+/**
+ * @}
+ */
+#endif
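
For anyone wondering why ThreefishKey carries one more key word than the
state size and three tweak words although the caller passes only two: the
Threefish key schedule extends the key with a parity word (the XOR of all
key words and the constant defined above as KeyScheduleConst) and extends
the tweak with t2 = t0 ^ t1. The header does not show how threefishSetKey()
fills the context, so the following is only a sketch of what such a setup
step could look like. The tf_* names are hypothetical, and TF_MAX_WORDS
assumes SKEIN_MAX_STATE_WORDS is 16 (1024 bits / 64).

```c
#include <stddef.h>
#include <stdint.h>

#define TF_KEY_SCHEDULE_CONST 0x1BD11BDAA9FC1A22ULL /* value of KeyScheduleConst */
#define TF_MAX_WORDS 16 /* assumption: mirrors SKEIN_MAX_STATE_WORDS */

/* Hypothetical mirror of the ThreefishKey layout in the header. */
struct tf_key_sketch {
	uint64_t state_size;
	uint64_t key[TF_MAX_WORDS + 1];	/* key words plus one parity word */
	uint64_t tweak[3];		/* t0, t1 and t2 = t0 ^ t1 */
};

/* state_bits must be 256, 512 or 1024; key holds state_bits / 64 words. */
void tf_set_key_sketch(struct tf_key_sketch *ctx, unsigned int state_bits,
		       const uint64_t *key, const uint64_t *tweak)
{
	size_t words = state_bits / 64;
	uint64_t parity = TF_KEY_SCHEDULE_CONST;
	size_t i;

	ctx->tweak[0] = tweak[0];
	ctx->tweak[1] = tweak[1];
	ctx->tweak[2] = tweak[0] ^ tweak[1];

	for (i = 0; i < words; i++) {
		ctx->key[i] = key[i];
		parity ^= key[i];
	}
	ctx->key[words] = parity;	/* extended key word */
	ctx->state_size = state_bits;
}
```

With an all-zero 512-bit key the parity word is just the constant itself,
which makes for a quick sanity check when hooking this up to test vectors.
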
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
new file mode 100644
index 000000000000..f0b176ac1dc7
--- /dev/null
+++ b/drivers/staging/skein/skein.c
@@ -0,0 +1,742 @@
+/***********************************************************************
+**
+** Implementation of the Skein hash function.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+************************************************************************/
+
+#define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
+
+#include <string.h>       /* get the memcpy/memset functions */
+#include <skein.h> /* get the Skein API definitions   */
+#include <skein_iv.h>    /* get precomputed IVs */
+
+/*****************************************************************/
+/* External function to process blkCnt (nonzero) full block(s) of data. */
+void    Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
+
+/*****************************************************************/
+/*     256-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u08b_t  b[SKEIN_256_STATE_BYTES];
+        u64b_t  w[SKEIN_256_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {             /* use pre-computed values, where available */
+    case  256:
+        memcpy(ctx->X,SKEIN_256_IV_256,sizeof(ctx->X));
+        break;
+    case  224:
+        memcpy(ctx->X,SKEIN_256_IV_224,sizeof(ctx->X));
+        break;
+    case  160:
+        memcpy(ctx->X,SKEIN_256_IV_160,sizeof(ctx->X));
+        break;
+    case  128:
+        memcpy(ctx->X,SKEIN_256_IV_128,sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_256_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        break;
+    }
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+{
+    union
+    {
+        u08b_t  b[SKEIN_256_STATE_BYTES];
+        u64b_t  w[SKEIN_256_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_256_Update(ctx,key,keyBytes);     /* hash the key */
+        Skein_256_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+#if SKEIN_NEED_SWAP
+        {
+            uint_t i;
+            for (i=0;i<SKEIN_256_STATE_WORDS;i++)   /* convert key bytes to context words */
+                ctx->X[i] = Skein_Swap64(ctx->X[i]);
+        }
+#endif
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx,CFG_FINAL);
+
+    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(256,&ctx->h,key,keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein_256_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
+            Skein_256_Process_Block(ctx,ctx->b,1,SKEIN_256_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein_256_Process_Block(ctx,msg,n,SKEIN_256_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
+            msg        += n * SKEIN_256_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein_256_Final(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN_256_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein_256_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_256_BLOCK_BYTES)
+            n  = SKEIN_256_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*****************************************************************/
+/*     512-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u08b_t  b[SKEIN_512_STATE_BYTES];
+        u64b_t  w[SKEIN_512_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {             /* use pre-computed values, where available */
+    case  512:
+        memcpy(ctx->X,SKEIN_512_IV_512,sizeof(ctx->X));
+        break;
+    case  384:
+        memcpy(ctx->X,SKEIN_512_IV_384,sizeof(ctx->X));
+        break;
+    case  256:
+        memcpy(ctx->X,SKEIN_512_IV_256,sizeof(ctx->X));
+        break;
+    case  224:
+        memcpy(ctx->X,SKEIN_512_IV_224,sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_512_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        break;
+    }
+
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+{
+    union
+    {
+        u08b_t  b[SKEIN_512_STATE_BYTES];
+        u64b_t  w[SKEIN_512_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_512_Update(ctx,key,keyBytes);     /* hash the key */
+        Skein_512_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+#if SKEIN_NEED_SWAP
+        {
+            uint_t i;
+            for (i=0;i<SKEIN_512_STATE_WORDS;i++)   /* convert key bytes to context words */
+                ctx->X[i] = Skein_Swap64(ctx->X[i]);
+        }
+#endif
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx,CFG_FINAL);
+
+    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(512,&ctx->h,key,keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein_512_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
+            Skein_512_Process_Block(ctx,ctx->b,1,SKEIN_512_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein_512_Process_Block(ctx,msg,n,SKEIN_512_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
+            msg        += n * SKEIN_512_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein_512_Final(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN_512_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein_512_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_512_BLOCK_BYTES)
+            n  = SKEIN_512_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512,&ctx->h,n,hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*****************************************************************/
+/*    1024-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u08b_t  b[SKEIN1024_STATE_BYTES];
+        u64b_t  w[SKEIN1024_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {              /* use pre-computed values, where available */
+    case  512:
+        memcpy(ctx->X,SKEIN1024_IV_512 ,sizeof(ctx->X));
+        break;
+    case  384:
+        memcpy(ctx->X,SKEIN1024_IV_384 ,sizeof(ctx->X));
+        break;
+    case 1024:
+        memcpy(ctx->X,SKEIN1024_IV_1024,sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
+        Skein1024_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        break;
+    }
+
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+{
+    union
+    {
+        u08b_t  b[SKEIN1024_STATE_BYTES];
+        u64b_t  w[SKEIN1024_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein1024_Update(ctx,key,keyBytes);     /* hash the key */
+        Skein1024_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+#if SKEIN_NEED_SWAP
+        {
+            uint_t i;
+            for (i=0;i<SKEIN1024_STATE_WORDS;i++)   /* convert key bytes to context words */
+                ctx->X[i] = Skein_Swap64(ctx->X[i]);
+        }
+#endif
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx,CFG_FINAL);
+
+    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(1024,&ctx->h,key,keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein1024_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
+            Skein1024_Process_Block(ctx,ctx->b,1,SKEIN1024_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein1024_Process_Block(ctx,msg,n,SKEIN1024_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
+            msg        += n * SKEIN1024_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN1024_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN1024_BLOCK_BYTES)
+            n  = SKEIN1024_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024,&ctx->h,n,hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/**************** Functions to support MAC/tree hashing ***************/
+/*   (this code is identical for Optimized and Reference versions)    */
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_256_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_512_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+#if SKEIN_TREE_HASH
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein_256_Output(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN_256_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_256_BLOCK_BYTES)
+            n  = SKEIN_256_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein_512_Output(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN_512_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_512_BLOCK_BYTES)
+            n  = SKEIN_512_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512,&ctx->h,n,hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein1024_Output(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN1024_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN1024_BLOCK_BYTES)
+            n  = SKEIN1024_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024,&ctx->h,n,hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+#endif
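A note on the output stage used by the *_Final() and *_Output() functions
above: the hash length is rounded up to whole bytes, and the counter-mode
loop then emits one block-sized chunk per iteration, with a shorter final
chunk. The following is only a minimal standalone sketch of that arithmetic.
The helper names (out_byte_count, out_block_count, out_chunk_len) are
hypothetical and are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: total output bytes for a hash length given in bits. */
static size_t out_byte_count(size_t hash_bit_len)
{
	return (hash_bit_len + 7) >> 3;
}

/* Hypothetical helper: number of counter-mode iterations the loop runs. */
static size_t out_block_count(size_t byte_cnt, size_t block_bytes)
{
	size_t i = 0;

	assert(block_bytes != 0);
	while (i * block_bytes < byte_cnt)
		i++;
	return i;
}

/* Hypothetical helper: bytes written to the caller in iteration i. */
static size_t out_chunk_len(size_t byte_cnt, size_t block_bytes, size_t i)
{
	size_t n;

	assert(block_bytes != 0 && i * block_bytes < byte_cnt);
	n = byte_cnt - i * block_bytes;	/* output bytes left to go */
	if (n >= block_bytes)
		n = block_bytes;
	return n;
}
```

With a 128-byte block (Skein-1024) and a 257-byte output, the loop would run
three times and the last iteration would write a single byte.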
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
new file mode 100755
index 000000000000..7b963758d32c
--- /dev/null
+++ b/drivers/staging/skein/skeinApi.c
@@ -0,0 +1,221 @@
+/*
+Copyright (c) 2010 Werner Dittmann
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+*/
+
+#define SKEIN_ERR_CHECK 1
+#include <skeinApi.h>
+#include <string.h>
+#include <stdio.h>
+
+int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size)
+{
+    Skein_Assert(ctx && size, SKEIN_FAIL);
+
+    memset(ctx ,0, sizeof(SkeinCtx_t));
+    ctx->skeinSize = size;
+
+    return SKEIN_SUCCESS;
+}
+
+int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen)
+{
+    int ret = SKEIN_FAIL;
+    size_t Xlen = 0;
+    u64b_t*  X = NULL;
+    uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+
+    Skein_Assert(ctx, SKEIN_FAIL);
+    /*
+     * The following two lines rely on the fact that the real Skein contexts are
+     * a union in our context and thus have the maximum memory available.
+     * The beauty of C :-) .
+     */
+    X = ctx->m.s256.X;
+    Xlen = ctx->skeinSize/8;
+    /*
+     * If size is the same and hash bit length is zero then reuse
+     * the saved chaining variables.
+     */
+    switch (ctx->skeinSize) {
+    case Skein256:
+        ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
+                                treeInfo, NULL, 0);
+        break;
+    case Skein512:
+        ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
+                                treeInfo, NULL, 0);
+        break;
+    case Skein1024:
+        ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
+                                treeInfo, NULL, 0);
+        break;
+    }
+
+    if (ret == SKEIN_SUCCESS) {
+        /* Save chaining variables for this combination of size and hashBitLen */
+        memcpy(ctx->XSave, X, Xlen);
+    }
+    return ret;
+}
+
+int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
+                 size_t hashBitLen)
+{
+    int ret = SKEIN_FAIL;
+    u64b_t*  X = NULL;
+    size_t Xlen = 0;
+    uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+
+    Skein_Assert(ctx, SKEIN_FAIL);
+
+    X = ctx->m.s256.X;
+    Xlen = ctx->skeinSize/8;
+
+    Skein_Assert(hashBitLen, SKEIN_BAD_HASHLEN);
+
+    switch (ctx->skeinSize) {
+    case Skein256:
+        ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
+                                treeInfo,
+                                (const u08b_t*)key, keyLen);
+
+        break;
+    case Skein512:
+        ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
+                                treeInfo,
+                                (const u08b_t*)key, keyLen);
+        break;
+    case Skein1024:
+        ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
+                                treeInfo,
+                                (const u08b_t*)key, keyLen);
+
+        break;
+    }
+    if (ret == SKEIN_SUCCESS) {
+        /* Save chaining variables for this combination of key, keyLen, hashBitLen */
+        memcpy(ctx->XSave, X, Xlen);
+    }
+    return ret;
+}
+
+void skeinReset(SkeinCtx_t* ctx)
+{
+    size_t Xlen = 0;
+    u64b_t*  X = NULL;
+
+    /*
+     * The following two lines rely on the fact that the real Skein contexts are
+     * a union in our context and thus have the maximum memory available.
+     * The beauty of C :-) .
+     */
+    X = ctx->m.s256.X;
+    Xlen = ctx->skeinSize/8;
+    /* Restore the chaining variables, reset byte counter */
+    memcpy(X, ctx->XSave, Xlen);
+
+    /* Setup context to process the message */
+    Skein_Start_New_Type(&ctx->m, MSG);
+}
+
+int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
+                size_t msgByteCnt)
+{
+    int ret = SKEIN_FAIL;
+    Skein_Assert(ctx, SKEIN_FAIL);
+
+    switch (ctx->skeinSize) {
+    case Skein256:
+        ret = Skein_256_Update(&ctx->m.s256, (const u08b_t*)msg, msgByteCnt);
+        break;
+    case Skein512:
+        ret = Skein_512_Update(&ctx->m.s512, (const u08b_t*)msg, msgByteCnt);
+        break;
+    case Skein1024:
+        ret = Skein1024_Update(&ctx->m.s1024, (const u08b_t*)msg, msgByteCnt);
+        break;
+    }
+    return ret;
+
+}
+
+int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
+                    size_t msgBitCnt)
+{
+    /*
+     * I've used the bit pad implementation from skein_test.c (see NIST CD)
+     * and modified it to use the convenience functions and added some pointer
+     * arithmetic.
+     */
+    size_t length;
+    uint8_t mask;
+    uint8_t* up;
+
+    /* only the final Update() call is allowed to do partial bytes, else assert an error */
+    Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
+
+    /* if number of bits is a multiple of bytes - that's easy */
+    if ((msgBitCnt & 0x7) == 0) {
+        return skeinUpdate(ctx, msg, msgBitCnt >> 3);
+    }
+    skeinUpdate(ctx, msg, (msgBitCnt >> 3) + 1);
+
+    /*
+     * The next line relies on the fact that the real Skein contexts
+     * are a union in our context. After the addition the pointer points to
+     * Skein's real partial block buffer.
+     * If this layout ever changes we have to adapt this as well.
+     */
+    up = (uint8_t*)ctx->m.s256.X + ctx->skeinSize / 8;
+
+    Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
+
+    /* now "pad" the final partial byte the way NIST likes */
+    length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
+    Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
+    mask = (uint8_t) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
+    up[length-1]  = (uint8_t)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
+
+    return SKEIN_SUCCESS;
+}
+
+int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash)
+{
+    int ret = SKEIN_FAIL;
+    Skein_Assert(ctx, SKEIN_FAIL);
+
+    switch (ctx->skeinSize) {
+    case Skein256:
+        ret = Skein_256_Final(&ctx->m.s256, (u08b_t*)hash);
+        break;
+    case Skein512:
+        ret = Skein_512_Final(&ctx->m.s512, (u08b_t*)hash);
+        break;
+    case Skein1024:
+        ret = Skein1024_Final(&ctx->m.s1024, (u08b_t*)hash);
+        break;
+    }
+    return ret;
+}
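For reference, the bit padding done at the end of skeinUpdateBits() can be
looked at in isolation: the valid message bits sit at the top of the final
byte, a single 1 bit is placed right below them, and everything below that
is cleared. This is a small sketch of just that byte operation, assuming the
bit count is not a multiple of 8. pad_partial_byte is a hypothetical name and
does not exist in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: apply the padding to the last, partially used byte.
 * msg_bit_cnt is the message length in bits; only its low 3 bits matter.
 */
static uint8_t pad_partial_byte(uint8_t last, size_t msg_bit_cnt)
{
	uint8_t mask;

	assert((msg_bit_cnt & 7) != 0);	/* whole bytes need no padding */

	/* position of the pad bit, directly below the valid bits */
	mask = (uint8_t)(1u << (7 - (msg_bit_cnt & 7)));

	/* keep the bits from the pad position upward, then force the pad bit */
	return (uint8_t)((last & (0 - mask)) | mask);
}
```

For a message with 3 trailing bits, a final byte of 0xFF would become 0xF0:
the top three bits are kept, bit 4 is the pad bit, and the low nibble is
cleared.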
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
new file mode 100644
index 000000000000..4ad6c50360e7
--- /dev/null
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -0,0 +1,172 @@
+
+#include <string.h>
+#include <skein.h>
+#include <threefishApi.h>
+
+
+/*****************************  Skein_256 ******************************/
+void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u08b_t *blkPtr,
+                             size_t blkCnt, size_t byteCntAdd)
+{
+    ThreefishKey_t key;
+    u64b_t tweak[2];
+    int i;
+    u64b_t  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+    u64b_t words[3];
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64b_t carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] = (tweak[1] & ~(u64b_t)0xffffffffL) | (words[2] & 0xffffffffL);
+
+        threefishSetKey(&key, Threefish256, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN_256_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = ctx->X[0] ^ w[0];
+        ctx->X[1] = ctx->X[1] ^ w[1];
+        ctx->X[2] = ctx->X[2] ^ w[2];
+        ctx->X[3] = ctx->X[3] ^ w[3];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
+
+void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u08b_t *blkPtr,
+                             size_t blkCnt, size_t byteCntAdd)
+{
+    ThreefishKey_t key;
+    u64b_t tweak[2];
+    int i;
+    u64b_t words[3];
+    u64b_t  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64b_t carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] = (tweak[1] & ~(u64b_t)0xffffffffL) | (words[2] & 0xffffffffL);
+
+        threefishSetKey(&key, Threefish512, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN_512_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = ctx->X[0] ^ w[0];
+        ctx->X[1] = ctx->X[1] ^ w[1];
+        ctx->X[2] = ctx->X[2] ^ w[2];
+        ctx->X[3] = ctx->X[3] ^ w[3];
+        ctx->X[4] = ctx->X[4] ^ w[4];
+        ctx->X[5] = ctx->X[5] ^ w[5];
+        ctx->X[6] = ctx->X[6] ^ w[6];
+        ctx->X[7] = ctx->X[7] ^ w[7];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
+
+void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx, const u08b_t *blkPtr,
+                              size_t blkCnt, size_t byteCntAdd)
+{
+    ThreefishKey_t key;
+    u64b_t tweak[2];
+    int i;
+    u64b_t words[3];
+    u64b_t  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64b_t carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] = (tweak[1] & ~(u64b_t)0xffffffffL) | (words[2] & 0xffffffffL);
+
+        threefishSetKey(&key, Threefish1024, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN1024_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[ 0] = ctx->X[ 0] ^ w[ 0];
+        ctx->X[ 1] = ctx->X[ 1] ^ w[ 1];
+        ctx->X[ 2] = ctx->X[ 2] ^ w[ 2];
+        ctx->X[ 3] = ctx->X[ 3] ^ w[ 3];
+        ctx->X[ 4] = ctx->X[ 4] ^ w[ 4];
+        ctx->X[ 5] = ctx->X[ 5] ^ w[ 5];
+        ctx->X[ 6] = ctx->X[ 6] ^ w[ 6];
+        ctx->X[ 7] = ctx->X[ 7] ^ w[ 7];
+        ctx->X[ 8] = ctx->X[ 8] ^ w[ 8];
+        ctx->X[ 9] = ctx->X[ 9] ^ w[ 9];
+        ctx->X[10] = ctx->X[10] ^ w[10];
+        ctx->X[11] = ctx->X[11] ^ w[11];
+        ctx->X[12] = ctx->X[12] ^ w[12];
+        ctx->X[13] = ctx->X[13] ^ w[13];
+        ctx->X[14] = ctx->X[14] ^ w[14];
+        ctx->X[15] = ctx->X[15] ^ w[15];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
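The three *_Process_Block() variants in skeinBlockNo3F.c all advance the
96-bit position counter held in the tweak by splitting it into three 32-bit
limbs and propagating a carry. Below is a standalone sketch of that addition,
using plain stdint types instead of u64b_t. tweak_add_bytes is a hypothetical
name. Note that the sketch replaces the low 32 bits of the second tweak word
with the new limb rather than OR-ing into them; that is a choice made here so
that a carry out of the low 64 bits produces an exact sum, while the flag and
type bits in the upper half stay untouched.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: add byte_cnt_add to the 96-bit position field
 * stored in tweak[0] (bits 0..63) and the low half of tweak[1]
 * (bits 64..95). The upper 32 bits of tweak[1] are preserved.
 */
static void tweak_add_bytes(uint64_t tweak[2], uint64_t byte_cnt_add)
{
	uint64_t words[3];
	uint64_t carry = byte_cnt_add;
	int i;

	/* assumption of this sketch: the increment fits in one limb */
	assert(byte_cnt_add <= 0xffffffffULL);

	words[0] = tweak[0] & 0xffffffffULL;
	words[1] = (tweak[0] >> 32) & 0xffffffffULL;
	words[2] = tweak[1] & 0xffffffffULL;

	for (i = 0; i < 3; i++) {
		carry += words[i];
		words[i] = carry & 0xffffffffULL;	/* keep one limb */
		carry >>= 32;				/* pass on the rest */
	}

	tweak[0] = words[0] | (words[1] << 32);
	tweak[1] = (tweak[1] & ~0xffffffffULL) | words[2];
}
```

A carry out of tweak[0] shows up as an increment of the low half of tweak[1],
and any carry out of the top limb is dropped, so the field wraps at 2**96.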
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
new file mode 100644
index 000000000000..86724a2443b5
--- /dev/null
+++ b/drivers/staging/skein/skein_block.c
@@ -0,0 +1,689 @@
+/***********************************************************************
+**
+** Implementation of the Skein block functions.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+** Compile-time switches:
+**
+**  SKEIN_USE_ASM  -- set bits (256/512/1024) to select which
+**                    versions use ASM code for block processing
+**                    [default: use C for all block sizes]
+**
+************************************************************************/
+
+#include <string.h>
+#include <skein.h>
+
+#ifndef SKEIN_USE_ASM
+#define SKEIN_USE_ASM   (0)                     /* default is all C code (no ASM) */
+#endif
+
+#ifndef SKEIN_LOOP
+#define SKEIN_LOOP 001                          /* default: unroll 256 and 512, but not 1024 */
+#endif
+
+#define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
+#define KW_TWK_BASE     (0)
+#define KW_KEY_BASE     (3)
+#define ks              (kw + KW_KEY_BASE)
+#define ts              (kw + KW_TWK_BASE)
+
+#ifdef SKEIN_DEBUG
+#define DebugSaveTweak(ctx) { ctx->h.T[0] = ts[0]; ctx->h.T[1] = ts[1]; }
+#else
+#define DebugSaveTweak(ctx)
+#endif
+
+/*****************************  Skein_256 ******************************/
+#if !(SKEIN_USE_ASM & 256)
+void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+    { /* do it in C */
+    enum
+        {
+        WCNT = SKEIN_256_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
+
+#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
+#else
+#define SKEIN_UNROLL_256 (0)
+#endif
+
+#if SKEIN_UNROLL_256
+#if (RCNT % SKEIN_UNROLL_256)
+#error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+    u64b_t  X0,X1,X2,X3;                        /* local copy of context vars, for speed */
+    u64b_t  w [WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64b_t *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+#endif
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0] = ctx->X[0];
+        ks[1] = ctx->X[1];
+        ks[2] = ctx->X[2];
+        ks[3] = ctx->X[3];
+        ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
+
+        ts[2] = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w,blkPtr,WCNT);   /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+
+        X0 = w[0] + ks[0];                      /* do the first full key injection */
+        X1 = w[1] + ks[1] + ts[0];
+        X2 = w[2] + ks[2] + ts[1];
+        X3 = w[3] + ks[3];
+
+        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);    /* show starting state values */
+
+        blkPtr += SKEIN_256_BLOCK_BYTES;
+
+        /* run the rounds */
+
+#define Round256(p0,p1,p2,p3,ROT,rNum)                              \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2; \
+
+#if SKEIN_UNROLL_256 == 0                       
+#define R256(p0,p1,p2,p3,ROT,rNum)           /* fully unrolled */   \
+    Round256(p0,p1,p2,p3,ROT,rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rNum,Xptr);
+
+#define I256(R)                                                     \
+    X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
+    X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
+    X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
+    X3   += ks[((R)+4) % 5] +     (R)+1;                            \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+#else                                       /* looping version */
+#define R256(p0,p1,p2,p3,ROT,rNum)                                  \
+    Round256(p0,p1,p2,p3,ROT,rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rNum,Xptr);
+
+#define I256(R)                                                     \
+    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
+    X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
+    X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
+    X3   += ks[r+(R)+3] +    r+(R)   ;                              \
+    ks[r + (R)+4    ]   = ks[r+(R)-1];     /* rotate key schedule */\
+    ts[r + (R)+2    ]   = ts[r+(R)-1];                              \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+
+    for (r=1;r < 2*RCNT;r+=2*SKEIN_UNROLL_256)  /* loop thru it */
+#endif  
+        {    
+#define R256_8_rounds(R)                  \
+        R256(0,1,2,3,R_256_0,8*(R) + 1);  \
+        R256(0,3,2,1,R_256_1,8*(R) + 2);  \
+        R256(0,1,2,3,R_256_2,8*(R) + 3);  \
+        R256(0,3,2,1,R_256_3,8*(R) + 4);  \
+        I256(2*(R));                      \
+        R256(0,1,2,3,R_256_4,8*(R) + 5);  \
+        R256(0,3,2,1,R_256_5,8*(R) + 6);  \
+        R256(0,1,2,3,R_256_6,8*(R) + 7);  \
+        R256(0,3,2,1,R_256_7,8*(R) + 8);  \
+        I256(2*(R)+1);
+
+        R256_8_rounds( 0);
+
+#define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
+
+  #if   R256_Unroll_R( 1)
+        R256_8_rounds( 1);
+  #endif
+  #if   R256_Unroll_R( 2)
+        R256_8_rounds( 2);
+  #endif
+  #if   R256_Unroll_R( 3)
+        R256_8_rounds( 3);
+  #endif
+  #if   R256_Unroll_R( 4)
+        R256_8_rounds( 4);
+  #endif
+  #if   R256_Unroll_R( 5)
+        R256_8_rounds( 5);
+  #endif
+  #if   R256_Unroll_R( 6)
+        R256_8_rounds( 6);
+  #endif
+  #if   R256_Unroll_R( 7)
+        R256_8_rounds( 7);
+  #endif
+  #if   R256_Unroll_R( 8)
+        R256_8_rounds( 8);
+  #endif
+  #if   R256_Unroll_R( 9)
+        R256_8_rounds( 9);
+  #endif
+  #if   R256_Unroll_R(10)
+        R256_8_rounds(10);
+  #endif
+  #if   R256_Unroll_R(11)
+        R256_8_rounds(11);
+  #endif
+  #if   R256_Unroll_R(12)
+        R256_8_rounds(12);
+  #endif
+  #if   R256_Unroll_R(13)
+        R256_8_rounds(13);
+  #endif
+  #if   R256_Unroll_R(14)
+        R256_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_256 > 14)
+#error  "need more unrolling in Skein_256_Process_Block"
+  #endif
+        }
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = X0 ^ w[0];
+        ctx->X[1] = X1 ^ w[1];
+        ctx->X[2] = X2 ^ w[2];
+        ctx->X[3] = X3 ^ w[3];
+
+        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein_256_Process_Block_CodeSize(void)
+    {
+    return ((u08b_t *) Skein_256_Process_Block_CodeSize) -
+           ((u08b_t *) Skein_256_Process_Block);
+    }
+uint_t Skein_256_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_256;
+    }
+#endif
+#endif
+
+/*****************************  Skein_512 ******************************/
+#if !(SKEIN_USE_ASM & 512)
+void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+    { /* do it in C */
+    enum
+        {
+        WCNT = SKEIN_512_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
+
+#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
+#else
+#define SKEIN_UNROLL_512 (0)
+#endif
+
+#if SKEIN_UNROLL_512
+#if (RCNT % SKEIN_UNROLL_512)
+#error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+    u64b_t  X0,X1,X2,X3,X4,X5,X6,X7;            /* local copy of vars, for speed */
+    u64b_t  w [WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64b_t *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+    Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
+#endif
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0] = ctx->X[0];
+        ks[1] = ctx->X[1];
+        ks[2] = ctx->X[2];
+        ks[3] = ctx->X[3];
+        ks[4] = ctx->X[4];
+        ks[5] = ctx->X[5];
+        ks[6] = ctx->X[6];
+        ks[7] = ctx->X[7];
+        ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
+                ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
+
+        ts[2] = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+
+        X0   = w[0] + ks[0];                    /* do the first full key injection */
+        X1   = w[1] + ks[1];
+        X2   = w[2] + ks[2];
+        X3   = w[3] + ks[3];
+        X4   = w[4] + ks[4];
+        X5   = w[5] + ks[5] + ts[0];
+        X6   = w[6] + ks[6] + ts[1];
+        X7   = w[7] + ks[7];
+
+        blkPtr += SKEIN_512_BLOCK_BYTES;
+
+        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);
+        /* run the rounds */
+#define Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                  \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2; \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4; \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6; \
+
+#if SKEIN_UNROLL_512 == 0                       
+#define R512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)      /* unrolled */  \
+    Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rNum,Xptr);
+
+#define I512(R)                                                     \
+    X0   += ks[((R)+1) % 9];   /* inject the key schedule value */  \
+    X1   += ks[((R)+2) % 9];                                        \
+    X2   += ks[((R)+3) % 9];                                        \
+    X3   += ks[((R)+4) % 9];                                        \
+    X4   += ks[((R)+5) % 9];                                        \
+    X5   += ks[((R)+6) % 9] + ts[((R)+1) % 3];                      \
+    X6   += ks[((R)+7) % 9] + ts[((R)+2) % 3];                      \
+    X7   += ks[((R)+8) % 9] +     (R)+1;                            \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+#else                                       /* looping version */
+#define R512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
+    Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rNum,Xptr);
+
+#define I512(R)                                                     \
+    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
+    X1   += ks[r+(R)+1];                                            \
+    X2   += ks[r+(R)+2];                                            \
+    X3   += ks[r+(R)+3];                                            \
+    X4   += ks[r+(R)+4];                                            \
+    X5   += ks[r+(R)+5] + ts[r+(R)+0];                              \
+    X6   += ks[r+(R)+6] + ts[r+(R)+1];                              \
+    X7   += ks[r+(R)+7] +    r+(R)   ;                              \
+    ks[r +       (R)+8] = ks[r+(R)-1];  /* rotate key schedule */   \
+    ts[r +       (R)+2] = ts[r+(R)-1];                              \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+
+    for (r=1;r < 2*RCNT;r+=2*SKEIN_UNROLL_512)   /* loop thru it */
+#endif                         /* end of looped code definitions */
+        {
+#define R512_8_rounds(R)  /* do 8 full rounds */  \
+        R512(0,1,2,3,4,5,6,7,R_512_0,8*(R)+ 1);   \
+        R512(2,1,4,7,6,5,0,3,R_512_1,8*(R)+ 2);   \
+        R512(4,1,6,3,0,5,2,7,R_512_2,8*(R)+ 3);   \
+        R512(6,1,0,7,2,5,4,3,R_512_3,8*(R)+ 4);   \
+        I512(2*(R));                              \
+        R512(0,1,2,3,4,5,6,7,R_512_4,8*(R)+ 5);   \
+        R512(2,1,4,7,6,5,0,3,R_512_5,8*(R)+ 6);   \
+        R512(4,1,6,3,0,5,2,7,R_512_6,8*(R)+ 7);   \
+        R512(6,1,0,7,2,5,4,3,R_512_7,8*(R)+ 8);   \
+        I512(2*(R)+1);        /* and key injection */
+
+        R512_8_rounds( 0);
+
+#define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
+
+  #if   R512_Unroll_R( 1)
+        R512_8_rounds( 1);
+  #endif
+  #if   R512_Unroll_R( 2)
+        R512_8_rounds( 2);
+  #endif
+  #if   R512_Unroll_R( 3)
+        R512_8_rounds( 3);
+  #endif
+  #if   R512_Unroll_R( 4)
+        R512_8_rounds( 4);
+  #endif
+  #if   R512_Unroll_R( 5)
+        R512_8_rounds( 5);
+  #endif
+  #if   R512_Unroll_R( 6)
+        R512_8_rounds( 6);
+  #endif
+  #if   R512_Unroll_R( 7)
+        R512_8_rounds( 7);
+  #endif
+  #if   R512_Unroll_R( 8)
+        R512_8_rounds( 8);
+  #endif
+  #if   R512_Unroll_R( 9)
+        R512_8_rounds( 9);
+  #endif
+  #if   R512_Unroll_R(10)
+        R512_8_rounds(10);
+  #endif
+  #if   R512_Unroll_R(11)
+        R512_8_rounds(11);
+  #endif
+  #if   R512_Unroll_R(12)
+        R512_8_rounds(12);
+  #endif
+  #if   R512_Unroll_R(13)
+        R512_8_rounds(13);
+  #endif
+  #if   R512_Unroll_R(14)
+        R512_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_512 > 14)
+#error  "need more unrolling in Skein_512_Process_Block"
+  #endif
+        }
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = X0 ^ w[0];
+        ctx->X[1] = X1 ^ w[1];
+        ctx->X[2] = X2 ^ w[2];
+        ctx->X[3] = X3 ^ w[3];
+        ctx->X[4] = X4 ^ w[4];
+        ctx->X[5] = X5 ^ w[5];
+        ctx->X[6] = X6 ^ w[6];
+        ctx->X[7] = X7 ^ w[7];
+        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein_512_Process_Block_CodeSize(void)
+    {
+    return ((u08b_t *) Skein_512_Process_Block_CodeSize) -
+           ((u08b_t *) Skein_512_Process_Block);
+    }
+uint_t Skein_512_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_512;
+    }
+#endif
+#endif
+
+/*****************************  Skein1024 ******************************/
+#if !(SKEIN_USE_ASM & 1024)
+void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+    { /* do it in C, always looping (unrolled is bigger AND slower!) */
+    enum
+        {
+        WCNT = SKEIN1024_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
+
+#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
+#else
+#define SKEIN_UNROLL_1024 (0)
+#endif
+
+#if (SKEIN_UNROLL_1024 != 0)
+#if (RCNT % SKEIN_UNROLL_1024)
+#error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation" */
+#else
+    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+
+    u64b_t  X00,X01,X02,X03,X04,X05,X06,X07,    /* local copy of vars, for speed */
+            X08,X09,X10,X11,X12,X13,X14,X15;
+    u64b_t  w [WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64b_t *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+    Xptr[ 0] = &X00;  Xptr[ 1] = &X01;  Xptr[ 2] = &X02;  Xptr[ 3] = &X03;
+    Xptr[ 4] = &X04;  Xptr[ 5] = &X05;  Xptr[ 6] = &X06;  Xptr[ 7] = &X07;
+    Xptr[ 8] = &X08;  Xptr[ 9] = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
+    Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
+#endif
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[ 0] = ctx->X[ 0];
+        ks[ 1] = ctx->X[ 1];
+        ks[ 2] = ctx->X[ 2];
+        ks[ 3] = ctx->X[ 3];
+        ks[ 4] = ctx->X[ 4];
+        ks[ 5] = ctx->X[ 5];
+        ks[ 6] = ctx->X[ 6];
+        ks[ 7] = ctx->X[ 7];
+        ks[ 8] = ctx->X[ 8];
+        ks[ 9] = ctx->X[ 9];
+        ks[10] = ctx->X[10];
+        ks[11] = ctx->X[11];
+        ks[12] = ctx->X[12];
+        ks[13] = ctx->X[13];
+        ks[14] = ctx->X[14];
+        ks[15] = ctx->X[15];
+        ks[16] = ks[ 0] ^ ks[ 1] ^ ks[ 2] ^ ks[ 3] ^
+                 ks[ 4] ^ ks[ 5] ^ ks[ 6] ^ ks[ 7] ^
+                 ks[ 8] ^ ks[ 9] ^ ks[10] ^ ks[11] ^
+                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
+
+        ts[2]  = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+
+        X00    = w[ 0] + ks[ 0];                 /* do the first full key injection */
+        X01    = w[ 1] + ks[ 1];
+        X02    = w[ 2] + ks[ 2];
+        X03    = w[ 3] + ks[ 3];
+        X04    = w[ 4] + ks[ 4];
+        X05    = w[ 5] + ks[ 5];
+        X06    = w[ 6] + ks[ 6];
+        X07    = w[ 7] + ks[ 7];
+        X08    = w[ 8] + ks[ 8];
+        X09    = w[ 9] + ks[ 9];
+        X10    = w[10] + ks[10];
+        X11    = w[11] + ks[11];
+        X12    = w[12] + ks[12];
+        X13    = w[13] + ks[13] + ts[0];
+        X14    = w[14] + ks[14] + ts[1];
+        X15    = w[15] + ks[15];
+
+        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);
+
+#define Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rNum) \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0;   \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2;   \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4;   \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6;   \
+    X##p8 += X##p9; X##p9 = RotL_64(X##p9,ROT##_4); X##p9 ^= X##p8;   \
+    X##pA += X##pB; X##pB = RotL_64(X##pB,ROT##_5); X##pB ^= X##pA;   \
+    X##pC += X##pD; X##pD = RotL_64(X##pD,ROT##_6); X##pD ^= X##pC;   \
+    X##pE += X##pF; X##pF = RotL_64(X##pF,ROT##_7); X##pF ^= X##pE;   \
+
+#if SKEIN_UNROLL_1024 == 0
+#define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
+    Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rn,Xptr);
+
+#define I1024(R)                                                      \
+    X00   += ks[((R)+ 1) % 17]; /* inject the key schedule value */   \
+    X01   += ks[((R)+ 2) % 17];                                       \
+    X02   += ks[((R)+ 3) % 17];                                       \
+    X03   += ks[((R)+ 4) % 17];                                       \
+    X04   += ks[((R)+ 5) % 17];                                       \
+    X05   += ks[((R)+ 6) % 17];                                       \
+    X06   += ks[((R)+ 7) % 17];                                       \
+    X07   += ks[((R)+ 8) % 17];                                       \
+    X08   += ks[((R)+ 9) % 17];                                       \
+    X09   += ks[((R)+10) % 17];                                       \
+    X10   += ks[((R)+11) % 17];                                       \
+    X11   += ks[((R)+12) % 17];                                       \
+    X12   += ks[((R)+13) % 17];                                       \
+    X13   += ks[((R)+14) % 17] + ts[((R)+1) % 3];                     \
+    X14   += ks[((R)+15) % 17] + ts[((R)+2) % 3];                     \
+    X15   += ks[((R)+16) % 17] +     (R)+1;                           \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+#else                                       /* looping version */
+#define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
+    Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rn,Xptr);
+
+#define I1024(R)                                                      \
+    X00   += ks[r+(R)+ 0];    /* inject the key schedule value */     \
+    X01   += ks[r+(R)+ 1];                                            \
+    X02   += ks[r+(R)+ 2];                                            \
+    X03   += ks[r+(R)+ 3];                                            \
+    X04   += ks[r+(R)+ 4];                                            \
+    X05   += ks[r+(R)+ 5];                                            \
+    X06   += ks[r+(R)+ 6];                                            \
+    X07   += ks[r+(R)+ 7];                                            \
+    X08   += ks[r+(R)+ 8];                                            \
+    X09   += ks[r+(R)+ 9];                                            \
+    X10   += ks[r+(R)+10];                                            \
+    X11   += ks[r+(R)+11];                                            \
+    X12   += ks[r+(R)+12];                                            \
+    X13   += ks[r+(R)+13] + ts[r+(R)+0];                              \
+    X14   += ks[r+(R)+14] + ts[r+(R)+1];                              \
+    X15   += ks[r+(R)+15] +    r+(R)   ;                              \
+    ks[r  +       (R)+16] = ks[r+(R)-1];  /* rotate key schedule */   \
+    ts[r  +       (R)+ 2] = ts[r+(R)-1];                              \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+
+    for (r=1;r <= 2*RCNT;r+=2*SKEIN_UNROLL_1024)    /* loop thru it */
+#endif                         /* end of looped code definitions */
+        {
+#define R1024_8_rounds(R)    /* do 8 full rounds */                               \
+        R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_0,8*(R) + 1); \
+        R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_1,8*(R) + 2); \
+        R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_2,8*(R) + 3); \
+        R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_3,8*(R) + 4); \
+        I1024(2*(R));                                                             \
+        R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_4,8*(R) + 5); \
+        R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_5,8*(R) + 6); \
+        R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_6,8*(R) + 7); \
+        R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_7,8*(R) + 8); \
+        I1024(2*(R)+1);
+
+        R1024_8_rounds( 0);
+
+#define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
+
+  #if   R1024_Unroll_R( 1)
+        R1024_8_rounds( 1);
+  #endif
+  #if   R1024_Unroll_R( 2)
+        R1024_8_rounds( 2);
+  #endif
+  #if   R1024_Unroll_R( 3)
+        R1024_8_rounds( 3);
+  #endif
+  #if   R1024_Unroll_R( 4)
+        R1024_8_rounds( 4);
+  #endif
+  #if   R1024_Unroll_R( 5)
+        R1024_8_rounds( 5);
+  #endif
+  #if   R1024_Unroll_R( 6)
+        R1024_8_rounds( 6);
+  #endif
+  #if   R1024_Unroll_R( 7)
+        R1024_8_rounds( 7);
+  #endif
+  #if   R1024_Unroll_R( 8)
+        R1024_8_rounds( 8);
+  #endif
+  #if   R1024_Unroll_R( 9)
+        R1024_8_rounds( 9);
+  #endif
+  #if   R1024_Unroll_R(10)
+        R1024_8_rounds(10);
+  #endif
+  #if   R1024_Unroll_R(11)
+        R1024_8_rounds(11);
+  #endif
+  #if   R1024_Unroll_R(12)
+        R1024_8_rounds(12);
+  #endif
+  #if   R1024_Unroll_R(13)
+        R1024_8_rounds(13);
+  #endif
+  #if   R1024_Unroll_R(14)
+        R1024_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_1024 > 14)
+#error  "need more unrolling in Skein1024_Process_Block"
+  #endif
+        }
+        /* do the final "feedforward" xor, update context chaining vars */
+
+        ctx->X[ 0] = X00 ^ w[ 0];
+        ctx->X[ 1] = X01 ^ w[ 1];
+        ctx->X[ 2] = X02 ^ w[ 2];
+        ctx->X[ 3] = X03 ^ w[ 3];
+        ctx->X[ 4] = X04 ^ w[ 4];
+        ctx->X[ 5] = X05 ^ w[ 5];
+        ctx->X[ 6] = X06 ^ w[ 6];
+        ctx->X[ 7] = X07 ^ w[ 7];
+        ctx->X[ 8] = X08 ^ w[ 8];
+        ctx->X[ 9] = X09 ^ w[ 9];
+        ctx->X[10] = X10 ^ w[10];
+        ctx->X[11] = X11 ^ w[11];
+        ctx->X[12] = X12 ^ w[12];
+        ctx->X[13] = X13 ^ w[13];
+        ctx->X[14] = X14 ^ w[14];
+        ctx->X[15] = X15 ^ w[15];
+
+        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        blkPtr += SKEIN1024_BLOCK_BYTES;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein1024_Process_Block_CodeSize(void)
+    {
+    return ((u08b_t *) Skein1024_Process_Block_CodeSize) -
+           ((u08b_t *) Skein1024_Process_Block);
+    }
+uint_t Skein1024_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_1024;
+    }
+#endif
+#endif
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
new file mode 100644
index 000000000000..8b43586f46bc
--- /dev/null
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -0,0 +1,1385 @@
+#include <threefishApi.h>
+#include <stdint.h>
+#include <string.h>
+
+
+void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+        {
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3],
+      b4 = input[4], b5 = input[5],
+      b6 = input[6], b7 = input[7],
+      b8 = input[8], b9 = input[9],
+      b10 = input[10], b11 = input[11],
+      b12 = input[12], b13 = input[13],
+      b14 = input[14], b15 = input[15];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+      k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+      k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+      k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+      k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+      k16 = keyCtx->key[16];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+            b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k3; b2 += b3 + k2; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k5; b4 += b5 + k4; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k7; b6 += b7 + k6; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k9; b8 += b9 + k8; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k11; b10 += b11 + k10; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k13 + t0; b12 += b13 + k12; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k15; b14 += b15 + k14 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k2; b0 += b1 + k1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k4; b2 += b3 + k3; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k6; b4 += b5 + k5; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k8; b6 += b7 + k7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k10; b8 += b9 + k9; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k12; b10 += b11 + k11; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k14 + t1; b12 += b13 + k13; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k16 + 1; b14 += b15 + k15 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k3; b0 += b1 + k2; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k5; b2 += b3 + k4; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k7; b4 += b5 + k6; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k9; b6 += b7 + k8; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k11; b8 += b9 + k10; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k13; b10 += b11 + k12; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k15 + t2; b12 += b13 + k14; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k0 + 2; b14 += b15 + k16 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k4; b0 += b1 + k3; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k6; b2 += b3 + k5; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k8; b4 += b5 + k7; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k10; b6 += b7 + k9; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k12; b8 += b9 + k11; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k14; b10 += b11 + k13; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k16 + t0; b12 += b13 + k15; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k1 + 3; b14 += b15 + k0 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k5; b0 += b1 + k4; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k7; b2 += b3 + k6; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k9; b4 += b5 + k8; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k11; b6 += b7 + k10; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k13; b8 += b9 + k12; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k15; b10 += b11 + k14; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k0 + t1; b12 += b13 + k16; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k2 + 4; b14 += b15 + k1 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k6; b0 += b1 + k5; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k8; b2 += b3 + k7; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k10; b4 += b5 + k9; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k12; b6 += b7 + k11; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k14; b8 += b9 + k13; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k16; b10 += b11 + k15; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k1 + t2; b12 += b13 + k0; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k3 + 5; b14 += b15 + k2 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k7; b0 += b1 + k6; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k9; b2 += b3 + k8; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k11; b4 += b5 + k10; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k13; b6 += b7 + k12; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k15; b8 += b9 + k14; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k0; b10 += b11 + k16; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k2 + t0; b12 += b13 + k1; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k4 + 6; b14 += b15 + k3 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k8; b0 += b1 + k7; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k10; b2 += b3 + k9; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k12; b4 += b5 + k11; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k14; b6 += b7 + k13; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k16; b8 += b9 + k15; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k1; b10 += b11 + k0; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k3 + t1; b12 += b13 + k2; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k5 + 7; b14 += b15 + k4 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k9; b0 += b1 + k8; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k11; b2 += b3 + k10; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k13; b4 += b5 + k12; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k15; b6 += b7 + k14; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k0; b8 += b9 + k16; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k2; b10 += b11 + k1; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k4 + t2; b12 += b13 + k3; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k6 + 8; b14 += b15 + k5 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k10; b0 += b1 + k9; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k12; b2 += b3 + k11; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k14; b4 += b5 + k13; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k16; b6 += b7 + k15; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k1; b8 += b9 + k0; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k3; b10 += b11 + k2; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k5 + t0; b12 += b13 + k4; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k7 + 9; b14 += b15 + k6 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k11; b0 += b1 + k10; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k13; b2 += b3 + k12; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k15; b4 += b5 + k14; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k0; b6 += b7 + k16; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k2; b8 += b9 + k1; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k4; b10 += b11 + k3; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k6 + t1; b12 += b13 + k5; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k8 + 10; b14 += b15 + k7 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k12; b0 += b1 + k11; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k14; b2 += b3 + k13; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k16; b4 += b5 + k15; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k1; b6 += b7 + k0; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k3; b8 += b9 + k2; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k5; b10 += b11 + k4; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k7 + t2; b12 += b13 + k6; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k9 + 11; b14 += b15 + k8 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k13; b0 += b1 + k12; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k15; b2 += b3 + k14; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k0; b4 += b5 + k16; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k2; b6 += b7 + k1; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k4; b8 += b9 + k3; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k6; b10 += b11 + k5; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k8 + t0; b12 += b13 + k7; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k10 + 12; b14 += b15 + k9 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k14; b0 += b1 + k13; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k16; b2 += b3 + k15; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k1; b4 += b5 + k0; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k3; b6 += b7 + k2; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k5; b8 += b9 + k4; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k7; b10 += b11 + k6; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k9 + t1; b12 += b13 + k8; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k11 + 13; b14 += b15 + k10 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k15; b0 += b1 + k14; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k0; b2 += b3 + k16; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k2; b4 += b5 + k1; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k4; b6 += b7 + k3; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k6; b8 += b9 + k5; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k8; b10 += b11 + k7; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k10 + t2; b12 += b13 + k9; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k12 + 14; b14 += b15 + k11 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k16; b0 += b1 + k15; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k1; b2 += b3 + k0; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k3; b4 += b5 + k2; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k5; b6 += b7 + k4; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k7; b8 += b9 + k6; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k9; b10 += b11 + k8; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k11 + t0; b12 += b13 + k10; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k13 + 15; b14 += b15 + k12 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k0; b0 += b1 + k16; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k2; b2 += b3 + k1; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k4; b4 += b5 + k3; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k6; b6 += b7 + k5; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k8; b8 += b9 + k7; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k10; b10 += b11 + k9; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k12 + t1; b12 += b13 + k11; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k14 + 16; b14 += b15 + k13 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k1; b0 += b1 + k0; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k3; b2 += b3 + k2; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k5; b4 += b5 + k4; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k7; b6 += b7 + k6; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k9; b8 += b9 + k8; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k11; b10 += b11 + k10; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k13 + t2; b12 += b13 + k12; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k15 + 17; b14 += b15 + k14 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k2; b0 += b1 + k1; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k4; b2 += b3 + k3; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k6; b4 += b5 + k5; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k8; b6 += b7 + k7; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k10; b8 += b9 + k9; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k12; b10 += b11 + k11; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k14 + t0; b12 += b13 + k13; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k16 + 18; b14 += b15 + k15 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k3; b0 += b1 + k2; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k5; b2 += b3 + k4; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k7; b4 += b5 + k6; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k9; b6 += b7 + k8; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k11; b8 += b9 + k10; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k13; b10 += b11 + k12; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k15 + t1; b12 += b13 + k14; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k0 + 19; b14 += b15 + k16 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+            output[0] = b0 + k3;
+            output[1] = b1 + k4;
+            output[2] = b2 + k5;
+            output[3] = b3 + k6;
+            output[4] = b4 + k7;
+            output[5] = b5 + k8;
+            output[6] = b6 + k9;
+            output[7] = b7 + k10;
+            output[8] = b8 + k11;
+            output[9] = b9 + k12;
+            output[10] = b10 + k13;
+            output[11] = b11 + k14;
+            output[12] = b12 + k15;
+            output[13] = b13 + k16 + t2;
+            output[14] = b14 + k0 + t0;
+            output[15] = b15 + k1 + 20;
+        }
+
+void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+{
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3],
+      b4 = input[4], b5 = input[5],
+      b6 = input[6], b7 = input[7],
+      b8 = input[8], b9 = input[9],
+      b10 = input[10], b11 = input[11],
+      b12 = input[12], b13 = input[13],
+      b14 = input[14], b15 = input[15];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+      k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+      k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+      k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+      k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+      k16 = keyCtx->key[16];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+    uint64_t tmp;
+
+            b0 -= k3;
+            b1 -= k4;
+            b2 -= k5;
+            b3 -= k6;
+            b4 -= k7;
+            b5 -= k8;
+            b6 -= k9;
+            b7 -= k10;
+            b8 -= k11;
+            b9 -= k12;
+            b10 -= k13;
+            b11 -= k14;
+            b12 -= k15;
+            b13 -= k16 + t2;
+            b14 -= k0 + t0;
+            b15 -= k1 + 20;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k16 + t2; b15 -= k0 + 19;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k14; b13 -= k15 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k12; b11 -= k13;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k10; b9 -= k11;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k8; b7 -= k9;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k6; b5 -= k7;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k4; b3 -= k5;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k2; b1 -= k3;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k15 + t1; b15 -= k16 + 18;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k13; b13 -= k14 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k11; b11 -= k12;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k9; b9 -= k10;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k7; b7 -= k8;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k5; b5 -= k6;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k3; b3 -= k4;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k1; b1 -= k2;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k14 + t0; b15 -= k15 + 17;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k12; b13 -= k13 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k10; b11 -= k11;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k8; b9 -= k9;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k6; b7 -= k7;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k4; b5 -= k5;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k2; b3 -= k3;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k0; b1 -= k1;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k13 + t2; b15 -= k14 + 16;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k11; b13 -= k12 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k9; b11 -= k10;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k7; b9 -= k8;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k5; b7 -= k6;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k3; b5 -= k4;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k1; b3 -= k2;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k16; b1 -= k0;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k12 + t1; b15 -= k13 + 15;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k10; b13 -= k11 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k8; b11 -= k9;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k6; b9 -= k7;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k4; b7 -= k5;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k2; b5 -= k3;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k0; b3 -= k1;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k15; b1 -= k16;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k11 + t0; b15 -= k12 + 14;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k9; b13 -= k10 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k7; b11 -= k8;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k5; b9 -= k6;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k3; b7 -= k4;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k1; b5 -= k2;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k16; b3 -= k0;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k14; b1 -= k15;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k10 + t2; b15 -= k11 + 13;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k8; b13 -= k9 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k6; b11 -= k7;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k4; b9 -= k5;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k2; b7 -= k3;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k0; b5 -= k1;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k15; b3 -= k16;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k13; b1 -= k14;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k9 + t1; b15 -= k10 + 12;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k7; b13 -= k8 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k5; b11 -= k6;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k3; b9 -= k4;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k1; b7 -= k2;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k16; b5 -= k0;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k14; b3 -= k15;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k12; b1 -= k13;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k8 + t0; b15 -= k9 + 11;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k6; b13 -= k7 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k4; b11 -= k5;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k2; b9 -= k3;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k0; b7 -= k1;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k15; b5 -= k16;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k13; b3 -= k14;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k11; b1 -= k12;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k7 + t2; b15 -= k8 + 10;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k5; b13 -= k6 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k3; b11 -= k4;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k1; b9 -= k2;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k16; b7 -= k0;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k14; b5 -= k15;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k12; b3 -= k13;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k10; b1 -= k11;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k6 + t1; b15 -= k7 + 9;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k4; b13 -= k5 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k2; b11 -= k3;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k0; b9 -= k1;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k15; b7 -= k16;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k13; b5 -= k14;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k11; b3 -= k12;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k9; b1 -= k10;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k5 + t0; b15 -= k6 + 8;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k3; b13 -= k4 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k1; b11 -= k2;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k16; b9 -= k0;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k14; b7 -= k15;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k12; b5 -= k13;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k10; b3 -= k11;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k8; b1 -= k9;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k4 + t2; b15 -= k5 + 7;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k2; b13 -= k3 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k0; b11 -= k1;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k15; b9 -= k16;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k13; b7 -= k14;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k11; b5 -= k12;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k9; b3 -= k10;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k7; b1 -= k8;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k3 + t1; b15 -= k4 + 6;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k1; b13 -= k2 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k16; b11 -= k0;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k14; b9 -= k15;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k12; b7 -= k13;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k10; b5 -= k11;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k8; b3 -= k9;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k6; b1 -= k7;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k2 + t0; b15 -= k3 + 5;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k0; b13 -= k1 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k15; b11 -= k16;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k13; b9 -= k14;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k11; b7 -= k12;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k9; b5 -= k10;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k7; b3 -= k8;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k5; b1 -= k6;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k1 + t2; b15 -= k2 + 4;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k16; b13 -= k0 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k14; b11 -= k15;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k12; b9 -= k13;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k10; b7 -= k11;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k8; b5 -= k9;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k6; b3 -= k7;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k4; b1 -= k5;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k0 + t1; b15 -= k1 + 3;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k15; b13 -= k16 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k13; b11 -= k14;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k11; b9 -= k12;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k9; b7 -= k10;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k7; b5 -= k8;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k5; b3 -= k6;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k3; b1 -= k4;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k16 + t0; b15 -= k0 + 2;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k14; b13 -= k15 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k12; b11 -= k13;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k10; b9 -= k11;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k8; b7 -= k9;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k6; b5 -= k7;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k4; b3 -= k5;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k2; b1 -= k3;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k15 + t2; b15 -= k16 + 1;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k13; b13 -= k14 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k11; b11 -= k12;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k9; b9 -= k10;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k7; b7 -= k8;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k5; b5 -= k6;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k3; b3 -= k4;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k1; b1 -= k2;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k14 + t1; b15 -= k15;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k12; b13 -= k13 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k10; b11 -= k11;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k8; b9 -= k9;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k6; b7 -= k7;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k4; b5 -= k5;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k2; b3 -= k3;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k0; b1 -= k1;
+
+            output[15] = b15;
+            output[14] = b14;
+            output[13] = b13;
+            output[12] = b12;
+            output[11] = b11;
+            output[10] = b10;
+            output[9] = b9;
+            output[8] = b8;
+            output[7] = b7;
+            output[6] = b6;
+            output[5] = b5;
+            output[4] = b4;
+            output[3] = b3;
+            output[2] = b2;
+            output[1] = b1;
+            output[0] = b0;
+}
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
new file mode 100644
index 000000000000..db2b81978c91
--- /dev/null
+++ b/drivers/staging/skein/threefish256Block.c
@@ -0,0 +1,349 @@
+#include <threefishApi.h>
+#include <stdint.h>
+#include <string.h>
+
+
+void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+  {
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+    b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k3; b2 += b3 + k2 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k4 + 1; b2 += b3 + k3 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k0 + 2; b2 += b3 + k4 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k4 + t0; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k1 + 3; b2 += b3 + k0 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k0 + t1; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k2 + 4; b2 += b3 + k1 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k1 + t2; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k3 + 5; b2 += b3 + k2 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k2 + t0; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k4 + 6; b2 += b3 + k3 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k3 + t1; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k0 + 7; b2 += b3 + k4 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k4 + t2; b0 += b1 + k3; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k1 + 8; b2 += b3 + k0 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k0 + t0; b0 += b1 + k4; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k2 + 9; b2 += b3 + k1 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k1 + t1; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k3 + 10; b2 += b3 + k2 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k2 + t2; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k4 + 11; b2 += b3 + k3 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k3 + t0; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k0 + 12; b2 += b3 + k4 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k4 + t1; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k1 + 13; b2 += b3 + k0 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k0 + t2; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k2 + 14; b2 += b3 + k1 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k3 + 15; b2 += b3 + k2 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k4 + 16; b2 += b3 + k3 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k0 + 17; b2 += b3 + k4 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    output[0] = b0 + k3;
+    output[1] = b1 + k4 + t0;
+    output[2] = b2 + k0 + t1;
+    output[3] = b3 + k1 + 18;
+  }
+
+void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+  {
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+    uint64_t tmp;
+
+    b0 -= k3;
+    b1 -= k4 + t0;
+    b2 -= k0 + t1;
+    b3 -= k1 + 18;
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t0; b3 -= k0 + 17;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t2; b3 -= k4 + 16;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t1; b3 -= k3 + 15;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t0; b3 -= k2 + 14;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t2; b3 -= k1 + 13;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t1; b3 -= k0 + 12;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t0; b3 -= k4 + 11;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t2; b3 -= k3 + 10;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k4; b1 -= k0 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k1 + t1; b3 -= k2 + 9;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k3; b1 -= k4 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k0 + t0; b3 -= k1 + 8;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t2; b3 -= k0 + 7;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t1; b3 -= k4 + 6;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t0; b3 -= k3 + 5;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t2; b3 -= k2 + 4;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t1; b3 -= k1 + 3;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t0; b3 -= k0 + 2;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t2; b3 -= k4 + 1;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t1; b3 -= k3;
+
+    output[0] = b0;
+    output[1] = b1;
+    output[2] = b2;
+    output[3] = b3;
+  }
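
Note for reviewers: the unrolled 256-bit routines above are hard to check by eye, so a rolled-up form may help when comparing them against the rotation and subkey pattern. The following is only a sketch, not part of the patch. The sketch_* names and the rot256 table are hypothetical. The rotation pairs and the subkey indexing are read off the unrolled lines above, and key[4] / tweak[2] are assumed to be filled in by the key setup code, which is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotation pairs for round r % 8, as they appear in the unrolled code. */
static const unsigned rot256[8][2] = {
	{14, 16}, {52, 57}, {23, 40}, { 5, 37},
	{25, 33}, {46, 12}, {58, 22}, {32, 32},
};

static uint64_t rotl64(uint64_t x, unsigned n)
{
	return (x << n) | (x >> (64 - n));	/* n is never 0 or 64 here */
}

static uint64_t rotr64(uint64_t x, unsigned n)
{
	return (x >> n) | (x << (64 - n));
}

/* Subkey s; k[4] and t[2] are the extra schedule words (assumed preset). */
static void sketch_subkey(uint64_t sk[4], const uint64_t k[5],
			  const uint64_t t[3], unsigned s)
{
	sk[0] = k[s % 5];
	sk[1] = k[(s + 1) % 5] + t[s % 3];
	sk[2] = k[(s + 2) % 5] + t[(s + 1) % 3];
	sk[3] = k[(s + 3) % 5] + s;
}

void sketch_encrypt256(const uint64_t k[5], const uint64_t t[3],
		       const uint64_t in[4], uint64_t out[4])
{
	uint64_t b[4] = { in[0], in[1], in[2], in[3] }, sk[4], tmp;
	unsigned r, i;

	for (r = 0; r < 72; r++) {
		if (r % 4 == 0) {	/* inject a subkey every four rounds */
			sketch_subkey(sk, k, t, r / 4);
			for (i = 0; i < 4; i++)
				b[i] += sk[i];
		}
		/* mix the pairs (b0,b1) and (b2,b3) */
		b[0] += b[1]; b[1] = rotl64(b[1], rot256[r % 8][0]) ^ b[0];
		b[2] += b[3]; b[3] = rotl64(b[3], rot256[r % 8][1]) ^ b[2];
		/* word permutation: swap b1 and b3 */
		tmp = b[1]; b[1] = b[3]; b[3] = tmp;
	}
	sketch_subkey(sk, k, t, 18);	/* final subkey */
	for (i = 0; i < 4; i++)
		out[i] = b[i] + sk[i];
}

void sketch_decrypt256(const uint64_t k[5], const uint64_t t[3],
		       const uint64_t in[4], uint64_t out[4])
{
	uint64_t b[4], sk[4], tmp;
	unsigned r, i;

	sketch_subkey(sk, k, t, 18);
	for (i = 0; i < 4; i++)
		b[i] = in[i] - sk[i];
	for (r = 72; r-- > 0; ) {
		/* undo the permutation, then undo the two mixes */
		tmp = b[1]; b[1] = b[3]; b[3] = tmp;
		b[1] = rotr64(b[1] ^ b[0], rot256[r % 8][0]); b[0] -= b[1];
		b[3] = rotr64(b[3] ^ b[2], rot256[r % 8][1]); b[2] -= b[3];
		if (r % 4 == 0) {
			sketch_subkey(sk, k, t, r / 4);
			for (i = 0; i < 4; i++)
				b[i] -= sk[i];
		}
	}
	for (i = 0; i < 4; i++)
		out[i] = b[i];
}

/* Round-trip check with arbitrary (not specification) test values. */
void sketch_selftest(void)
{
	const uint64_t k[5] = { 1, 2, 3, 4, 5 }, t[3] = { 6, 7, 8 };
	const uint64_t pt[4] = { 9, 10, 11, 12 };
	uint64_t ct[4], back[4];

	sketch_encrypt256(k, t, pt, ct);
	sketch_decrypt256(k, t, ct, back);
	assert(memcmp(pt, back, sizeof(pt)) == 0);
}
```

The self-test only shows that the two sketch routines invert each other. It is not a known-answer test, so it does not on its own confirm agreement with the unrolled functions or with the reference vectors.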
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
new file mode 100644
index 000000000000..4fe708fea066
--- /dev/null
+++ b/drivers/staging/skein/threefish512Block.c
@@ -0,0 +1,643 @@
+#include <threefishApi.h>
+#include <stdint.h>
+#include <string.h>
+
+
+void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+    {
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3],
+      b4 = input[4], b5 = input[5],
+      b6 = input[6], b7 = input[7];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+      k8 = keyCtx->key[8];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+        b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k3; b2 += b3 + k2; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k7; b6 += b7 + k6 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k2; b0 += b1 + k1; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k4; b2 += b3 + k3; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k8 + 1; b6 += b7 + k7 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k3; b0 += b1 + k2; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k5; b2 += b3 + k4; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k0 + 2; b6 += b7 + k8 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k4; b0 += b1 + k3; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k6; b2 += b3 + k5; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k1 + 3; b6 += b7 + k0 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k5; b0 += b1 + k4; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k7; b2 += b3 + k6; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k2 + 4; b6 += b7 + k1 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k6; b0 += b1 + k5; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k8; b2 += b3 + k7; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k3 + 5; b6 += b7 + k2 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k7; b0 += b1 + k6; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k0; b2 += b3 + k8; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k4 + 6; b6 += b7 + k3 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k8; b0 += b1 + k7; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k1; b2 += b3 + k0; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k5 + 7; b6 += b7 + k4 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k0; b0 += b1 + k8; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k2; b2 += b3 + k1; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k6 + 8; b6 += b7 + k5 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k1; b0 += b1 + k0; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k3; b2 += b3 + k2; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k7 + 9; b6 += b7 + k6 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k2; b0 += b1 + k1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k4; b2 += b3 + k3; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k8 + 10; b6 += b7 + k7 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k3; b0 += b1 + k2; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k5; b2 += b3 + k4; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k0 + 11; b6 += b7 + k8 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k4; b0 += b1 + k3; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k6; b2 += b3 + k5; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k1 + 12; b6 += b7 + k0 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k5; b0 += b1 + k4; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k7; b2 += b3 + k6; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k2 + 13; b6 += b7 + k1 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k6; b0 += b1 + k5; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k8; b2 += b3 + k7; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k3 + 14; b6 += b7 + k2 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k7; b0 += b1 + k6; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k0; b2 += b3 + k8; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k4 + 15; b6 += b7 + k3 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k8; b0 += b1 + k7; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k1; b2 += b3 + k0; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k5 + 16; b6 += b7 + k4 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k0; b0 += b1 + k8; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k2; b2 += b3 + k1; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k6 + 17; b6 += b7 + k5 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+        output[0] = b0 + k0;
+        output[1] = b1 + k1;
+        output[2] = b2 + k2;
+        output[3] = b3 + k3;
+        output[4] = b4 + k4;
+        output[5] = b5 + k5 + t0;
+        output[6] = b6 + k6 + t1;
+        output[7] = b7 + k7 + 18;
+    }
+
+void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+    {
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3],
+      b4 = input[4], b5 = input[5],
+      b6 = input[6], b7 = input[7];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+      k8 = keyCtx->key[8];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+      uint64_t tmp;
+
+        b0 -= k0;
+        b1 -= k1;
+        b2 -= k2;
+        b3 -= k3;
+        b4 -= k4;
+        b5 -= k5 + t0;
+        b6 -= k6 + t1;
+        b7 -= k7 + 18;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k5 + t0; b7 -= k6 + 17;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k3; b5 -= k4 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k1; b3 -= k2;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k8; b1 -= k0;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k4 + t2; b7 -= k5 + 16;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k2; b5 -= k3 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k0; b3 -= k1;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k7; b1 -= k8;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k3 + t1; b7 -= k4 + 15;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k1; b5 -= k2 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k8; b3 -= k0;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k6; b1 -= k7;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k2 + t0; b7 -= k3 + 14;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k0; b5 -= k1 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k7; b3 -= k8;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k5; b1 -= k6;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k1 + t2; b7 -= k2 + 13;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k8; b5 -= k0 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k6; b3 -= k7;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k4; b1 -= k5;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k0 + t1; b7 -= k1 + 12;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k7; b5 -= k8 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k5; b3 -= k6;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k3; b1 -= k4;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k8 + t0; b7 -= k0 + 11;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k6; b5 -= k7 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k4; b3 -= k5;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k2; b1 -= k3;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k7 + t2; b7 -= k8 + 10;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k5; b5 -= k6 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k3; b3 -= k4;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k1; b1 -= k2;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k6 + t1; b7 -= k7 + 9;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k4; b5 -= k5 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k2; b3 -= k3;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k0; b1 -= k1;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k5 + t0; b7 -= k6 + 8;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k3; b5 -= k4 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k1; b3 -= k2;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k8; b1 -= k0;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k4 + t2; b7 -= k5 + 7;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k2; b5 -= k3 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k0; b3 -= k1;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k7; b1 -= k8;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k3 + t1; b7 -= k4 + 6;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k1; b5 -= k2 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k8; b3 -= k0;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k6; b1 -= k7;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k2 + t0; b7 -= k3 + 5;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k0; b5 -= k1 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k7; b3 -= k8;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k5; b1 -= k6;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k1 + t2; b7 -= k2 + 4;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k8; b5 -= k0 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k6; b3 -= k7;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k4; b1 -= k5;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k0 + t1; b7 -= k1 + 3;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k7; b5 -= k8 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k5; b3 -= k6;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k3; b1 -= k4;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k8 + t0; b7 -= k0 + 2;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k6; b5 -= k7 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k4; b3 -= k5;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k2; b1 -= k3;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k7 + t2; b7 -= k8 + 1;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k5; b5 -= k6 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k3; b3 -= k4;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k1; b1 -= k2;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k6 + t1; b7 -= k7;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k4; b5 -= k5 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k2; b3 -= k3;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k0; b1 -= k1;
+
+    output[0] = b0;
+    output[1] = b1;
+    output[2] = b2;
+    output[3] = b3;
+
+    output[4] = b4;
+    output[5] = b5;
+    output[6] = b6;
+    output[7] = b7;
+}
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
new file mode 100644
index 000000000000..5afa0338aef4
--- /dev/null
+++ b/drivers/staging/skein/threefishApi.c
@@ -0,0 +1,79 @@
+
+
+#include <threefishApi.h>
+#include <stdlib.h>
+#include <string.h>
+
+void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
+                     uint64_t* keyData, uint64_t* tweak)
+{
+    int keyWords = stateSize / 64;
+    int i;
+    uint64_t parity = KeyScheduleConst;
+
+    keyCtx->tweak[0] = tweak[0];
+    keyCtx->tweak[1] = tweak[1];
+    keyCtx->tweak[2] = tweak[0] ^ tweak[1];
+
+    for (i = 0; i < keyWords; i++) {
+        keyCtx->key[i] = keyData[i];
+        parity ^= keyData[i];
+    }
+    keyCtx->key[i] = parity;
+    keyCtx->stateSize = stateSize;
+}
+
+void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
+                                uint8_t* out)
+{
+    u64b_t plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+    u64b_t cipher[SKEIN_MAX_STATE_WORDS];
+    
+    Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
+    threefishEncryptBlockWords(keyCtx, plain, cipher);
+    Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
+}
+
+void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
+                                uint64_t* out)
+{
+    switch (keyCtx->stateSize) {
+        case Threefish256:
+            threefishEncrypt256(keyCtx, in, out);
+            break;
+        case Threefish512:
+            threefishEncrypt512(keyCtx, in, out);
+            break;
+        case Threefish1024:
+            threefishEncrypt1024(keyCtx, in, out);
+            break;
+    }
+}
+
+void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
+                                uint8_t* out)
+{
+    u64b_t plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+    u64b_t cipher[SKEIN_MAX_STATE_WORDS];
+    
+    Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
+    threefishDecryptBlockWords(keyCtx, cipher, plain);
+    Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
+}
+
+void threefishDecryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
+                                uint64_t* out)
+{
+    switch (keyCtx->stateSize) {
+        case Threefish256:
+            threefishDecrypt256(keyCtx, in, out);
+            break;
+        case Threefish512:
+            threefishDecrypt512(keyCtx, in, out);
+            break;
+        case Threefish1024:
+            threefishDecrypt1024(keyCtx, in, out);
+            break;
+    }
+}
+
-- 
1.9.0
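
For anyone reviewing the unrolled rounds in threefish512Block.c: each
plain round line in threefishEncrypt512() is one add / rotate-left / XOR
step on a pair of words, and the matching line in threefishDecrypt512()
undoes it with XOR / rotate-right / subtract.  Below is a minimal
standalone sketch of that pairing.  The helper names (rotl64, rotr64,
mix, unmix, mix_round_trips, mix_self_test) are made up for illustration
and do not appear in the imported code; the rotation constants 46 and 13
are taken from the lines above.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit word left by r bits; only valid for 0 < r < 64. */
static uint64_t rotl64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

/* Rotate a 64-bit word right by r bits; only valid for 0 < r < 64. */
static uint64_t rotr64(uint64_t x, unsigned int r)
{
	return (x >> r) | (x << (64 - r));
}

/* Forward step, same shape as "b0 += b1; b1 = rotl(b1, r) ^ b0;". */
static void mix(uint64_t *x0, uint64_t *x1, unsigned int r)
{
	*x0 += *x1;
	*x1 = rotl64(*x1, r) ^ *x0;
}

/* Inverse step, same shape as "tmp = b1 ^ b0; b1 = rotr(tmp, r); b0 -= b1;". */
static void unmix(uint64_t *x0, uint64_t *x1, unsigned int r)
{
	*x1 = rotr64(*x1 ^ *x0, r);
	*x0 -= *x1;
}

/* Returns 1 if unmix() restores the words that mix() was given. */
static int mix_round_trips(uint64_t a, uint64_t b, unsigned int r)
{
	uint64_t x0 = a, x1 = b;

	mix(&x0, &x1, r);
	unmix(&x0, &x1, r);
	return x0 == a && x1 == b;
}

/* Hypothetical self-check using two rotation constants from the 512-bit code. */
static void mix_self_test(void)
{
	assert(mix_round_trips(0x0123456789abcdefULL, 0xfedcba9876543210ULL, 46));
	assert(mix_round_trips(0xffffffffffffffffULL, 1, 13));
}
```

The key-injection lines (the ones that add k0..k8, t0..t2 and the round
counter) follow the same pattern: the encrypt side adds the subkey words
before the rotate, and the decrypt side subtracts them after undoing it.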


* [RFC PATCH 03/22] staging: crypto: skein: allow building statically
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 01/22] scripts: objdiff: detect object code changes between two commits Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 02/22] staging: crypto: skein: import code from Skein3Fish.git Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-17 21:52   ` Greg KH
  2014-03-11 21:32 ` [RFC PATCH 04/22] staging: crypto: skein: remove brg_*.h includes Jason Cooper
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

These are the minimum changes required to get the code to build
statically (built-in, not as a module) in the kernel.  This has to
happen first so that we can empirically verify, by comparing object
files, that future cleanup patches don't change the generated object code.

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
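
Note on the brg_types.h hunk in this patch: it drops the <limits.h>
based checks (#if UCHAR_MAX == 255u and friends) and keeps only the
bare typedefs.  If the width guarantees are still wanted, one possible
way to keep them without <limits.h> is a compile-time assertion.  This
is only a sketch, assuming a C11 compiler; the helper brg_widths_ok()
is hypothetical, and in-kernel code would more likely use BUILD_BUG_ON().

```c
#include <assert.h>	/* static_assert (C11) */

/* The typedefs the patch leaves in place, now unconditional. */
typedef unsigned char  uint_8t;
typedef unsigned short uint_16t;
typedef unsigned int   uint_32t;

/*
 * Casting -1 to an unsigned type yields that type's maximum value, so
 * these fail at compile time if a type is wider than expected.
 */
static_assert((uint_8t)-1 == 0xffu, "uint_8t must be 8 bits wide");
static_assert((uint_16t)-1 == 0xffffu, "uint_16t must be 16 bits wide");
static_assert((uint_32t)-1 == 0xffffffffu, "uint_32t must be 32 bits wide");

/* Hypothetical runtime counterpart, handy in a userspace test harness. */
static int brg_widths_ok(void)
{
	return sizeof(uint_8t) == 1 && sizeof(uint_16t) == 2 &&
	       sizeof(uint_32t) == 4;
}
```
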
 drivers/staging/Kconfig                      |  2 +
 drivers/staging/Makefile                     |  1 +
 drivers/staging/skein/CMakeLists.txt         | 27 -------------
 drivers/staging/skein/Kconfig                | 32 ++++++++++++++++
 drivers/staging/skein/Makefile               | 13 +++++++
 drivers/staging/skein/include/brg_types.h    | 57 ----------------------------
 drivers/staging/skein/include/skein.h        | 10 -----
 drivers/staging/skein/include/skeinApi.h     |  2 +-
 drivers/staging/skein/include/skein_port.h   | 16 +-------
 drivers/staging/skein/include/threefishApi.h |  2 +-
 drivers/staging/skein/skein.c                |  2 +-
 drivers/staging/skein/skeinApi.c             |  4 +-
 drivers/staging/skein/skeinBlockNo3F.c       |  2 +-
 drivers/staging/skein/skein_block.c          |  2 +-
 drivers/staging/skein/threefish1024Block.c   |  3 +-
 drivers/staging/skein/threefish256Block.c    |  3 +-
 drivers/staging/skein/threefish512Block.c    |  3 +-
 drivers/staging/skein/threefishApi.c         |  3 +-
 18 files changed, 59 insertions(+), 125 deletions(-)
 delete mode 100755 drivers/staging/skein/CMakeLists.txt
 create mode 100644 drivers/staging/skein/Kconfig
 create mode 100644 drivers/staging/skein/Makefile

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 99375f0a9440..fbcbe833dc89 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -146,4 +146,6 @@ source "drivers/staging/dgnc/Kconfig"
 
 source "drivers/staging/dgap/Kconfig"
 
+source "drivers/staging/skein/Kconfig"
+
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index ddc3c4a5d39d..2dee51d1a483 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -65,3 +65,4 @@ obj-$(CONFIG_XILLYBUS)		+= xillybus/
 obj-$(CONFIG_DGNC)			+= dgnc/
 obj-$(CONFIG_DGAP)			+= dgap/
 obj-$(CONFIG_MTD_SPINAND_MT29F)	+= mt29f_spinand/
+obj-$(CONFIG_CRYPTO_SKEIN)		+= skein/
diff --git a/drivers/staging/skein/CMakeLists.txt b/drivers/staging/skein/CMakeLists.txt
deleted file mode 100755
index 604aaa394cb1..000000000000
--- a/drivers/staging/skein/CMakeLists.txt
+++ /dev/null
@@ -1,27 +0,0 @@
-cmake_minimum_required (VERSION 2.6)
-
-include_directories (${CMAKE_CURRENT_SOURCE_DIR}/include)
-
-# set(skeinBlock_src skein_block.c)
-set(skeinBlock_src skeinBlockNo3F.c)
-
-set(skein_src 
-    ${skeinBlock_src}
-    skein.c
-    skeinApi.c
-    )
-
-set(threefish_src
-    threefishApi.c
-    threefish256Block.c
-    threefish512Block.c
-    threefish1024Block.c
-    )
-set(s3f_src ${skein_src} ${threefish_src})
-
-add_library(skein3fish SHARED ${s3f_src})
-set_target_properties(skein3fish PROPERTIES VERSION ${VERSION} SOVERSION ${SOVERSION})
-target_link_libraries(skein3fish ${LIBS})
-
-install(TARGETS skein3fish DESTINATION ${LIBDIRNAME})
-
diff --git a/drivers/staging/skein/Kconfig b/drivers/staging/skein/Kconfig
new file mode 100644
index 000000000000..8f5a72a90ced
--- /dev/null
+++ b/drivers/staging/skein/Kconfig
@@ -0,0 +1,32 @@
+config CRYPTO_SKEIN
+	bool "Skein digest algorithm"
+	depends on (X86 || UML_X86) && 64BIT
+	select CRYPTO_THREEFISH
+	select CRYPTO_HASH
+	help
+	  The Skein secure hash algorithm was one of the 5 finalists in the
+	  NIST SHA-3 competition.
+
+	  Skein is optimized for modern 64-bit processors and is highly
+	  customizable.  See:
+
+	  http://www.skein-hash.info/sites/default/files/skein1.3.pdf
+
+	  for more information.  This option selects the Threefish block
+	  cipher, on which Skein is built.
+
+config CRYPTO_THREEFISH
+	bool "Threefish tweakable block cipher"
+	depends on (X86 || UML_X86) && 64BIT
+	select CRYPTO_ALGAPI
+	help
+	  Threefish cipher algorithm is the tweakable block cipher underneath
+	  the Skein family of secure hash algorithms.  Skein is one of 5
+	  finalists from the NIST SHA3 competition.
+
+	  Skein is optimized for modern, 64bit processors and is highly
+	  customizable.  See:
+
+	  http://www.skein-hash.info/sites/default/files/skein1.3.pdf
+
+	  for more information.
diff --git a/drivers/staging/skein/Makefile b/drivers/staging/skein/Makefile
new file mode 100644
index 000000000000..2bb386e1e58c
--- /dev/null
+++ b/drivers/staging/skein/Makefile
@@ -0,0 +1,13 @@
+#
+# Makefile for the skein secure hash algorithm
+#
+subdir-ccflags-y := -I$(src)/include/
+
+obj-$(CONFIG_CRYPTO_SKEIN) +=   skein.o \
+				skeinApi.o \
+				skein_block.o
+
+obj-$(CONFIG_CRYPTO_THREEFISH) += threefish1024Block.o \
+				  threefish256Block.o \
+				  threefish512Block.o \
+				  threefishApi.o
diff --git a/drivers/staging/skein/include/brg_types.h b/drivers/staging/skein/include/brg_types.h
index 6db737d71b9e..3d9fe0df5238 100644
--- a/drivers/staging/skein/include/brg_types.h
+++ b/drivers/staging/skein/include/brg_types.h
@@ -46,83 +46,26 @@
 extern "C" {
 #endif
 
-#include <limits.h>
-
 #ifndef BRG_UI8
 #  define BRG_UI8
-#  if UCHAR_MAX == 255u
      typedef unsigned char uint_8t;
-#  else
-#    error Please define uint_8t as an 8-bit unsigned integer type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI16
 #  define BRG_UI16
-#  if USHRT_MAX == 65535u
      typedef unsigned short uint_16t;
-#  else
-#    error Please define uint_16t as a 16-bit unsigned short type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI32
 #  define BRG_UI32
-#  if UINT_MAX == 4294967295u
 #    define li_32(h) 0x##h##u
      typedef unsigned int uint_32t;
-#  elif ULONG_MAX == 4294967295u
-#    define li_32(h) 0x##h##ul
-     typedef unsigned long uint_32t;
-#  elif defined( _CRAY )
-#    error This code needs 32-bit data types, which Cray machines do not provide
-#  else
-#    error Please define uint_32t as a 32-bit unsigned integer type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI64
-#  if defined( __BORLANDC__ ) && !defined( __MSDOS__ )
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ui64
-     typedef unsigned __int64 uint_64t;
-#  elif defined( _MSC_VER ) && ( _MSC_VER < 1300 )    /* 1300 == VC++ 7.0 */
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ui64
-     typedef unsigned __int64 uint_64t;
-#  elif defined( __sun ) && defined(ULONG_MAX) && ULONG_MAX == 0xfffffffful
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ull
-     typedef unsigned long long uint_64t;
-#  elif defined( UINT_MAX ) && UINT_MAX > 4294967295u
-#    if UINT_MAX == 18446744073709551615u
-#      define BRG_UI64
-#      define li_64(h) 0x##h##u
-       typedef unsigned int uint_64t;
-#    endif
-#  elif defined( ULONG_MAX ) && ULONG_MAX > 4294967295u
-#    if ULONG_MAX == 18446744073709551615ul
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ul
-       typedef unsigned long uint_64t;
-#    endif
-#  elif defined( ULLONG_MAX ) && ULLONG_MAX > 4294967295u
-#    if ULLONG_MAX == 18446744073709551615ull
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ull
-       typedef unsigned long long uint_64t;
-#    endif
-#  elif defined( ULONG_LONG_MAX ) && ULONG_LONG_MAX > 4294967295u
-#    if ULONG_LONG_MAX == 18446744073709551615ull
 #      define BRG_UI64
 #      define li_64(h) 0x##h##ull
        typedef unsigned long long uint_64t;
-#    endif
-#  elif defined(__GNUC__)  /* DLW: avoid mingw problem with -ansi */
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ull
-       typedef unsigned long long uint_64t;
-#  endif
 #endif
 
 #if defined( NEED_UINT_64T ) && !defined( BRG_UI64 )
diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index cb613fa09d9e..315cdcd14413 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -261,18 +261,8 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define Skein_Show_Key(bits,ctx,key,keyBytes)
 #endif
 
-#ifndef SKEIN_ERR_CHECK        /* run-time checks (e.g., bad params, uninitialized context)? */
 #define Skein_Assert(x,retCode)/* default: ignore all Asserts, for performance */
 #define Skein_assert(x)
-#elif   defined(SKEIN_ASSERT)
-#include <assert.h>     
-#define Skein_Assert(x,retCode) assert(x) 
-#define Skein_assert(x)         assert(x) 
-#else
-#include <assert.h>     
-#define Skein_Assert(x,retCode) { if (!(x)) return retCode; } /*  caller  error */
-#define Skein_assert(x)         assert(x)                     /* internal error */
-#endif
 
 /*****************************************************************
 ** Skein block function constants (shared across Ref and Opt code)
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 19c3225460fc..734d27b79f01 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -78,8 +78,8 @@ OTHER DEALINGS IN THE SOFTWARE.
  * 
  */
 
+#include <linux/types.h>
 #include <skein.h>
-#include <stdint.h>
 
 #ifdef __cplusplus
 extern "C"
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
index 18d892553c8d..1c68070358ce 100644
--- a/drivers/staging/skein/include/skein_port.h
+++ b/drivers/staging/skein/include/skein_port.h
@@ -44,24 +44,10 @@ typedef uint_64t        u64b_t;             /* 64-bit unsigned integer */
  * platform-specific code instead (e.g., for big-endian CPUs).
  *
  */
-#ifndef SKEIN_NEED_SWAP /* compile-time "override" for endianness? */
-
-#include <brg_endian.h>              /* get endianness selection */
-#if   PLATFORM_BYTE_ORDER == IS_BIG_ENDIAN
-    /* here for big-endian CPUs */
-#define SKEIN_NEED_SWAP   (1)
-#elif PLATFORM_BYTE_ORDER == IS_LITTLE_ENDIAN
-    /* here for x86 and x86-64 CPUs (and other detected little-endian CPUs) */
 #define SKEIN_NEED_SWAP   (0)
-#if   PLATFORM_MUST_ALIGN == 0              /* ok to use "fast" versions? */
+/* below two prototype assume we are handed aligned data */
 #define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
 #define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
-#endif
-#else
-#error "Skein needs endianness setting!"
-#endif
-
-#endif /* ifndef SKEIN_NEED_SWAP */
 
 /*
  ******************************************************************
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 85afd72fe987..dae270cf71d3 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -28,8 +28,8 @@
 @endcode
  */
 
+#include <linux/types.h>
 #include <skein.h>
-#include <stdint.h>
 
 #define KeyScheduleConst 0x1BD11BDAA9FC1A22L
 
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index f0b176ac1dc7..3fae6fdf7c75 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -10,7 +10,7 @@
 
 #define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
 
-#include <string.h>       /* get the memcpy/memset functions */
+#include <linux/string.h>       /* get the memcpy/memset functions */
 #include <skein.h> /* get the Skein API definitions   */
 #include <skein_iv.h>    /* get precomputed IVs */
 
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 7b963758d32c..579b92efbf65 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -24,10 +24,8 @@ OTHER DEALINGS IN THE SOFTWARE.
 
 */
 
-#define SKEIN_ERR_CHECK 1
+#include <linux/string.h>
 #include <skeinApi.h>
-#include <string.h>
-#include <stdio.h>
 
 int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size)
 {
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 4ad6c50360e7..6a19ceb17d0f 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -1,5 +1,5 @@
 
-#include <string.h>
+#include <linux/string.h>
 #include <skein.h>
 #include <threefishApi.h>
 
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 86724a2443b5..b5be41af6d17 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -14,7 +14,7 @@
 **
 ************************************************************************/
 
-#include <string.h>
+#include <linux/string.h>
 #include <skein.h>
 
 #ifndef SKEIN_USE_ASM
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 8b43586f46bc..58a8c26a1f6f 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index db2b81978c91..a7e06f905186 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 4fe708fea066..3cbfcd9af5c9 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 5afa0338aef4..968d3d21fe61 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -1,8 +1,7 @@
 
 
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdlib.h>
-#include <string.h>
 
 void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
                      uint64_t* keyData, uint64_t* tweak)
-- 
1.9.0
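
A note on the Skein_Assert() hunk in skein.h above: with the SKEIN_ERR_CHECK
variants gone, both macros expand to nothing, so caller errors are no longer
reported.  The removed caller-error variant was a bare { if (!(x)) return
retCode; } block, which does not compose with if/else.  Below is a minimal
sketch of how such a check could be kept in a do/while(0) wrapper.  All
DEMO_* names and demo_update() are hypothetical and are not part of this
series; it is a user-space illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes, for illustration only. */
enum { DEMO_SUCCESS = 0, DEMO_FAIL = 1 };

/* Caller error: hand a code back instead of silently dropping the check. */
#define DEMO_ASSERT_RET(x, ret_code) \
	do { if (!(x)) return (ret_code); } while (0)

/* Internal error: should never fire in correct code. */
#define DEMO_ASSERT(x) assert(x)

/* Hypothetical update routine that validates its arguments. */
static int demo_update(const unsigned char *msg, size_t msg_bytes)
{
	/*
	 * The do/while(0) form is what makes "MACRO; else" legal here.
	 * A bare { ... } block followed by ';' would end the if statement
	 * and leave the else dangling.
	 */
	if (msg_bytes != 0)
		DEMO_ASSERT_RET(msg != NULL, DEMO_FAIL);
	else
		return DEMO_SUCCESS;	/* nothing to hash */

	DEMO_ASSERT(msg != NULL);	/* guaranteed by the check above */
	return DEMO_SUCCESS;
}
```

With this shape, demo_update(NULL, 4) returns DEMO_FAIL rather than
dereferencing a bad pointer later on, and an empty update is still accepted.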


* [RFC PATCH 04/22] staging: crypto: skein: remove brg_*.h includes
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (2 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 03/22] staging: crypto: skein: allow building statically Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 05/22] staging: crypto: skein: remove skein_port.h Jason Cooper
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/brg_endian.h | 148 -----------------------------
 drivers/staging/skein/include/brg_types.h  | 131 -------------------------
 drivers/staging/skein/include/skein_port.h |   6 +-
 3 files changed, 2 insertions(+), 283 deletions(-)
 delete mode 100644 drivers/staging/skein/include/brg_endian.h
 delete mode 100644 drivers/staging/skein/include/brg_types.h

diff --git a/drivers/staging/skein/include/brg_endian.h b/drivers/staging/skein/include/brg_endian.h
deleted file mode 100644
index c03c7c5d1eb4..000000000000
--- a/drivers/staging/skein/include/brg_endian.h
+++ /dev/null
@@ -1,148 +0,0 @@
-/*
- ---------------------------------------------------------------------------
- Copyright (c) 2003, Dr Brian Gladman, Worcester, UK.   All rights reserved.
-
- LICENSE TERMS
-
- The free distribution and use of this software in both source and binary
- form is allowed (with or without changes) provided that:
-
-   1. distributions of this source code include the above copyright
-      notice, this list of conditions and the following disclaimer;
-
-   2. distributions in binary form include the above copyright
-      notice, this list of conditions and the following disclaimer
-      in the documentation and/or other associated materials;
-
-   3. the copyright holder's name is not used to endorse products
-      built using this software without specific written permission.
-
- ALTERNATIVELY, provided that this notice is retained in full, this product
- may be distributed under the terms of the GNU General Public License (GPL),
- in which case the provisions of the GPL apply INSTEAD OF those given above.
-
- DISCLAIMER
-
- This software is provided 'as is' with no explicit or implied warranties
- in respect of its properties, including, but not limited to, correctness
- and/or fitness for purpose.
- ---------------------------------------------------------------------------
- Issue 20/10/2006
-*/
-
-#ifndef BRG_ENDIAN_H
-#define BRG_ENDIAN_H
-
-#define IS_BIG_ENDIAN      4321 /* byte 0 is most significant (mc68k) */
-#define IS_LITTLE_ENDIAN   1234 /* byte 0 is least significant (i386) */
-
-/* Include files where endian defines and byteswap functions may reside */
-#if defined( __FreeBSD__ ) || defined( __OpenBSD__ ) || defined( __NetBSD__ )
-#  include <sys/endian.h>
-#elif defined( BSD ) && ( BSD >= 199103 ) || defined( __APPLE__ ) || \
-      defined( __CYGWIN32__ ) || defined( __DJGPP__ ) || defined( __osf__ )
-#  include <machine/endian.h>
-#elif defined( __linux__ ) || defined( __GNUC__ ) || defined( __GNU_LIBRARY__ )
-#  if !defined( __MINGW32__ ) && !defined(AVR)
-#    include <endian.h>
-#    if !defined( __BEOS__ )
-#      include <byteswap.h>
-#    endif
-#  endif
-#endif
-
-/* Now attempt to set the define for platform byte order using any  */
-/* of the four forms SYMBOL, _SYMBOL, __SYMBOL & __SYMBOL__, which  */
-/* seem to encompass most endian symbol definitions                 */
-
-#if defined( BIG_ENDIAN ) && defined( LITTLE_ENDIAN )
-#  if defined( BYTE_ORDER ) && BYTE_ORDER == BIG_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#  elif defined( BYTE_ORDER ) && BYTE_ORDER == LITTLE_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#  endif
-#elif defined( BIG_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#elif defined( LITTLE_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-
-#if defined( _BIG_ENDIAN ) && defined( _LITTLE_ENDIAN )
-#  if defined( _BYTE_ORDER ) && _BYTE_ORDER == _BIG_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#  elif defined( _BYTE_ORDER ) && _BYTE_ORDER == _LITTLE_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#  endif
-#elif defined( _BIG_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#elif defined( _LITTLE_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-
-#if defined( __BIG_ENDIAN ) && defined( __LITTLE_ENDIAN )
-#  if defined( __BYTE_ORDER ) && __BYTE_ORDER == __BIG_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#  elif defined( __BYTE_ORDER ) && __BYTE_ORDER == __LITTLE_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#  endif
-#elif defined( __BIG_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#elif defined( __LITTLE_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-
-#if defined( __BIG_ENDIAN__ ) && defined( __LITTLE_ENDIAN__ )
-#  if defined( __BYTE_ORDER__ ) && __BYTE_ORDER__ == __BIG_ENDIAN__
-#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#  elif defined( __BYTE_ORDER__ ) && __BYTE_ORDER__ == __LITTLE_ENDIAN__
-#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#  endif
-#elif defined( __BIG_ENDIAN__ )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#elif defined( __LITTLE_ENDIAN__ )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-
-/*  if the platform byte order could not be determined, then try to */
-/*  set this define using common machine defines                    */
-#if !defined(PLATFORM_BYTE_ORDER)
-
-#if   defined( __alpha__ ) || defined( __alpha ) || defined( i386 )       || \
-      defined( __i386__ )  || defined( _M_I86 )  || defined( _M_IX86 )    || \
-      defined( __OS2__ )   || defined( sun386 )  || defined( __TURBOC__ ) || \
-      defined( vax )       || defined( vms )     || defined( VMS )        || \
-      defined( __VMS )     || defined( _M_X64 )  || defined( AVR )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-
-#elif defined( AMIGA )   || defined( applec )    || defined( __AS400__ )  || \
-      defined( _CRAY )   || defined( __hppa )    || defined( __hp9000 )   || \
-      defined( ibm370 )  || defined( mc68000 )   || defined( m68k )       || \
-      defined( __MRC__ ) || defined( __MVS__ )   || defined( __MWERKS__ ) || \
-      defined( sparc )   || defined( __sparc)    || defined( SYMANTEC_C ) || \
-      defined( __VOS__ ) || defined( __TIGCC__ ) || defined( __TANDEM )   || \
-      defined( THINK_C ) || defined( __VMCMS__ )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-
-#elif 0     /* **** EDIT HERE IF NECESSARY **** */
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#elif 0     /* **** EDIT HERE IF NECESSARY **** */
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#else
-#  error Please edit lines 126 or 128 in brg_endian.h to set the platform byte order
-#endif
-#endif
-
-/* special handler for IA64, which may be either endianness (?)  */
-/* here we assume little-endian, but this may need to be changed */
-#if defined(__ia64) || defined(__ia64__) || defined(_M_IA64)
-#  define PLATFORM_MUST_ALIGN (1)
-#ifndef PLATFORM_BYTE_ORDER
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-#endif
-
-#ifndef   PLATFORM_MUST_ALIGN
-#  define PLATFORM_MUST_ALIGN (0)
-#endif
-
-#endif  /* ifndef BRG_ENDIAN_H */
diff --git a/drivers/staging/skein/include/brg_types.h b/drivers/staging/skein/include/brg_types.h
deleted file mode 100644
index 3d9fe0df5238..000000000000
--- a/drivers/staging/skein/include/brg_types.h
+++ /dev/null
@@ -1,131 +0,0 @@
-/*
- ---------------------------------------------------------------------------
- Copyright (c) 1998-2006, Brian Gladman, Worcester, UK. All rights reserved.
-
- LICENSE TERMS
-
- The free distribution and use of this software in both source and binary
- form is allowed (with or without changes) provided that:
-
-   1. distributions of this source code include the above copyright
-      notice, this list of conditions and the following disclaimer;
-
-   2. distributions in binary form include the above copyright
-      notice, this list of conditions and the following disclaimer
-      in the documentation and/or other associated materials;
-
-   3. the copyright holder's name is not used to endorse products
-      built using this software without specific written permission.
-
- ALTERNATIVELY, provided that this notice is retained in full, this product
- may be distributed under the terms of the GNU General Public License (GPL),
- in which case the provisions of the GPL apply INSTEAD OF those given above.
-
- DISCLAIMER
-
- This software is provided 'as is' with no explicit or implied warranties
- in respect of its properties, including, but not limited to, correctness
- and/or fitness for purpose.
- ---------------------------------------------------------------------------
- Issue 09/09/2006
-
- The unsigned integer types defined here are of the form uint_<nn>t where
- <nn> is the length of the type; for example, the unsigned 32-bit type is
- 'uint_32t'.  These are NOT the same as the 'C99 integer types' that are
- defined in the inttypes.h and stdint.h headers since attempts to use these
- types have shown that support for them is still highly variable.  However,
- since the latter are of the form uint<nn>_t, a regular expression search
- and replace (in VC++ search on 'uint_{:z}t' and replace with 'uint\1_t')
- can be used to convert the types used here to the C99 standard types.
-*/
-
-#ifndef BRG_TYPES_H
-#define BRG_TYPES_H
-
-#if defined(__cplusplus)
-extern "C" {
-#endif
-
-#ifndef BRG_UI8
-#  define BRG_UI8
-     typedef unsigned char uint_8t;
-#endif
-
-#ifndef BRG_UI16
-#  define BRG_UI16
-     typedef unsigned short uint_16t;
-#endif
-
-#ifndef BRG_UI32
-#  define BRG_UI32
-#    define li_32(h) 0x##h##u
-     typedef unsigned int uint_32t;
-#endif
-
-#ifndef BRG_UI64
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ull
-       typedef unsigned long long uint_64t;
-#endif
-
-#if defined( NEED_UINT_64T ) && !defined( BRG_UI64 )
-#  error Please define uint_64t as an unsigned 64 bit type in brg_types.h
-#endif
-
-#ifndef RETURN_VALUES
-#  define RETURN_VALUES
-#  if defined( DLL_EXPORT )
-#    if defined( _MSC_VER ) || defined ( __INTEL_COMPILER )
-#      define VOID_RETURN    __declspec( dllexport ) void __stdcall
-#      define INT_RETURN     __declspec( dllexport ) int  __stdcall
-#    elif defined( __GNUC__ )
-#      define VOID_RETURN    __declspec( __dllexport__ ) void
-#      define INT_RETURN     __declspec( __dllexport__ ) int
-#    else
-#      error Use of the DLL is only available on the Microsoft, Intel and GCC compilers
-#    endif
-#  elif defined( DLL_IMPORT )
-#    if defined( _MSC_VER ) || defined ( __INTEL_COMPILER )
-#      define VOID_RETURN    __declspec( dllimport ) void __stdcall
-#      define INT_RETURN     __declspec( dllimport ) int  __stdcall
-#    elif defined( __GNUC__ )
-#      define VOID_RETURN    __declspec( __dllimport__ ) void
-#      define INT_RETURN     __declspec( __dllimport__ ) int
-#    else
-#      error Use of the DLL is only available on the Microsoft, Intel and GCC compilers
-#    endif
-#  elif defined( __WATCOMC__ )
-#    define VOID_RETURN  void __cdecl
-#    define INT_RETURN   int  __cdecl
-#  else
-#    define VOID_RETURN  void
-#    define INT_RETURN   int
-#  endif
-#endif
-
-/*  These defines are used to declare buffers in a way that allows
-    faster operations on longer variables to be used.  In all these
-    defines 'size' must be a power of 2 and >= 8
-
-    dec_unit_type(size,x)       declares a variable 'x' of length 
-                                'size' bits
-
-    dec_bufr_type(size,bsize,x) declares a buffer 'x' of length 'bsize' 
-                                bytes defined as an array of variables
-                                each of 'size' bits (bsize must be a 
-                                multiple of size / 8)
-
-    ptr_cast(x,size)            casts a pointer to a pointer to a 
-                                varaiable of length 'size' bits
-*/
-
-#define ui_type(size)               uint_##size##t
-#define dec_unit_type(size,x)       typedef ui_type(size) x
-#define dec_bufr_type(size,bsize,x) typedef ui_type(size) x[bsize / (size >> 3)]
-#define ptr_cast(x,size)            ((ui_type(size)*)(x))
-
-#if defined(__cplusplus)
-}
-#endif
-
-#endif
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
index 1c68070358ce..e78c976dccc5 100644
--- a/drivers/staging/skein/include/skein_port.h
+++ b/drivers/staging/skein/include/skein_port.h
@@ -15,11 +15,9 @@
 ** 
 ********************************************************************/
 
-#include <brg_types.h>                      /* get integer type definitions */
-
 typedef unsigned int    uint_t;             /* native unsigned integer */
-typedef uint_8t         u08b_t;             /*  8-bit unsigned integer */
-typedef uint_64t        u64b_t;             /* 64-bit unsigned integer */
+typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
+typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
 
 #ifndef RotL_64
 #define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
-- 
1.9.0
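
A side note on the RotL_64() macro visible in the last hunk: it relies on N
staying within 1..63, because N == 0 would shift right by 64, which is
undefined for a 64-bit operand.  That is presumably fine for fixed, nonzero
rotation constants.  A minimal sketch of a rotate that is defined for any
count follows; rotl64() is a name chosen here for illustration.  In-kernel
code could likely use rol64() from <linux/bitops.h> instead, but that is a
suggestion and not something this patch does.

```c
#include <stdint.h>

/* Rotate x left by n bits; masking keeps both shift counts within 0..63. */
static uint64_t rotl64(uint64_t x, unsigned int n)
{
	return (x << (n & 63)) | (x >> ((64 - n) & 63));
}
```

For n == 0 both shift counts are 0 and the value comes back unchanged, so the
special case needs no branch.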


* [RFC PATCH 05/22] staging: crypto: skein: remove skein_port.h
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (3 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 04/22] staging: crypto: skein: remove brg_*.h includes Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 06/22] staging: crypto: skein: remove __cplusplus and an unneeded stddef.h Jason Cooper
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h      |  14 +++-
 drivers/staging/skein/include/skein_port.h | 108 -----------------------------
 drivers/staging/skein/skein.c              |  21 ------
 3 files changed, 13 insertions(+), 130 deletions(-)
 delete mode 100644 drivers/staging/skein/include/skein_port.h

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 315cdcd14413..211aca1b1036 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -33,7 +33,19 @@ extern "C"
 #endif
 
 #include <stddef.h>                          /* get size_t definition */
-#include <skein_port.h>               /* get platform-specific definitions */
+
+typedef unsigned int    uint_t;             /* native unsigned integer */
+typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
+typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
+
+#ifndef RotL_64
+#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
+#endif
+
+/* below two prototype assume we are handed aligned data */
+#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
+#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
+#define Skein_Swap64(w64)  (w64)
 
 enum
     {
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
deleted file mode 100644
index e78c976dccc5..000000000000
--- a/drivers/staging/skein/include/skein_port.h
+++ /dev/null
@@ -1,108 +0,0 @@
-#ifndef _SKEIN_PORT_H_
-#define _SKEIN_PORT_H_
-/*******************************************************************
-**
-** Platform-specific definitions for Skein hash function.
-**
-** Source code author: Doug Whiting, 2008.
-**
-** This algorithm and source code is released to the public domain.
-**
-** Many thanks to Brian Gladman for his portable header files.
-**
-** To port Skein to an "unsupported" platform, change the definitions
-** in this file appropriately.
-** 
-********************************************************************/
-
-typedef unsigned int    uint_t;             /* native unsigned integer */
-typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
-typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
-
-#ifndef RotL_64
-#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
-#endif
-
-/*
- * Skein is "natively" little-endian (unlike SHA-xxx), for optimal
- * performance on x86 CPUs.  The Skein code requires the following
- * definitions for dealing with endianness:
- *
- *    SKEIN_NEED_SWAP:  0 for little-endian, 1 for big-endian
- *    Skein_Put64_LSB_First
- *    Skein_Get64_LSB_First
- *    Skein_Swap64
- *
- * If SKEIN_NEED_SWAP is defined at compile time, it is used here
- * along with the portable versions of Put64/Get64/Swap64, which 
- * are slow in general.
- *
- * Otherwise, an "auto-detect" of endianness is attempted below.
- * If the default handling doesn't work well, the user may insert
- * platform-specific code instead (e.g., for big-endian CPUs).
- *
- */
-#define SKEIN_NEED_SWAP   (0)
-/* below two prototype assume we are handed aligned data */
-#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
-#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
-
-/*
- ******************************************************************
- *      Provide any definitions still needed.
- ******************************************************************
- */
-#ifndef Skein_Swap64  /* swap for big-endian, nop for little-endian */
-#if     SKEIN_NEED_SWAP
-#define Skein_Swap64(w64)                       \
-  ( (( ((u64b_t)(w64))       & 0xFF) << 56) |   \
-    (((((u64b_t)(w64)) >> 8) & 0xFF) << 48) |   \
-    (((((u64b_t)(w64)) >>16) & 0xFF) << 40) |   \
-    (((((u64b_t)(w64)) >>24) & 0xFF) << 32) |   \
-    (((((u64b_t)(w64)) >>32) & 0xFF) << 24) |   \
-    (((((u64b_t)(w64)) >>40) & 0xFF) << 16) |   \
-    (((((u64b_t)(w64)) >>48) & 0xFF) <<  8) |   \
-    (((((u64b_t)(w64)) >>56) & 0xFF)      ) )
-#else
-#define Skein_Swap64(w64)  (w64)
-#endif
-#endif  /* ifndef Skein_Swap64 */
-
-
-#ifndef Skein_Put64_LSB_First
-void    Skein_Put64_LSB_First(u08b_t *dst,const u64b_t *src,size_t bCnt)
-#ifdef  SKEIN_PORT_CODE /* instantiate the function code here? */
-    { /* this version is fully portable (big-endian or little-endian), but slow */
-    size_t n;
-
-    for (n=0;n<bCnt;n++)
-        dst[n] = (u08b_t) (src[n>>3] >> (8*(n&7)));
-    }
-#else
-    ;    /* output only the function prototype */
-#endif
-#endif   /* ifndef Skein_Put64_LSB_First */
-
-
-#ifndef Skein_Get64_LSB_First
-void    Skein_Get64_LSB_First(u64b_t *dst,const u08b_t *src,size_t wCnt)
-#ifdef  SKEIN_PORT_CODE /* instantiate the function code here? */
-    { /* this version is fully portable (big-endian or little-endian), but slow */
-    size_t n;
-
-    for (n=0;n<8*wCnt;n+=8)
-        dst[n/8] = (((u64b_t) src[n  ])      ) +
-                   (((u64b_t) src[n+1]) <<  8) +
-                   (((u64b_t) src[n+2]) << 16) +
-                   (((u64b_t) src[n+3]) << 24) +
-                   (((u64b_t) src[n+4]) << 32) +
-                   (((u64b_t) src[n+5]) << 40) +
-                   (((u64b_t) src[n+6]) << 48) +
-                   (((u64b_t) src[n+7]) << 56) ;
-    }
-#else
-    ;    /* output only the function prototype */
-#endif
-#endif   /* ifndef Skein_Get64_LSB_First */
-
-#endif   /* ifndef _SKEIN_PORT_H_ */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 3fae6fdf7c75..44468b6701ab 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -102,13 +102,6 @@ int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
         Skein_256_Update(ctx,key,keyBytes);     /* hash the key */
         Skein_256_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
         memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
-#if SKEIN_NEED_SWAP
-        {
-            uint_t i;
-            for (i=0;i<SKEIN_256_STATE_WORDS;i++)   /* convert key bytes to context words */
-                ctx->X[i] = Skein_Swap64(ctx->X[i]);
-        }
-#endif
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
@@ -297,13 +290,6 @@ int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
         Skein_512_Update(ctx,key,keyBytes);     /* hash the key */
         Skein_512_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
         memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
-#if SKEIN_NEED_SWAP
-        {
-            uint_t i;
-            for (i=0;i<SKEIN_512_STATE_WORDS;i++)   /* convert key bytes to context words */
-                ctx->X[i] = Skein_Swap64(ctx->X[i]);
-        }
-#endif
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
@@ -489,13 +475,6 @@ int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
         Skein1024_Update(ctx,key,keyBytes);     /* hash the key */
         Skein1024_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
         memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
-#if SKEIN_NEED_SWAP
-        {
-            uint_t i;
-            for (i=0;i<SKEIN1024_STATE_WORDS;i++)   /* convert key bytes to context words */
-                ctx->X[i] = Skein_Swap64(ctx->X[i]);
-        }
-#endif
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-- 
1.9.0
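
Note on the skein_port.h hunk above: the tail of the removed
Skein_Get64_LSB_First body assembles each 64-bit word from bytes, least
significant byte first, and the removed SKEIN_NEED_SWAP blocks applied
Skein_Swap64 to every state word. The userspace sketch below only
illustrates those two operations. The names load64_lsb_first and swap64 are
hypothetical and do not appear in the patch. In kernel code the usual
helpers for this are le64_to_cpu() and get_unaligned_le64(); whether later
patches in this series switch to them is not shown here.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): read one 64-bit word from a
 * byte buffer, least-significant byte first, independent of host
 * endianness.
 */
static uint64_t load64_lsb_first(const uint8_t *src)
{
	uint64_t w = 0;
	size_t i;

	for (i = 0; i < 8; i++)
		w |= (uint64_t)src[i] << (8 * i);
	return w;
}

/*
 * Hypothetical helper: plain 64-bit byte reversal, which is what a swap
 * macro of this kind is assumed to do on hosts that need swapping.
 */
static uint64_t swap64(uint64_t w)
{
	/* swap adjacent bytes, then 16-bit pairs, then 32-bit halves */
	w = ((w & 0x00FF00FF00FF00FFULL) << 8) |
	    ((w >> 8) & 0x00FF00FF00FF00FFULL);
	w = ((w & 0x0000FFFF0000FFFFULL) << 16) |
	    ((w >> 16) & 0x0000FFFF0000FFFFULL);
	return (w << 32) | (w >> 32);
}
```

Loading byte by byte gives the same value on little- and big-endian hosts,
so a loader written this way needs no separate swap pass afterwards.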


* [RFC PATCH 06/22] staging: crypto: skein: remove __cplusplus and an unneeded stddef.h
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (4 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 05/22] staging: crypto: skein: remove skein_port.h Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 07/22] staging: crypto: skein: remove unneeded typedefs Jason Cooper
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h        | 11 -----------
 drivers/staging/skein/include/skeinApi.h     |  9 ---------
 drivers/staging/skein/include/threefishApi.h |  9 ---------
 3 files changed, 29 deletions(-)
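
Note: the guards removed below only take effect when a header is pulled
into a C++ translation unit, and the kernel is built as C, so dropping them
should not change what the compiler sees. Likewise, kernel code gets size_t
from <linux/types.h>, so <stddef.h> is not needed. The sketch below is a
userspace illustration of the guard pattern; demo_hash_len and
compiled_as_cplusplus are hypothetical names, not part of the patch.

```c
#include <stddef.h>	/* userspace stand-in for the size_t definition */

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical prototype, for illustration only. */
int demo_hash_len(size_t hash_bit_len);

#ifdef __cplusplus
}
#endif

/* Number of bytes needed to hold hash_bit_len bits, rounded up. */
int demo_hash_len(size_t hash_bit_len)
{
	return (int)((hash_bit_len + 7) / 8);
}

/* Returns 1 when built by a C++ compiler, 0 when built as C. */
int compiled_as_cplusplus(void)
{
#ifdef __cplusplus
	return 1;
#else
	return 0;
#endif
}
```

Built as C, both #ifdef __cplusplus blocks are skipped and only the plain
prototype remains.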

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 211aca1b1036..b1e55b08d150 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -27,13 +27,6 @@
 **                                1: return SKEIN_FAIL to flag errors
 **
 ***************************************************************************/
-#ifdef __cplusplus
-extern "C"
-{
-#endif
-
-#include <stddef.h>                          /* get size_t definition */
-
 typedef unsigned int    uint_t;             /* native unsigned integer */
 typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
 typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
@@ -322,8 +315,4 @@ enum
 #define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS    ) + 5) % 10) + 5))
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif  /* ifndef _SKEIN_H_ */
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 734d27b79f01..f55c67e81f2b 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -81,11 +81,6 @@ OTHER DEALINGS IN THE SOFTWARE.
 #include <linux/types.h>
 #include <skein.h>
 
-#ifdef __cplusplus
-extern "C"
-{
-#endif
-
     /**
      * Which Skein size to use
      */
@@ -229,10 +224,6 @@ extern "C"
      */
     int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash);
 
-#ifdef __cplusplus
-}
-#endif
-
 /**
  * @}
  */
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index dae270cf71d3..aaecfe822142 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -33,11 +33,6 @@
 
 #define KeyScheduleConst 0x1BD11BDAA9FC1A22L
 
-#ifdef __cplusplus
-extern "C"
-{
-#endif
-
     /**
      * Which Threefish size to use
      */
@@ -157,10 +152,6 @@ extern "C"
     void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
     void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
     void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-#ifdef __cplusplus
-}
-#endif
-
 /**
  * @}
  */
-- 
1.9.0


* [RFC PATCH 07/22] staging: crypto: skein: remove unneeded typedefs
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (5 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 06/22] staging: crypto: skein: remove __cplusplus and an unneeded stddef.h Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 08/22] staging: crypto: skein: remove all typedef {struct,enum} Jason Cooper
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h        | 73 ++++++++++-----------
 drivers/staging/skein/include/skeinApi.h     |  4 +-
 drivers/staging/skein/include/skein_iv.h     | 26 ++++----
 drivers/staging/skein/include/threefishApi.h |  6 +-
 drivers/staging/skein/skein.c                | 96 ++++++++++++++--------------
 drivers/staging/skein/skeinApi.c             | 24 +++----
 drivers/staging/skein/skeinBlockNo3F.c       | 30 ++++-----
 drivers/staging/skein/skein_block.c          | 54 ++++++++--------
 drivers/staging/skein/threefishApi.c         |  8 +--
 9 files changed, 159 insertions(+), 162 deletions(-)
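
Note: this patch swaps the local u08b_t/u64b_t typedefs for the kernel's u8
and u64. The removed typedefs were aliases for uint8_t and uint64_t, so the
widths are unchanged and the macros touched below should evaluate to the
same values as before. The userspace sketch below checks that on two of the
macros from skein.h. The u8/u64 typedefs are stand-ins for <linux/types.h>,
SKEIN_CFG_TREE_LEAF_SIZE_POS is assumed to be 0 (its definition is not
visible in these hunks), and ks_parity/tree_info are hypothetical wrappers.

```c
#include <stdint.h>

/* Userspace stand-ins; in-tree code gets these from <linux/types.h>. */
typedef uint8_t  u8;
typedef uint64_t u64;

/* Copied from the skein.h hunks in this patch. */
#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64) (hi32)) << 32))
#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)

#define SKEIN_CFG_TREE_LEAF_SIZE_POS  ( 0)	/* assumed, not shown in the hunk */
#define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)

#define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
    ( (((u64)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
      (((u64)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
      (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )

/* Hypothetical wrapper: the key schedule parity constant as a u64. */
static u64 ks_parity(void)
{
	return SKEIN_KS_PARITY;
}

/* Hypothetical wrapper: pack the three tree parameters into one word. */
static u64 tree_info(u8 leaf, u8 node, u8 max_lvl)
{
	return SKEIN_CFG_TREE_INFO(leaf, node, max_lvl);
}
```

The parity value matches KeyScheduleConst (0x1BD11BDAA9FC1A22) in
threefishApi.h, and tree_info(0, 0, 0) is 0, the sequential setting named
SKEIN_CFG_TREE_INFO_SEQUENTIAL in skein.h.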

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index b1e55b08d150..12c5c8d612b0 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -27,9 +27,6 @@
 **                                1: return SKEIN_FAIL to flag errors
 **
 ***************************************************************************/
-typedef unsigned int    uint_t;             /* native unsigned integer */
-typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
-typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
 
 #ifndef RotL_64
 #define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
@@ -70,28 +67,28 @@ typedef struct
     {
     size_t  hashBitLen;                      /* size of hash result, in bits */
     size_t  bCnt;                            /* current byte count in buffer b[] */
-    u64b_t  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+    u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
     } Skein_Ctxt_Hdr_t;
 
 typedef struct                               /*  256-bit Skein hash context structure */
     {
     Skein_Ctxt_Hdr_t h;                      /* common header context variables */
-    u64b_t  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
-    u08b_t  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
     } Skein_256_Ctxt_t;
 
 typedef struct                               /*  512-bit Skein hash context structure */
     {
     Skein_Ctxt_Hdr_t h;                      /* common header context variables */
-    u64b_t  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
-    u08b_t  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
     } Skein_512_Ctxt_t;
 
 typedef struct                               /* 1024-bit Skein hash context structure */
     {
     Skein_Ctxt_Hdr_t h;                      /* common header context variables */
-    u64b_t  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
-    u08b_t  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
     } Skein1024_Ctxt_t;
 
 /*   Skein APIs for (incremental) "straight hashing" */
@@ -99,13 +96,13 @@ int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
 int  Skein_512_Init  (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
 int  Skein1024_Init  (Skein1024_Ctxt_t *ctx, size_t hashBitLen);
 
-int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
-int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
-int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
+int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
 
-int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u8 * hashVal);
+int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u8 * hashVal);
 
 /*
 **   Skein APIs for "extended" initialization: MAC keys, tree hashing.
@@ -121,26 +118,26 @@ int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 **              to precompute the MAC IV, then a copy of the context saved and
 **              reused for each new MAC computation.
 **/
-int  Skein_256_InitExt(Skein_256_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
-int  Skein_512_InitExt(Skein_512_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
-int  Skein1024_InitExt(Skein1024_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
+int  Skein_256_InitExt(Skein_256_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_512_InitExt(Skein_512_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein1024_InitExt(Skein1024_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
 
 /*
 **   Skein APIs for MAC and tree hash:
 **      Final_Pad:  pad, do final block, but no OUTPUT type
 **      Output:     do just the output stage
 */
-int  Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 * hashVal);
+int  Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 * hashVal);
 
 #ifndef SKEIN_TREE_HASH
 #define SKEIN_TREE_HASH (1)
 #endif
 #if  SKEIN_TREE_HASH
-int  Skein_256_Output   (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein_512_Output   (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_256_Output   (Skein_256_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_512_Output   (Skein_512_Ctxt_t *ctx, u8 * hashVal);
+int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u8 * hashVal);
 #endif
 
 /*****************************************************************
@@ -161,13 +158,13 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
                                 
 /* tweak word T[1]: flag bit definition(s) */
-#define SKEIN_T1_FLAG_FIRST     (((u64b_t)  1 ) << SKEIN_T1_POS_FIRST)
-#define SKEIN_T1_FLAG_FINAL     (((u64b_t)  1 ) << SKEIN_T1_POS_FINAL)
-#define SKEIN_T1_FLAG_BIT_PAD   (((u64b_t)  1 ) << SKEIN_T1_POS_BIT_PAD)
+#define SKEIN_T1_FLAG_FIRST     (((u64)  1 ) << SKEIN_T1_POS_FIRST)
+#define SKEIN_T1_FLAG_FINAL     (((u64)  1 ) << SKEIN_T1_POS_FINAL)
+#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1 ) << SKEIN_T1_POS_BIT_PAD)
                                 
 /* tweak word T[1]: tree level bit field mask */
-#define SKEIN_T1_TREE_LVL_MASK  (((u64b_t)0x7F) << SKEIN_T1_POS_TREE_LVL)
-#define SKEIN_T1_TREE_LEVEL(n)  (((u64b_t) (n)) << SKEIN_T1_POS_TREE_LVL)
+#define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
+#define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
 
 /* tweak word T[1]: block type field */
 #define SKEIN_BLK_TYPE_KEY      ( 0)                    /* key, for MAC and KDF */
@@ -180,7 +177,7 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
 #define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
 
-#define SKEIN_T1_BLK_TYPE(T)   (((u64b_t) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
+#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
 #define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
 #define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
 #define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
@@ -200,7 +197,7 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
 #endif
 
-#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64b_t) (hi32)) << 32))
+#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64) (hi32)) << 32))
 #define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION,SKEIN_ID_STRING_LE)
 #define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)
 
@@ -211,14 +208,14 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
 #define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
 
-#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
-#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
-#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
+#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
+#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
+#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
 
 #define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
-    ( (((u64b_t)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
-      (((u64b_t)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
-      (((u64b_t)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )
+    ( (((u64)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+      (((u64)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+      (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )
 
 #define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0,0,0) /* use as treeInfo in InitExt() call for sequential processing */
 
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index f55c67e81f2b..fb4a7c8e7f7a 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -99,8 +99,8 @@ OTHER DEALINGS IN THE SOFTWARE.
      * structures as well.
      */
     typedef struct SkeinCtx {
-        u64b_t skeinSize;
-        u64b_t  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
+        u64 skeinSize;
+        u64  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
         union {
             Skein_Ctxt_Hdr_t h;
             Skein_256_Ctxt_t s256;
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index 555ea619500b..94ac2f7cde76 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -20,7 +20,7 @@
 #define MK_64 SKEIN_MK_64
 
 /* blkSize =  256 bits. hashSize =  128 bits */
-const u64b_t SKEIN_256_IV_128[] =
+const u64 SKEIN_256_IV_128[] =
     {
     MK_64(0xE1111906,0x964D7260),
     MK_64(0x883DAAA7,0x7C8D811C),
@@ -29,7 +29,7 @@ const u64b_t SKEIN_256_IV_128[] =
     };
 
 /* blkSize =  256 bits. hashSize =  160 bits */
-const u64b_t SKEIN_256_IV_160[] =
+const u64 SKEIN_256_IV_160[] =
     {
     MK_64(0x14202314,0x72825E98),
     MK_64(0x2AC4E9A2,0x5A77E590),
@@ -38,7 +38,7 @@ const u64b_t SKEIN_256_IV_160[] =
     };
 
 /* blkSize =  256 bits. hashSize =  224 bits */
-const u64b_t SKEIN_256_IV_224[] =
+const u64 SKEIN_256_IV_224[] =
     {
     MK_64(0xC6098A8C,0x9AE5EA0B),
     MK_64(0x876D5686,0x08C5191C),
@@ -47,7 +47,7 @@ const u64b_t SKEIN_256_IV_224[] =
     };
 
 /* blkSize =  256 bits. hashSize =  256 bits */
-const u64b_t SKEIN_256_IV_256[] =
+const u64 SKEIN_256_IV_256[] =
     {
     MK_64(0xFC9DA860,0xD048B449),
     MK_64(0x2FCA6647,0x9FA7D833),
@@ -56,7 +56,7 @@ const u64b_t SKEIN_256_IV_256[] =
     };
 
 /* blkSize =  512 bits. hashSize =  128 bits */
-const u64b_t SKEIN_512_IV_128[] =
+const u64 SKEIN_512_IV_128[] =
     {
     MK_64(0xA8BC7BF3,0x6FBF9F52),
     MK_64(0x1E9872CE,0xBD1AF0AA),
@@ -69,7 +69,7 @@ const u64b_t SKEIN_512_IV_128[] =
     };
 
 /* blkSize =  512 bits. hashSize =  160 bits */
-const u64b_t SKEIN_512_IV_160[] =
+const u64 SKEIN_512_IV_160[] =
     {
     MK_64(0x28B81A2A,0xE013BD91),
     MK_64(0xC2F11668,0xB5BDF78F),
@@ -82,7 +82,7 @@ const u64b_t SKEIN_512_IV_160[] =
     };
 
 /* blkSize =  512 bits. hashSize =  224 bits */
-const u64b_t SKEIN_512_IV_224[] =
+const u64 SKEIN_512_IV_224[] =
     {
     MK_64(0xCCD06162,0x48677224),
     MK_64(0xCBA65CF3,0xA92339EF),
@@ -95,7 +95,7 @@ const u64b_t SKEIN_512_IV_224[] =
     };
 
 /* blkSize =  512 bits. hashSize =  256 bits */
-const u64b_t SKEIN_512_IV_256[] =
+const u64 SKEIN_512_IV_256[] =
     {
     MK_64(0xCCD044A1,0x2FDB3E13),
     MK_64(0xE8359030,0x1A79A9EB),
@@ -108,7 +108,7 @@ const u64b_t SKEIN_512_IV_256[] =
     };
 
 /* blkSize =  512 bits. hashSize =  384 bits */
-const u64b_t SKEIN_512_IV_384[] =
+const u64 SKEIN_512_IV_384[] =
     {
     MK_64(0xA3F6C6BF,0x3A75EF5F),
     MK_64(0xB0FEF9CC,0xFD84FAA4),
@@ -121,7 +121,7 @@ const u64b_t SKEIN_512_IV_384[] =
     };
 
 /* blkSize =  512 bits. hashSize =  512 bits */
-const u64b_t SKEIN_512_IV_512[] =
+const u64 SKEIN_512_IV_512[] =
     {
     MK_64(0x4903ADFF,0x749C51CE),
     MK_64(0x0D95DE39,0x9746DF03),
@@ -134,7 +134,7 @@ const u64b_t SKEIN_512_IV_512[] =
     };
 
 /* blkSize = 1024 bits. hashSize =  384 bits */
-const u64b_t SKEIN1024_IV_384[] =
+const u64 SKEIN1024_IV_384[] =
     {
     MK_64(0x5102B6B8,0xC1894A35),
     MK_64(0xFEEBC9E3,0xFE8AF11A),
@@ -155,7 +155,7 @@ const u64b_t SKEIN1024_IV_384[] =
     };
 
 /* blkSize = 1024 bits. hashSize =  512 bits */
-const u64b_t SKEIN1024_IV_512[] =
+const u64 SKEIN1024_IV_512[] =
     {
     MK_64(0xCAEC0E5D,0x7C1B1B18),
     MK_64(0xA01B0E04,0x5F03E802),
@@ -176,7 +176,7 @@ const u64b_t SKEIN1024_IV_512[] =
     };
 
 /* blkSize = 1024 bits. hashSize = 1024 bits */
-const u64b_t SKEIN1024_IV_1024[] =
+const u64 SKEIN1024_IV_1024[] =
     {
     MK_64(0xD593DA07,0x41E72355),
     MK_64(0x15B5E511,0xAC73E00C),
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index aaecfe822142..0123a575b606 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -51,9 +51,9 @@
      * structures as well.
      */
     typedef struct ThreefishKey {
-        u64b_t stateSize;
-        u64b_t key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
-        u64b_t tweak[3];
+        u64 stateSize;
+        u64 key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
+        u64 tweak[3];
     } ThreefishKey_t;
 
     /**
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 44468b6701ab..b225642efa4a 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -16,9 +16,9 @@
 
 /*****************************************************************/
 /* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
@@ -30,8 +30,8 @@ int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
 {
     union
     {
-        u08b_t  b[SKEIN_256_STATE_BYTES];
-        u64b_t  w[SKEIN_256_STATE_WORDS];
+        u8  b[SKEIN_256_STATE_BYTES];
+        u64  w[SKEIN_256_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -76,12 +76,12 @@ int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
-        u08b_t  b[SKEIN_256_STATE_BYTES];
-        u64b_t  w[SKEIN_256_STATE_WORDS];
+        u8  b[SKEIN_256_STATE_BYTES];
+        u64  w[SKEIN_256_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -126,7 +126,7 @@ int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -174,10 +174,10 @@ int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein_256_Final(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_256_Final(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN_256_STATE_WORDS];
+    u64 X[SKEIN_256_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
@@ -194,9 +194,9 @@ int Skein_256_Final(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_256_BLOCK_BYTES)
             n  = SKEIN_256_BLOCK_BYTES;
@@ -217,8 +217,8 @@ int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
 {
     union
     {
-        u08b_t  b[SKEIN_512_STATE_BYTES];
-        u64b_t  w[SKEIN_512_STATE_WORDS];
+        u8  b[SKEIN_512_STATE_BYTES];
+        u64  w[SKEIN_512_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -264,12 +264,12 @@ int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
-        u08b_t  b[SKEIN_512_STATE_BYTES];
-        u64b_t  w[SKEIN_512_STATE_WORDS];
+        u8  b[SKEIN_512_STATE_BYTES];
+        u64  w[SKEIN_512_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -314,7 +314,7 @@ int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -362,10 +362,10 @@ int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein_512_Final(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_512_Final(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN_512_STATE_WORDS];
+    u64 X[SKEIN_512_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
@@ -382,9 +382,9 @@ int Skein_512_Final(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_512_BLOCK_BYTES)
             n  = SKEIN_512_BLOCK_BYTES;
@@ -405,8 +405,8 @@ int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
 {
     union
     {
-        u08b_t  b[SKEIN1024_STATE_BYTES];
-        u64b_t  w[SKEIN1024_STATE_WORDS];
+        u8  b[SKEIN1024_STATE_BYTES];
+        u64  w[SKEIN1024_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -449,12 +449,12 @@ int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
-        u08b_t  b[SKEIN1024_STATE_BYTES];
-        u64b_t  w[SKEIN1024_STATE_WORDS];
+        u8  b[SKEIN1024_STATE_BYTES];
+        u64  w[SKEIN1024_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -499,7 +499,7 @@ int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -547,10 +547,10 @@ int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein1024_Final(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN1024_STATE_WORDS];
+    u64 X[SKEIN1024_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
@@ -567,9 +567,9 @@ int Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN1024_BLOCK_BYTES)
             n  = SKEIN1024_BLOCK_BYTES;
@@ -585,7 +585,7 @@ int Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -601,7 +601,7 @@ int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -617,7 +617,7 @@ int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -634,10 +634,10 @@ int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
 #if SKEIN_TREE_HASH
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein_256_Output(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_256_Output(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN_256_STATE_WORDS];
+    u64 X[SKEIN_256_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
@@ -648,9 +648,9 @@ int Skein_256_Output(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_256_BLOCK_BYTES)
             n  = SKEIN_256_BLOCK_BYTES;
@@ -663,10 +663,10 @@ int Skein_256_Output(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein_512_Output(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_512_Output(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN_512_STATE_WORDS];
+    u64 X[SKEIN_512_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
@@ -677,9 +677,9 @@ int Skein_512_Output(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_512_BLOCK_BYTES)
             n  = SKEIN_512_BLOCK_BYTES;
@@ -692,10 +692,10 @@ int Skein_512_Output(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein1024_Output(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein1024_Output(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN1024_STATE_WORDS];
+    u64 X[SKEIN1024_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
@@ -706,9 +706,9 @@ int Skein1024_Output(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN1024_BLOCK_BYTES)
             n  = SKEIN1024_BLOCK_BYTES;
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 579b92efbf65..ef021086bc61 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -41,7 +41,7 @@ int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
     size_t Xlen = 0;
-    u64b_t*  X = NULL;
+    u64*  X = NULL;
     uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
     Skein_Assert(ctx, SKEIN_FAIL);
@@ -82,7 +82,7 @@ int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
                  size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
-    u64b_t*  X = NULL;
+    u64*  X = NULL;
     size_t Xlen = 0;
     uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
@@ -97,18 +97,18 @@ int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
     case Skein256:
         ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
                                 treeInfo,
-                                (const u08b_t*)key, keyLen);
+                                (const u8*)key, keyLen);
 
         break;
     case Skein512:
         ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
                                 treeInfo,
-                                (const u08b_t*)key, keyLen);
+                                (const u8*)key, keyLen);
         break;
     case Skein1024:
         ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
                                 treeInfo,
-                                (const u08b_t*)key, keyLen);
+                                (const u8*)key, keyLen);
 
         break;
     }
@@ -122,7 +122,7 @@ int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
 void skeinReset(SkeinCtx_t* ctx)
 {
     size_t Xlen = 0;
-    u64b_t*  X = NULL;
+    u64*  X = NULL;
 
     /*
      * The following two lines rely of the fact that the real Skein contexts are
@@ -146,13 +146,13 @@ int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
 
     switch (ctx->skeinSize) {
     case Skein256:
-        ret = Skein_256_Update(&ctx->m.s256, (const u08b_t*)msg, msgByteCnt);
+        ret = Skein_256_Update(&ctx->m.s256, (const u8*)msg, msgByteCnt);
         break;
     case Skein512:
-        ret = Skein_512_Update(&ctx->m.s512, (const u08b_t*)msg, msgByteCnt);
+        ret = Skein_512_Update(&ctx->m.s512, (const u8*)msg, msgByteCnt);
         break;
     case Skein1024:
-        ret = Skein1024_Update(&ctx->m.s1024, (const u08b_t*)msg, msgByteCnt);
+        ret = Skein1024_Update(&ctx->m.s1024, (const u8*)msg, msgByteCnt);
         break;
     }
     return ret;
@@ -206,13 +206,13 @@ int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash)
 
     switch (ctx->skeinSize) {
     case Skein256:
-        ret = Skein_256_Final(&ctx->m.s256, (u08b_t*)hash);
+        ret = Skein_256_Final(&ctx->m.s256, (u8*)hash);
         break;
     case Skein512:
-        ret = Skein_512_Final(&ctx->m.s512, (u08b_t*)hash);
+        ret = Skein_512_Final(&ctx->m.s512, (u8*)hash);
         break;
     case Skein1024:
-        ret = Skein1024_Final(&ctx->m.s1024, (u08b_t*)hash);
+        ret = Skein1024_Final(&ctx->m.s1024, (u8*)hash);
         break;
     }
     return ret;
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 6a19ceb17d0f..56c56b8ebd7e 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -5,21 +5,21 @@
 
 
 /*****************************  Skein_256 ******************************/
-void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u08b_t *blkPtr,
+void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u8 *blkPtr,
                              size_t blkCnt, size_t byteCntAdd)
 {
     ThreefishKey_t key;
-    u64b_t tweak[2];
+    u64 tweak[2];
     int i;
-    u64b_t  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
-    u64b_t words[3];
+    u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+    u64 words[3];
 
     Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
     tweak[0] = ctx->h.T[0];
     tweak[1] = ctx->h.T[1];
 
     do  {
-        u64b_t carry = byteCntAdd;
+        u64 carry = byteCntAdd;
 
         words[0] = tweak[0] & 0xffffffffL;
         words[1] = ((tweak[0] >> 32) & 0xffffffffL);
@@ -55,21 +55,21 @@ void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u08b_t *blkPtr,
     ctx->h.T[1] = tweak[1];
 }
 
-void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u08b_t *blkPtr,
+void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u8 *blkPtr,
                              size_t blkCnt, size_t byteCntAdd)
 {
     ThreefishKey_t key;
-    u64b_t tweak[2];
+    u64 tweak[2];
     int i;
-    u64b_t words[3];
-    u64b_t  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+    u64 words[3];
+    u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
 
     Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
     tweak[0] = ctx->h.T[0];
     tweak[1] = ctx->h.T[1];
 
     do  {
-        u64b_t carry = byteCntAdd;
+        u64 carry = byteCntAdd;
 
         words[0] = tweak[0] & 0xffffffffL;
         words[1] = ((tweak[0] >> 32) & 0xffffffffL);
@@ -109,21 +109,21 @@ void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u08b_t *blkPtr,
     ctx->h.T[1] = tweak[1];
 }
 
-void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx, const u08b_t *blkPtr,
+void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx, const u8 *blkPtr,
                               size_t blkCnt, size_t byteCntAdd)
 {
     ThreefishKey_t key;
-    u64b_t tweak[2];
+    u64 tweak[2];
     int i;
-    u64b_t words[3];
-    u64b_t  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+    u64 words[3];
+    u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
 
     Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
     tweak[0] = ctx->h.T[0];
     tweak[1] = ctx->h.T[1];
 
     do  {
-        u64b_t carry = byteCntAdd;
+        u64 carry = byteCntAdd;
 
         words[0] = tweak[0] & 0xffffffffL;
         words[1] = ((tweak[0] >> 32) & 0xffffffffL);
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index b5be41af6d17..98e884292044 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -39,7 +39,7 @@
 
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C */
     enum
         {
@@ -59,14 +59,14 @@ void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
 #endif
     size_t  r;
-    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64b_t  X0,X1,X2,X3;                        /* local copy of context vars, for speed */
-    u64b_t  w [WCNT];                           /* local copy of input block */
+    u64  X0,X1,X2,X3;                        /* local copy of context vars, for speed */
+    u64  w [WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64b_t *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+    const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
     Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
 #endif
     Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
@@ -212,10 +212,10 @@ void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein_256_Process_Block_CodeSize(void)
     {
-    return ((u08b_t *) Skein_256_Process_Block_CodeSize) -
-           ((u08b_t *) Skein_256_Process_Block);
+    return ((u8 *) Skein_256_Process_Block_CodeSize) -
+           ((u8 *) Skein_256_Process_Block);
     }
-uint_t Skein_256_Unroll_Cnt(void)
+unsigned int Skein_256_Unroll_Cnt(void)
     {
     return SKEIN_UNROLL_256;
     }
@@ -224,7 +224,7 @@ uint_t Skein_256_Unroll_Cnt(void)
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C */
     enum
         {
@@ -244,14 +244,14 @@ void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
 #endif
     size_t  r;
-    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64b_t  X0,X1,X2,X3,X4,X5,X6,X7;            /* local copy of vars, for speed */
-    u64b_t  w [WCNT];                           /* local copy of input block */
+    u64  X0,X1,X2,X3,X4,X5,X6,X7;            /* local copy of vars, for speed */
+    u64  w [WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64b_t *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+    const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
     Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
     Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
 #endif
@@ -420,10 +420,10 @@ void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein_512_Process_Block_CodeSize(void)
     {
-    return ((u08b_t *) Skein_512_Process_Block_CodeSize) -
-           ((u08b_t *) Skein_512_Process_Block);
+    return ((u8 *) Skein_512_Process_Block_CodeSize) -
+           ((u8 *) Skein_512_Process_Block);
     }
-uint_t Skein_512_Unroll_Cnt(void)
+unsigned int Skein_512_Unroll_Cnt(void)
     {
     return SKEIN_UNROLL_512;
     }
@@ -432,7 +432,7 @@ uint_t Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C, always looping (unrolled is bigger AND slower!) */
     enum
         {
@@ -452,16 +452,16 @@ void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
 #endif
     size_t  r;
-    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
 
-    u64b_t  X00,X01,X02,X03,X04,X05,X06,X07,    /* local copy of vars, for speed */
+    u64  X00,X01,X02,X03,X04,X05,X06,X07,    /* local copy of vars, for speed */
             X08,X09,X10,X11,X12,X13,X14,X15;
-    u64b_t  w [WCNT];                           /* local copy of input block */
+    u64  w [WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64b_t *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+    const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
     Xptr[ 0] = &X00;  Xptr[ 1] = &X01;  Xptr[ 2] = &X02;  Xptr[ 3] = &X03;
     Xptr[ 4] = &X04;  Xptr[ 5] = &X05;  Xptr[ 6] = &X06;  Xptr[ 7] = &X07;
     Xptr[ 8] = &X08;  Xptr[ 9] = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
@@ -678,10 +678,10 @@ void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein1024_Process_Block_CodeSize(void)
     {
-    return ((u08b_t *) Skein1024_Process_Block_CodeSize) -
-           ((u08b_t *) Skein1024_Process_Block);
+    return ((u8 *) Skein1024_Process_Block_CodeSize) -
+           ((u8 *) Skein1024_Process_Block);
     }
-uint_t Skein1024_Unroll_Cnt(void)
+unsigned int Skein1024_Unroll_Cnt(void)
     {
     return SKEIN_UNROLL_1024;
     }
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 968d3d21fe61..ed19ee9e3425 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -25,8 +25,8 @@ void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
 void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
                                 uint8_t* out)
 {
-    u64b_t plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
-    u64b_t cipher[SKEIN_MAX_STATE_WORDS];
+    u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+    u64 cipher[SKEIN_MAX_STATE_WORDS];
     
     Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
     threefishEncryptBlockWords(keyCtx, plain, cipher);
@@ -52,8 +52,8 @@ void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
 void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
                                 uint8_t* out)
 {
-    u64b_t plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
-    u64b_t cipher[SKEIN_MAX_STATE_WORDS];
+    u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+    u64 cipher[SKEIN_MAX_STATE_WORDS];
     
     Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
     threefishDecryptBlockWords(keyCtx, cipher, plain);
-- 
1.9.0
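
For anyone reading the threefishApi.c hunks above outside the tree: the "bytes to words" step converts a byte buffer into 64-bit words, least significant byte first. Below is a minimal userspace sketch of that conversion using the u8/u64 spellings this patch switches to. The demo_* names and the stdint-based typedefs are assumptions for illustration only; the in-tree Skein_Get64_LSB_First helper may well be implemented differently (for example as a plain memcpy on little-endian machines).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's u8/u64 (assumption for this sketch). */
typedef uint8_t  u8;
typedef uint64_t u64;

/* Hypothetical helper: load word_cnt 64-bit words from src, LSB first. */
void demo_get64_lsb_first(u64 *dst, const u8 *src, size_t word_cnt)
{
	size_t w, b;

	for (w = 0; w < word_cnt; w++) {
		u64 v = 0;

		for (b = 0; b < 8; b++)
			v |= (u64)src[8 * w + b] << (8 * b);
		dst[w] = v;
	}
}

/* Hypothetical helper: store byte_cnt bytes from the words in src, LSB first. */
void demo_put64_lsb_first(u8 *dst, const u64 *src, size_t byte_cnt)
{
	size_t n;

	for (n = 0; n < byte_cnt; n++)
		dst[n] = (u8)(src[n / 8] >> (8 * (n % 8)));
}
```

Written this way the result does not depend on host endianness, which is the property the counter block and the block-bytes entry points rely on.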


* [RFC PATCH 08/22] staging: crypto: skein: remove all typedef {struct,enum}
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (6 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 07/22] staging: crypto: skein: remove unneeded typedefs Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 09/22] staging: crypto: skein: use u8, u64 vice uint*_t Jason Cooper
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
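Since this patch carries no description, here is a rough userspace sketch of the pattern it applies: an anonymous struct hidden behind a typedef becomes a named struct tag that is spelled out at each use. Everything prefixed demo_/Demo_/DEMO_ is hypothetical and only mirrors the shape of the skein context header; the typedefs for u8/u64 stand in for the kernel types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins for the kernel's u8/u64 (assumption for this sketch). */
typedef uint8_t  u8;
typedef uint64_t u64;

#define DEMO_MODIFIER_WORDS 2	/* hypothetical: tweak words T[0], T[1] */
#define DEMO_STATE_WORDS    4	/* hypothetical: 256-bit state */
#define DEMO_BLOCK_BYTES    (8 * DEMO_STATE_WORDS)

/* Before: anonymous struct, reachable only through its typedef name. */
typedef struct {
	size_t hashBitLen;
	size_t bCnt;
	u64 T[DEMO_MODIFIER_WORDS];
} Demo_Ctxt_Hdr_t;

/* After: named struct tag, no typedef. */
struct demo_ctx_hdr {
	size_t hashBitLen;
	size_t bCnt;
	u64 T[DEMO_MODIFIER_WORDS];
};

struct demo_256_ctx {
	struct demo_ctx_hdr h;		/* common header */
	u64 X[DEMO_STATE_WORDS];	/* chaining variables */
	u8 b[DEMO_BLOCK_BYTES];		/* partial block buffer */
};

/* Returns 1 when both spellings have the same size and member offsets. */
int demo_layouts_match(void)
{
	return sizeof(Demo_Ctxt_Hdr_t) == sizeof(struct demo_ctx_hdr) &&
	       offsetof(Demo_Ctxt_Hdr_t, bCnt) ==
	       offsetof(struct demo_ctx_hdr, bCnt) &&
	       offsetof(Demo_Ctxt_Hdr_t, T) == offsetof(struct demo_ctx_hdr, T);
}

/* Callers now name the struct explicitly in prototypes. */
void demo_ctx_reset(struct demo_256_ctx *ctx, size_t hash_bit_len)
{
	memset(ctx, 0, sizeof(*ctx));
	ctx->h.hashBitLen = hash_bit_len;
}
```

The rename is meant to be purely a spelling change: member order and types stay the same, so the generated object code should not differ.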
 drivers/staging/skein/include/skein.h        | 58 ++++++++++++++--------------
 drivers/staging/skein/include/skeinApi.h     | 32 +++++++--------
 drivers/staging/skein/include/threefishApi.h | 32 +++++++--------
 drivers/staging/skein/skein.c                | 42 ++++++++++----------
 drivers/staging/skein/skeinApi.c             | 16 ++++----
 drivers/staging/skein/skeinBlockNo3F.c       | 12 +++---
 drivers/staging/skein/skein_block.c          |  6 +--
 drivers/staging/skein/threefish1024Block.c   |  4 +-
 drivers/staging/skein/threefish256Block.c    |  4 +-
 drivers/staging/skein/threefish512Block.c    |  4 +-
 drivers/staging/skein/threefishApi.c         | 10 ++---
 11 files changed, 110 insertions(+), 110 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 12c5c8d612b0..77b712e73253 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -63,46 +63,46 @@ enum
 #define  SKEIN_512_BLOCK_BYTES ( 8*SKEIN_512_STATE_WORDS)
 #define  SKEIN1024_BLOCK_BYTES ( 8*SKEIN1024_STATE_WORDS)
 
-typedef struct
+struct skein_ctx_hdr
     {
     size_t  hashBitLen;                      /* size of hash result, in bits */
     size_t  bCnt;                            /* current byte count in buffer b[] */
     u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
-    } Skein_Ctxt_Hdr_t;
+    };
 
-typedef struct                               /*  256-bit Skein hash context structure */
+struct skein_256_ctx                               /*  256-bit Skein hash context structure */
     {
-    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    struct skein_ctx_hdr h;                      /* common header context variables */
     u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
     u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    } Skein_256_Ctxt_t;
+    };
 
-typedef struct                               /*  512-bit Skein hash context structure */
+struct skein_512_ctx                             /*  512-bit Skein hash context structure */
     {
-    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    struct skein_ctx_hdr h;                      /* common header context variables */
     u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
     u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    } Skein_512_Ctxt_t;
+    };
 
-typedef struct                               /* 1024-bit Skein hash context structure */
+struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
     {
-    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    struct skein_ctx_hdr h;                      /* common header context variables */
     u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
     u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    } Skein1024_Ctxt_t;
+    };
 
 /*   Skein APIs for (incremental) "straight hashing" */
-int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
-int  Skein_512_Init  (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
-int  Skein1024_Init  (Skein1024_Ctxt_t *ctx, size_t hashBitLen);
+int  Skein_256_Init  (struct skein_256_ctx *ctx, size_t hashBitLen);
+int  Skein_512_Init  (struct skein_512_ctx *ctx, size_t hashBitLen);
+int  Skein1024_Init  (struct skein1024_ctx *ctx, size_t hashBitLen);
 
-int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 
-int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u8 * hashVal);
-int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u8 * hashVal);
-int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_256_Final (struct skein_256_ctx *ctx, u8 * hashVal);
+int  Skein_512_Final (struct skein_512_ctx *ctx, u8 * hashVal);
+int  Skein1024_Final (struct skein1024_ctx *ctx, u8 * hashVal);
 
 /*
 **   Skein APIs for "extended" initialization: MAC keys, tree hashing.
@@ -118,26 +118,26 @@ int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u8 * hashVal);
 **              to precompute the MAC IV, then a copy of the context saved and
 **              reused for each new MAC computation.
 **/
-int  Skein_256_InitExt(Skein_256_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein_512_InitExt(Skein_512_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein1024_InitExt(Skein1024_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
 
 /*
 **   Skein APIs for MAC and tree hash:
 **      Final_Pad:  pad, do final block, but no OUTPUT type
 **      Output:     do just the output stage
 */
-int  Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 * hashVal);
-int  Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 * hashVal);
-int  Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 * hashVal);
+int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 * hashVal);
+int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 * hashVal);
 
 #ifndef SKEIN_TREE_HASH
 #define SKEIN_TREE_HASH (1)
 #endif
 #if  SKEIN_TREE_HASH
-int  Skein_256_Output   (Skein_256_Ctxt_t *ctx, u8 * hashVal);
-int  Skein_512_Output   (Skein_512_Ctxt_t *ctx, u8 * hashVal);
-int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_256_Output   (struct skein_256_ctx *ctx, u8 * hashVal);
+int  Skein_512_Output   (struct skein_512_ctx *ctx, u8 * hashVal);
+int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 * hashVal);
 #endif
 
 /*****************************************************************
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index fb4a7c8e7f7a..548c639431de 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -47,7 +47,7 @@ OTHER DEALINGS IN THE SOFTWARE.
  * #include <skeinApi.h>
  * 
  * ...
- * SkeinCtx_t ctx;             // a Skein hash or MAC context
+ * struct skein_ctx ctx;             // a Skein hash or MAC context
  * 
  * // prepare context, here for a Skein with a state size of 512 bits.
  * skeinCtxPrepare(&ctx, Skein512);
@@ -84,11 +84,11 @@ OTHER DEALINGS IN THE SOFTWARE.
     /**
      * Which Skein size to use
      */
-    typedef enum SkeinSize {
+    enum skein_size {
         Skein256 = 256,     /*!< Skein with 256 bit state */
         Skein512 = 512,     /*!< Skein with 512 bit state */
         Skein1024 = 1024    /*!< Skein with 1024 bit state */
-    } SkeinSize_t;
+    };
 
     /**
      * Context for Skein.
@@ -98,16 +98,16 @@ OTHER DEALINGS IN THE SOFTWARE.
      * variables. If Skein implementation changes this, then adapt these
      * structures as well.
      */
-    typedef struct SkeinCtx {
+    struct skein_ctx {
         u64 skeinSize;
         u64  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
         union {
-            Skein_Ctxt_Hdr_t h;
-            Skein_256_Ctxt_t s256;
-            Skein_512_Ctxt_t s512;
-            Skein1024_Ctxt_t s1024;
+            struct skein_ctx_hdr h;
+            struct skein_256_ctx s256;
+            struct skein_512_ctx s512;
+            struct skein1024_ctx s1024;
         } m;
-    } SkeinCtx_t;
+    };
 
     /**
      * Prepare a Skein context.
@@ -123,7 +123,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size);
+    int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size);
 
     /**
      * Initialize a Skein context.
@@ -139,7 +139,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     SKEIN_SUCESS of SKEIN_FAIL
      * @see skeinReset
      */
-    int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen);
+    int skeinInit(struct skein_ctx* ctx, size_t hashBitLen);
 
     /**
      * Resets a Skein context for further use.
@@ -151,7 +151,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @param ctx
      *     Pointer to a pre-initialized Skein MAC context
      */
-    void skeinReset(SkeinCtx_t* ctx);
+    void skeinReset(struct skein_ctx* ctx);
     
     /**
      * Initializes a Skein context for MAC usage.
@@ -173,7 +173,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
+    int skeinMacInit(struct skein_ctx* ctx, const uint8_t *key, size_t keyLen,
                      size_t hashBitLen);
 
     /**
@@ -188,7 +188,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     Success or error code.
      */
-    int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
+    int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
                     size_t msgByteCnt);
 
     /**
@@ -204,7 +204,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @param msgBitCnt
      *     Length of the message in @b bits.
      */
-    int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
+    int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
                         size_t msgBitCnt);
 
     /**
@@ -222,7 +222,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     Success or error code.
      * @see skeinReset
      */
-    int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash);
+    int skeinFinal(struct skein_ctx* ctx, uint8_t* hash);
 
 /**
  * @}
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 0123a575b606..4c1cd81f30c4 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -18,7 +18,7 @@
  * 
 @code
     // Threefish cipher context data
-    ThreefishKey_t keyCtx;
+    struct threefish_key keyCtx;
 
     // Initialize the context
     threefishSetKey(&keyCtx, Threefish512, key, tweak);
@@ -36,11 +36,11 @@
     /**
      * Which Threefish size to use
      */
-    typedef enum ThreefishSize {
+    enum threefish_size {
         Threefish256 = 256,     /*!< Skein with 256 bit state */
         Threefish512 = 512,     /*!< Skein with 512 bit state */
         Threefish1024 = 1024    /*!< Skein with 1024 bit state */
-    } ThreefishSize_t;
+    };
     
     /**
      * Context for Threefish key and tweak words.
@@ -50,11 +50,11 @@
      * variables. If Skein implementation changes this, the adapt these
      * structures as well.
      */
-    typedef struct ThreefishKey {
+    struct threefish_key {
         u64 stateSize;
         u64 key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
         u64 tweak[3];
-    } ThreefishKey_t;
+    };
 
     /**
      * Set Threefish key and tweak data.
@@ -72,7 +72,7 @@
      * @param tweak
      *     Pointer to the two tweak words (word has 64 bits).
      */
-    void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize, uint64_t* keyData, uint64_t* tweak);
+    void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize, uint64_t* keyData, uint64_t* tweak);
     
     /**
      * Encrypt Threefisch block (bytes).
@@ -89,7 +89,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in, uint8_t* out);
+    void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in, uint8_t* out);
     
     /**
      * Encrypt Threefisch block (words).
@@ -108,7 +108,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in, uint64_t* out);
+    void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in, uint64_t* out);
 
     /**
      * Decrypt Threefisch block (bytes).
@@ -125,7 +125,7 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in, uint8_t* out);
+    void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in, uint8_t* out);
 
     /**
      * Decrypt Threefisch block (words).
@@ -144,14 +144,14 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in, uint64_t* out);
+    void threefishDecryptBlockWords(struct threefish_key* keyCtx, uint64_t* in, uint64_t* out);
 
-    void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index b225642efa4a..2bed7c163316 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -16,9 +16,9 @@
 
 /*****************************************************************/
 /* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
@@ -26,7 +26,7 @@ void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t bl
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a straight hashing operation  */
-int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
+int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 {
     union
     {
@@ -76,7 +76,7 @@ int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein_256_InitExt(struct skein_256_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -126,7 +126,7 @@ int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, cons
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -174,7 +174,7 @@ int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein_256_Final(Skein_256_Ctxt_t *ctx, u8 *hashVal)
+int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN_256_STATE_WORDS];
@@ -213,7 +213,7 @@ int Skein_256_Final(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a straight hashing operation  */
-int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
+int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 {
     union
     {
@@ -264,7 +264,7 @@ int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein_512_InitExt(struct skein_512_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -314,7 +314,7 @@ int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, cons
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -362,7 +362,7 @@ int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein_512_Final(Skein_512_Ctxt_t *ctx, u8 *hashVal)
+int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN_512_STATE_WORDS];
@@ -401,7 +401,7 @@ int Skein_512_Final(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a straight hashing operation  */
-int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
+int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 {
     union
     {
@@ -449,7 +449,7 @@ int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein1024_InitExt(struct skein1024_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -499,7 +499,7 @@ int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, cons
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -547,7 +547,7 @@ int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein1024_Final(Skein1024_Ctxt_t *ctx, u8 *hashVal)
+int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN1024_STATE_WORDS];
@@ -585,7 +585,7 @@ int Skein1024_Final(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 *hashVal)
+int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -601,7 +601,7 @@ int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 *hashVal)
+int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -617,7 +617,7 @@ int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 *hashVal)
+int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -634,7 +634,7 @@ int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 #if SKEIN_TREE_HASH
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein_256_Output(Skein_256_Ctxt_t *ctx, u8 *hashVal)
+int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN_256_STATE_WORDS];
@@ -663,7 +663,7 @@ int Skein_256_Output(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein_512_Output(Skein_512_Ctxt_t *ctx, u8 *hashVal)
+int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN_512_STATE_WORDS];
@@ -692,7 +692,7 @@ int Skein_512_Output(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein1024_Output(Skein1024_Ctxt_t *ctx, u8 *hashVal)
+int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN1024_STATE_WORDS];
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index ef021086bc61..ce5c5ae575e7 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -27,17 +27,17 @@ OTHER DEALINGS IN THE SOFTWARE.
 #include <linux/string.h>
 #include <skeinApi.h>
 
-int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size)
+int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size)
 {
     Skein_Assert(ctx && size, SKEIN_FAIL);
 
-    memset(ctx ,0, sizeof(SkeinCtx_t));
+    memset(ctx ,0, sizeof(struct skein_ctx));
     ctx->skeinSize = size;
 
     return SKEIN_SUCCESS;
 }
 
-int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen)
+int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
     size_t Xlen = 0;
@@ -78,7 +78,7 @@ int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen)
     return ret;
 }
 
-int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
+int skeinMacInit(struct skein_ctx* ctx, const uint8_t *key, size_t keyLen,
                  size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
@@ -119,7 +119,7 @@ int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
     return ret;
 }
 
-void skeinReset(SkeinCtx_t* ctx)
+void skeinReset(struct skein_ctx* ctx)
 {
     size_t Xlen = 0;
     u64*  X = NULL;
@@ -138,7 +138,7 @@ void skeinReset(SkeinCtx_t* ctx)
     Skein_Start_New_Type(&ctx->m, MSG);
 }
 
-int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
+int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
                 size_t msgByteCnt)
 {
     int ret = SKEIN_FAIL;
@@ -159,7 +159,7 @@ int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
 
 }
 
-int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
+int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
                     size_t msgBitCnt)
 {
     /*
@@ -199,7 +199,7 @@ int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
     return SKEIN_SUCCESS;
 }
 
-int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash)
+int skeinFinal(struct skein_ctx* ctx, uint8_t* hash)
 {
     int ret = SKEIN_FAIL;
     Skein_Assert(ctx, SKEIN_FAIL);
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 56c56b8ebd7e..02e68dbab0d4 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -5,10 +5,10 @@
 
 
 /*****************************  Skein_256 ******************************/
-void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u8 *blkPtr,
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
                              size_t blkCnt, size_t byteCntAdd)
 {
-    ThreefishKey_t key;
+    struct threefish_key key;
     u64 tweak[2];
     int i;
     u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
@@ -55,10 +55,10 @@ void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u8 *blkPtr,
     ctx->h.T[1] = tweak[1];
 }
 
-void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u8 *blkPtr,
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
                              size_t blkCnt, size_t byteCntAdd)
 {
-    ThreefishKey_t key;
+    struct threefish_key key;
     u64 tweak[2];
     int i;
     u64 words[3];
@@ -109,10 +109,10 @@ void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u8 *blkPtr,
     ctx->h.T[1] = tweak[1];
 }
 
-void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx, const u8 *blkPtr,
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
                               size_t blkCnt, size_t byteCntAdd)
 {
-    ThreefishKey_t key;
+    struct threefish_key key;
     u64 tweak[2];
     int i;
     u64 words[3];
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 98e884292044..179bde121380 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -39,7 +39,7 @@
 
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C */
     enum
         {
@@ -224,7 +224,7 @@ unsigned int Skein_256_Unroll_Cnt(void)
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C */
     enum
         {
@@ -432,7 +432,7 @@ unsigned int Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C, always looping (unrolled is bigger AND slower!) */
     enum
         {
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 58a8c26a1f6f..738ec523406b 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
         {
 
     uint64_t b0 = input[0], b1 = input[1],
@@ -684,7 +684,7 @@ void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* out
             output[15] = b15 + k1 + 20;
         }
 
-void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
 {
 
     uint64_t b0 = input[0], b1 = input[1],
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index a7e06f905186..b81cb3a65b04 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
   {
 
     uint64_t b0 = input[0], b1 = input[1],
@@ -172,7 +172,7 @@ void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* outp
     output[3] = b3 + k1 + 18;
   }
 
-void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
   {
     uint64_t b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3];
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 3cbfcd9af5c9..7eed6aeb3742 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
     {
 
     uint64_t b0 = input[0], b1 = input[1],
@@ -316,7 +316,7 @@ void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* outp
         output[7] = b7 + k7 + 18;
     }
 
-void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
     {
 
     uint64_t b0 = input[0], b1 = input[1],
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index ed19ee9e3425..5cd3eb9bd9f2 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -3,7 +3,7 @@
 #include <linux/string.h>
 #include <threefishApi.h>
 
-void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
+void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize,
                      uint64_t* keyData, uint64_t* tweak)
 {
     int keyWords = stateSize / 64;
@@ -22,7 +22,7 @@ void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
     keyCtx->stateSize = stateSize;
 }
 
-void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
+void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
                                 uint8_t* out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
@@ -33,7 +33,7 @@ void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
     Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
 }
 
-void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
+void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
                                 uint64_t* out)
 {
     switch (keyCtx->stateSize) {
@@ -49,7 +49,7 @@ void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
     }
 }
 
-void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
+void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
                                 uint8_t* out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
@@ -60,7 +60,7 @@ void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
     Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
 }
 
-void threefishDecryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
+void threefishDecryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
                                 uint64_t* out)
 {
     switch (keyCtx->stateSize) {
-- 
1.9.0
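
Reviewer aside on the typedef removal above: dropping the typedef only changes how a type is spelled, not its layout, which fits the "no object code changed" claim in the cover letter. A minimal userspace sketch follows. MAX_STATE_WORDS, old_threefish_key and key_ctx_size() are stand-in names for illustration (assumptions), not identifiers from the patch, and uint64_t stands in for the kernel's u64.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Assumption: stand-in for SKEIN_MAX_STATE_WORDS (1024 bits / 64). */
#define MAX_STATE_WORDS 16

/* Old spelling: struct hidden behind a typedef. */
typedef struct old_threefish_key {
	uint64_t stateSize;
	uint64_t key[MAX_STATE_WORDS + 1];
	uint64_t tweak[3];
} old_threefish_key_t;

/* New spelling: plain struct tag, as preferred by kernel coding style. */
struct threefish_key {
	uint64_t stateSize;
	uint64_t key[MAX_STATE_WORDS + 1];
	uint64_t tweak[3];
};

/* Layout is identical, so callers see no ABI difference. */
static_assert(sizeof(old_threefish_key_t) == sizeof(struct threefish_key),
	      "typedef removal must not change the size");
static_assert(offsetof(old_threefish_key_t, tweak) ==
	      offsetof(struct threefish_key, tweak),
	      "typedef removal must not move members");

/* Hypothetical helper: size of the key context in bytes. */
size_t key_ctx_size(void)
{
	return sizeof(struct threefish_key);
}
```

The static_assert lines fail the build if the two spellings ever diverge, which is a cheap compile-time complement to comparing object code.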

* [RFC PATCH 09/22] staging: crypto: skein: use u8, u64 vice uint*_t
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (7 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 08/22] staging: crypto: skein: remove all typedef {struct,enum} Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 10/22] staging: crypto: skein: fixup pointer whitespace Jason Cooper
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
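
Reviewer aside, not part of the commit: as far as I can tell, linux/types.h already defines uint8_t and uint64_t as aliases of u8 and u64, so this patch should be a pure respelling with no change in generated code. The sketch below mimics that in userspace. The typedefs are stand-ins for the kernel's, and load_word_old(), load_word_new() and spellings_agree() are hypothetical helpers, not functions from the skein code.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Assumption: userspace stand-ins for the kernel's short type names. */
typedef uint8_t  u8;
typedef uint64_t u64;

static_assert(sizeof(u8) == sizeof(uint8_t), "u8 must be one byte wide");
static_assert(sizeof(u64) == sizeof(uint64_t), "u64 must be eight bytes wide");

/* Hypothetical helper written with the old spelling. */
static uint64_t load_word_old(const uint8_t *p)
{
	uint64_t w;

	memcpy(&w, p, sizeof(w));	/* copy 8 bytes in host byte order */
	return w;
}

/* The same helper after the respelling; the body is unchanged. */
static u64 load_word_new(const u8 *p)
{
	u64 w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Returns 1 when both spellings read the same word from p. */
int spellings_agree(const u8 *p)
{
	return load_word_old(p) == load_word_new(p);
}
```

Because both names resolve to the same underlying type, pointers of one spelling can be passed where the other is expected without a cast.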
 drivers/staging/skein/include/skeinApi.h     |  8 ++++----
 drivers/staging/skein/include/threefishApi.h | 22 +++++++++++-----------
 drivers/staging/skein/skeinApi.c             | 22 +++++++++++-----------
 drivers/staging/skein/threefish1024Block.c   | 18 +++++++++---------
 drivers/staging/skein/threefish256Block.c    | 18 +++++++++---------
 drivers/staging/skein/threefish512Block.c    | 18 +++++++++---------
 drivers/staging/skein/threefishApi.c         | 20 ++++++++++----------
 7 files changed, 63 insertions(+), 63 deletions(-)
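
Reviewer aside: the casts touched in the skeinUpdateBits() hunk of this patch implement the bit padding of the final partial byte. The message bits are kept, a single 1 bit is placed directly below them, and the remaining low bits are cleared. A minimal sketch of that expression in isolation, assuming the number of valid bits is between 1 and 7. pad_partial_byte() and bits_used are hypothetical names, and the u8 typedef is a userspace stand-in.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Assumption: userspace stand-in for the kernel's u8. */
typedef uint8_t u8;

static_assert(sizeof(u8) == 1, "u8 must be a single byte");

/*
 * Hypothetical helper mirroring the mask arithmetic in skeinUpdateBits().
 * 'last' is the final, partially filled message byte; 'bits_used' is the
 * number of valid message bits in it, counted from the most significant
 * bit (intended range 1..7).
 */
u8 pad_partial_byte(u8 last, unsigned int bits_used)
{
	/* Single bit just below the last valid message bit. */
	u8 mask = (u8)(1u << (7 - (bits_used & 7)));

	/*
	 * (0 - mask) has every bit from the mask position upward set,
	 * so the AND clears the unused low bits and the OR sets the pad bit.
	 */
	return (u8)((last & (0 - mask)) | mask);
}
```

For example, with three valid bits in 0xA5 (binary 1010 0101) the top bits 101 are kept, the pad bit follows, and the rest is zeroed, giving 0xB0.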

diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 548c639431de..4ad294f7945d 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -173,7 +173,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinMacInit(struct skein_ctx* ctx, const uint8_t *key, size_t keyLen,
+    int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
                      size_t hashBitLen);
 
     /**
@@ -188,7 +188,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     Success or error code.
      */
-    int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
+    int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
                     size_t msgByteCnt);
 
     /**
@@ -204,7 +204,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @param msgBitCnt
      *     Length of the message in @b bits.
      */
-    int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
+    int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
                         size_t msgBitCnt);
 
     /**
@@ -222,7 +222,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     Success or error code.
      * @see skeinReset
      */
-    int skeinFinal(struct skein_ctx* ctx, uint8_t* hash);
+    int skeinFinal(struct skein_ctx* ctx, u8* hash);
 
 /**
  * @}
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 4c1cd81f30c4..194e313b6b62 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -72,7 +72,7 @@
      * @param tweak
      *     Pointer to the two tweak words (word has 64 bits).
      */
-    void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize, uint64_t* keyData, uint64_t* tweak);
+    void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize, u64* keyData, u64* tweak);
     
     /**
      * Encrypt Threefisch block (bytes).
@@ -89,7 +89,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in, uint8_t* out);
+    void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in, u8* out);
     
     /**
      * Encrypt Threefisch block (words).
@@ -108,7 +108,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in, uint64_t* out);
+    void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in, u64* out);
 
     /**
      * Decrypt Threefisch block (bytes).
@@ -125,7 +125,7 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in, uint8_t* out);
+    void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in, u8* out);
 
     /**
      * Decrypt Threefisch block (words).
@@ -144,14 +144,14 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockWords(struct threefish_key* keyCtx, uint64_t* in, uint64_t* out);
+    void threefishDecryptBlockWords(struct threefish_key* keyCtx, u64* in, u64* out);
 
-    void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishDecrypt256(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishDecrypt512(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishDecrypt1024(struct threefish_key* keyCtx, u64* input, u64* output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index ce5c5ae575e7..6bd2da0eaa5f 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -42,7 +42,7 @@ int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
     int ret = SKEIN_FAIL;
     size_t Xlen = 0;
     u64*  X = NULL;
-    uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+    u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
     Skein_Assert(ctx, SKEIN_FAIL);
     /*
@@ -78,13 +78,13 @@ int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
     return ret;
 }
 
-int skeinMacInit(struct skein_ctx* ctx, const uint8_t *key, size_t keyLen,
+int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
                  size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
     u64*  X = NULL;
     size_t Xlen = 0;
-    uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+    u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
     Skein_Assert(ctx, SKEIN_FAIL);
 
@@ -138,7 +138,7 @@ void skeinReset(struct skein_ctx* ctx)
     Skein_Start_New_Type(&ctx->m, MSG);
 }
 
-int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
+int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
                 size_t msgByteCnt)
 {
     int ret = SKEIN_FAIL;
@@ -159,7 +159,7 @@ int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
 
 }
 
-int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
+int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
                     size_t msgBitCnt)
 {
     /*
@@ -168,8 +168,8 @@ int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
      * arithmetic.
      */
     size_t length;
-    uint8_t mask;
-    uint8_t* up;
+    u8 mask;
+    u8* up;
 
     /* only the final Update() call is allowed do partial bytes, else assert an error */
     Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
@@ -186,20 +186,20 @@ int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
      * Skein's real partial block buffer.
      * If this layout ever changes we have to adapt this as well.
      */
-    up = (uint8_t*)ctx->m.s256.X + ctx->skeinSize / 8;
+    up = (u8*)ctx->m.s256.X + ctx->skeinSize / 8;
 
     Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
 
     /* now "pad" the final partial byte the way NIST likes */
     length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
     Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
-    mask = (uint8_t) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
-    up[length-1]  = (uint8_t)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
+    mask = (u8) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
+    up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
 
     return SKEIN_SUCCESS;
 }
 
-int skeinFinal(struct skein_ctx* ctx, uint8_t* hash)
+int skeinFinal(struct skein_ctx* ctx, u8* hash)
 {
     int ret = SKEIN_FAIL;
     Skein_Assert(ctx, SKEIN_FAIL);
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 738ec523406b..9e821fcdb067 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -2,10 +2,10 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
         {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3],
       b4 = input[4], b5 = input[5],
       b6 = input[6], b7 = input[7],
@@ -13,7 +13,7 @@ void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
       b10 = input[10], b11 = input[11],
       b12 = input[12], b13 = input[13],
       b14 = input[14], b15 = input[15];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4], k5 = keyCtx->key[5],
       k6 = keyCtx->key[6], k7 = keyCtx->key[7],
@@ -22,7 +22,7 @@ void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
       k12 = keyCtx->key[12], k13 = keyCtx->key[13],
       k14 = keyCtx->key[14], k15 = keyCtx->key[15],
       k16 = keyCtx->key[16];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
             b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
@@ -684,10 +684,10 @@ void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
             output[15] = b15 + k1 + 20;
         }
 
-void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
 {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3],
       b4 = input[4], b5 = input[5],
       b6 = input[6], b7 = input[7],
@@ -695,7 +695,7 @@ void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
       b10 = input[10], b11 = input[11],
       b12 = input[12], b13 = input[13],
       b14 = input[14], b15 = input[15];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4], k5 = keyCtx->key[5],
       k6 = keyCtx->key[6], k7 = keyCtx->key[7],
@@ -704,9 +704,9 @@ void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
       k12 = keyCtx->key[12], k13 = keyCtx->key[13],
       k14 = keyCtx->key[14], k15 = keyCtx->key[15],
       k16 = keyCtx->key[16];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
-    uint64_t tmp;
+    u64 tmp;
 
             b0 -= k3;
             b1 -= k4;
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index b81cb3a65b04..68ac4c50f01e 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -2,15 +2,15 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
   {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
     b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
@@ -172,17 +172,17 @@ void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t
     output[3] = b3 + k1 + 18;
   }
 
-void threefishDecrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
   {
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
-    uint64_t tmp;
+    u64 tmp;
 
     b0 -= k3;
     b1 -= k4 + t0;
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 7eed6aeb3742..e94bb93722df 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -2,19 +2,19 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
     {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3],
       b4 = input[4], b5 = input[5],
       b6 = input[6], b7 = input[7];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4], k5 = keyCtx->key[5],
       k6 = keyCtx->key[6], k7 = keyCtx->key[7],
       k8 = keyCtx->key[8];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
         b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
@@ -316,22 +316,22 @@ void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t
         output[7] = b7 + k7 + 18;
     }
 
-void threefishDecrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
     {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3],
       b4 = input[4], b5 = input[5],
       b6 = input[6], b7 = input[7];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4], k5 = keyCtx->key[5],
       k6 = keyCtx->key[6], k7 = keyCtx->key[7],
       k8 = keyCtx->key[8];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
-      uint64_t tmp;
+      u64 tmp;
 
         b0 -= k0;
         b1 -= k1;
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 5cd3eb9bd9f2..37f96215159d 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -4,11 +4,11 @@
 #include <threefishApi.h>
 
 void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize,
-                     uint64_t* keyData, uint64_t* tweak)
+                     u64* keyData, u64* tweak)
 {
     int keyWords = stateSize / 64;
     int i;
-    uint64_t parity = KeyScheduleConst;
+    u64 parity = KeyScheduleConst;
 
     keyCtx->tweak[0] = tweak[0];
     keyCtx->tweak[1] = tweak[1];
@@ -22,8 +22,8 @@ void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize
     keyCtx->stateSize = stateSize;
 }
 
-void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
-                                uint8_t* out)
+void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in,
+                                u8* out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
     u64 cipher[SKEIN_MAX_STATE_WORDS];
@@ -33,8 +33,8 @@ void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
     Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
 }
 
-void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
-                                uint64_t* out)
+void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in,
+                                u64* out)
 {
     switch (keyCtx->stateSize) {
         case Threefish256:
@@ -49,8 +49,8 @@ void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
     }
 }
 
-void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
-                                uint8_t* out)
+void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in,
+                                u8* out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
     u64 cipher[SKEIN_MAX_STATE_WORDS];
@@ -60,8 +60,8 @@ void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
     Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
 }
 
-void threefishDecryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
-                                uint64_t* out)
+void threefishDecryptBlockWords(struct threefish_key* keyCtx, u64* in,
+                                u64* out)
 {
     switch (keyCtx->stateSize) {
         case Threefish256:
-- 
1.9.0
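
A note on why the uint*_t to u8/u64 conversion should not change generated code: in the kernel both spellings name the same fixed-width unsigned types. The userspace sketch below only illustrates that point. The typedefs are stand-ins for the kernel's own definitions, and sum_words() is a hypothetical helper that is not part of this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: userspace stand-ins for the kernel's u8/u64 types. */
typedef uint8_t  u8;
typedef uint64_t u64;

/* Same width under either spelling, so prototypes keep their ABI. */
_Static_assert(sizeof(u64) == sizeof(uint64_t), "u64 must match uint64_t");
_Static_assert(sizeof(u8) == sizeof(uint8_t), "u8 must match uint8_t");

/*
 * Hypothetical helper (not in the patch): walks an array of words the
 * same way the Threefish block functions read input[] into locals.
 */
static u64 sum_words(const u64 *input, int n)
{
	u64 sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += input[i];
	return sum;
}
```

A caller that still holds a uint64_t buffer can pass it to a u64 * parameter without a cast, which is why the rename can be purely mechanical.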

* [RFC PATCH 10/22] staging: crypto: skein: fixup pointer whitespace
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (8 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 09/22] staging: crypto: skein: use u8, u64 vice uint*_t Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 11/22] staging: crypto: skein: cleanup whitespace around operators/punc Jason Cooper
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h        | 18 +++++++++---------
 drivers/staging/skein/include/skeinApi.h     | 10 +++++-----
 drivers/staging/skein/include/threefishApi.h | 22 +++++++++++-----------
 drivers/staging/skein/skeinApi.c             | 18 +++++++++---------
 drivers/staging/skein/threefish1024Block.c   |  4 ++--
 drivers/staging/skein/threefish256Block.c    |  4 ++--
 drivers/staging/skein/threefish512Block.c    |  4 ++--
 drivers/staging/skein/threefishApi.c         | 20 ++++++++++----------
 8 files changed, 50 insertions(+), 50 deletions(-)
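
The commit message does not give a rationale, so the following is only a commonly cited reason for binding the '*' to the name rather than to the type: in a multi-variable declaration the '*' applies to one declarator only. A minimal userspace sketch, with u8 as a stand-in typedef and same_types() as a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;	/* assumption: stand-in for the kernel's u8 */

/* Old spelling: reads as if both names were pointers, but only 'p' is. */
static u8* p, q;

/* Kernel spelling: the '*' visibly belongs to the declarator it modifies. */
static u8 *r, s;

/*
 * Hypothetical helper: returns 1 when both spellings declare identical
 * types, i.e. moving the whitespace changed nothing for the compiler.
 */
static int same_types(void)
{
	return sizeof(p) == sizeof(r) &&	/* both are pointers to u8 */
	       sizeof(q) == sizeof(s) &&	/* both are plain u8 */
	       sizeof(q) == 1 &&
	       sizeof(p) == sizeof(void *);
}
```

Since only whitespace moves, the declared types are identical before and after, which matches the cover letter's claim that object code is unchanged.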

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 77b712e73253..b7cd6c0cef2f 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -100,9 +100,9 @@ int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCn
 int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 
-int  Skein_256_Final (struct skein_256_ctx *ctx, u8 * hashVal);
-int  Skein_512_Final (struct skein_512_ctx *ctx, u8 * hashVal);
-int  Skein1024_Final (struct skein1024_ctx *ctx, u8 * hashVal);
+int  Skein_256_Final (struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final (struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final (struct skein1024_ctx *ctx, u8 *hashVal);
 
 /*
 **   Skein APIs for "extended" initialization: MAC keys, tree hashing.
@@ -127,17 +127,17 @@ int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInf
 **      Final_Pad:  pad, do final block, but no OUTPUT type
 **      Output:     do just the output stage
 */
-int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 * hashVal);
-int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 * hashVal);
-int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 * hashVal);
+int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal);
 
 #ifndef SKEIN_TREE_HASH
 #define SKEIN_TREE_HASH (1)
 #endif
 #if  SKEIN_TREE_HASH
-int  Skein_256_Output   (struct skein_256_ctx *ctx, u8 * hashVal);
-int  Skein_512_Output   (struct skein_512_ctx *ctx, u8 * hashVal);
-int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 * hashVal);
+int  Skein_256_Output   (struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Output   (struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
 #endif
 
 /*****************************************************************
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 4ad294f7945d..2c52797918cf 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -123,7 +123,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size);
+    int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size);
 
     /**
      * Initialize a Skein context.
@@ -139,7 +139,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     SKEIN_SUCESS of SKEIN_FAIL
      * @see skeinReset
      */
-    int skeinInit(struct skein_ctx* ctx, size_t hashBitLen);
+    int skeinInit(struct skein_ctx *ctx, size_t hashBitLen);
 
     /**
      * Resets a Skein context for further use.
@@ -151,7 +151,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @param ctx
      *     Pointer to a pre-initialized Skein MAC context
      */
-    void skeinReset(struct skein_ctx* ctx);
+    void skeinReset(struct skein_ctx *ctx);
     
     /**
      * Initializes a Skein context for MAC usage.
@@ -173,7 +173,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
+    int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
                      size_t hashBitLen);
 
     /**
@@ -222,7 +222,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     Success or error code.
      * @see skeinReset
      */
-    int skeinFinal(struct skein_ctx* ctx, u8* hash);
+    int skeinFinal(struct skein_ctx *ctx, u8 *hash);
 
 /**
  * @}
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 194e313b6b62..1f9e6e14f50b 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -72,7 +72,7 @@
      * @param tweak
      *     Pointer to the two tweak words (word has 64 bits).
      */
-    void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize, u64* keyData, u64* tweak);
+    void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize, u64 *keyData, u64 *tweak);
     
     /**
      * Encrypt Threefisch block (bytes).
@@ -89,7 +89,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in, u8* out);
+    void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
     
     /**
      * Encrypt Threefisch block (words).
@@ -108,7 +108,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in, u64* out);
+    void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
 
     /**
      * Decrypt Threefisch block (bytes).
@@ -125,7 +125,7 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in, u8* out);
+    void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
     /**
      * Decrypt Threefisch block (words).
@@ -144,14 +144,14 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockWords(struct threefish_key* keyCtx, u64* in, u64* out);
+    void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
 
-    void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishDecrypt256(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishDecrypt512(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishDecrypt1024(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 6bd2da0eaa5f..df92806c4ec4 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -27,7 +27,7 @@ OTHER DEALINGS IN THE SOFTWARE.
 #include <linux/string.h>
 #include <skeinApi.h>
 
-int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size)
+int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size)
 {
     Skein_Assert(ctx && size, SKEIN_FAIL);
 
@@ -37,11 +37,11 @@ int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size)
     return SKEIN_SUCCESS;
 }
 
-int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
+int skeinInit(struct skein_ctx *ctx, size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
     size_t Xlen = 0;
-    u64*  X = NULL;
+    u64 *X = NULL;
     u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
     Skein_Assert(ctx, SKEIN_FAIL);
@@ -78,11 +78,11 @@ int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
     return ret;
 }
 
-int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
+int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
                  size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
-    u64*  X = NULL;
+    u64 *X = NULL;
     size_t Xlen = 0;
     u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
@@ -119,10 +119,10 @@ int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
     return ret;
 }
 
-void skeinReset(struct skein_ctx* ctx)
+void skeinReset(struct skein_ctx *ctx)
 {
     size_t Xlen = 0;
-    u64*  X = NULL;
+    u64 *X = NULL;
 
     /*
      * The following two lines rely of the fact that the real Skein contexts are
@@ -169,7 +169,7 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
      */
     size_t length;
     u8 mask;
-    u8* up;
+    u8 *up;
 
     /* only the final Update() call is allowed do partial bytes, else assert an error */
     Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
@@ -199,7 +199,7 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
     return SKEIN_SUCCESS;
 }
 
-int skeinFinal(struct skein_ctx* ctx, u8* hash)
+int skeinFinal(struct skein_ctx *ctx, u8 *hash)
 {
     int ret = SKEIN_FAIL;
     Skein_Assert(ctx, SKEIN_FAIL);
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 9e821fcdb067..e3be37ea8024 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
         {
 
     u64 b0 = input[0], b1 = input[1],
@@ -684,7 +684,7 @@ void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
             output[15] = b15 + k1 + 20;
         }
 
-void threefishDecrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
 {
 
     u64 b0 = input[0], b1 = input[1],
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index 68ac4c50f01e..09ea5099bc76 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
   {
 
     u64 b0 = input[0], b1 = input[1],
@@ -172,7 +172,7 @@ void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
     output[3] = b3 + k1 + 18;
   }
 
-void threefishDecrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
   {
     u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3];
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index e94bb93722df..5262f5a8f21b 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
     {
 
     u64 b0 = input[0], b1 = input[1],
@@ -316,7 +316,7 @@ void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
         output[7] = b7 + k7 + 18;
     }
 
-void threefishDecrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
     {
 
     u64 b0 = input[0], b1 = input[1],
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 37f96215159d..53f46f6cb9ca 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -3,8 +3,8 @@
 #include <linux/string.h>
 #include <threefishApi.h>
 
-void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize,
-                     u64* keyData, u64* tweak)
+void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize,
+                     u64 *keyData, u64 *tweak)
 {
     int keyWords = stateSize / 64;
     int i;
@@ -22,8 +22,8 @@ void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize
     keyCtx->stateSize = stateSize;
 }
 
-void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in,
-                                u8* out)
+void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
+                                u8 *out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
     u64 cipher[SKEIN_MAX_STATE_WORDS];
@@ -33,8 +33,8 @@ void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in,
     Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
 }
 
-void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in,
-                                u64* out)
+void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in,
+                                u64 *out)
 {
     switch (keyCtx->stateSize) {
         case Threefish256:
@@ -49,8 +49,8 @@ void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in,
     }
 }
 
-void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in,
-                                u8* out)
+void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
+                                u8 *out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
     u64 cipher[SKEIN_MAX_STATE_WORDS];
@@ -60,8 +60,8 @@ void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in,
     Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
 }
 
-void threefishDecryptBlockWords(struct threefish_key* keyCtx, u64* in,
-                                u64* out)
+void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in,
+                                u64 *out)
 {
     switch (keyCtx->stateSize) {
         case Threefish256:
-- 
1.9.0

* [RFC PATCH 11/22] staging: crypto: skein: cleanup whitespace around operators/punc.
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (9 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 10/22] staging: crypto: skein: fixup pointer whitespace Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 12/22] staging: crypto: skein: dos2unix, remove executable perms Jason Cooper
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h    | 168 +++++-----
 drivers/staging/skein/include/skein_iv.h | 224 +++++++-------
 drivers/staging/skein/skein.c            | 352 ++++++++++-----------
 drivers/staging/skein/skeinApi.c         |  22 +-
 drivers/staging/skein/skeinBlockNo3F.c   |  20 +-
 drivers/staging/skein/skein_block.c      | 513 +++++++++++++++----------------
 6 files changed, 648 insertions(+), 651 deletions(-)
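
For readers unfamiliar with the macros touched below, here is a small userspace sketch of what RotL_64 and SKEIN_MK_64 evaluate to. The macro bodies are copied from the post-patch skein.h. The u64 typedef and the rotl64() wrapper are assumptions added for illustration only.

```c
#include <stdint.h>

typedef uint64_t u64;	/* assumption: stand-in for the kernel's u64 */

/* Copied from the patched skein.h: only the spacing after commas changed. */
#define RotL_64(x, N)    (((x) << (N)) | ((x) >> (64-(N))))
#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)

/*
 * Hypothetical wrapper: rotate x left by n bits.
 * n must stay within 1..63; n == 0 would make the macro shift right by
 * 64, which is undefined behaviour for a 64-bit operand.
 */
static u64 rotl64(u64 x, unsigned int n)
{
	return RotL_64(x, n);
}
```

Whitespace inside a macro's parameter list or body does not affect its expansion, so a cleanup of this kind should leave the preprocessed output, and therefore the object code, untouched.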

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index b7cd6c0cef2f..fef29ad64c93 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -29,12 +29,12 @@
 ***************************************************************************/
 
 #ifndef RotL_64
-#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
+#define RotL_64(x, N)    (((x) << (N)) | ((x) >> (64-(N))))
 #endif
 
 /* below two prototype assume we are handed aligned data */
-#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
-#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
+#define Skein_Put64_LSB_First(dst08, src64, bCnt) memcpy(dst08, src64, bCnt)
+#define Skein_Get64_LSB_First(dst64, src08, wCnt) memcpy(dst64, src08, 8*(wCnt))
 #define Skein_Swap64(w64)  (w64)
 
 enum
@@ -44,24 +44,24 @@ enum
     SKEIN_BAD_HASHLEN     =      2
     };
 
-#define  SKEIN_MODIFIER_WORDS  ( 2)          /* number of modifier (tweak) words */
+#define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
 
-#define  SKEIN_256_STATE_WORDS ( 4)
-#define  SKEIN_512_STATE_WORDS ( 8)
+#define  SKEIN_256_STATE_WORDS  (4)
+#define  SKEIN_512_STATE_WORDS  (8)
 #define  SKEIN1024_STATE_WORDS (16)
 #define  SKEIN_MAX_STATE_WORDS (16)
 
-#define  SKEIN_256_STATE_BYTES ( 8*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_STATE_BYTES ( 8*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_STATE_BYTES ( 8*SKEIN1024_STATE_WORDS)
+#define  SKEIN_256_STATE_BYTES  (8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BYTES  (8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BYTES  (8*SKEIN1024_STATE_WORDS)
 
 #define  SKEIN_256_STATE_BITS  (64*SKEIN_256_STATE_WORDS)
 #define  SKEIN_512_STATE_BITS  (64*SKEIN_512_STATE_WORDS)
 #define  SKEIN1024_STATE_BITS  (64*SKEIN1024_STATE_WORDS)
 
-#define  SKEIN_256_BLOCK_BYTES ( 8*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_BLOCK_BYTES ( 8*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_BLOCK_BYTES ( 8*SKEIN1024_STATE_WORDS)
+#define  SKEIN_256_BLOCK_BYTES  (8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_BLOCK_BYTES  (8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
 
 struct skein_ctx_hdr
     {
@@ -92,17 +92,17 @@ struct skein1024_ctx                              /* 1024-bit Skein hash context
     };
 
 /*   Skein APIs for (incremental) "straight hashing" */
-int  Skein_256_Init  (struct skein_256_ctx *ctx, size_t hashBitLen);
-int  Skein_512_Init  (struct skein_512_ctx *ctx, size_t hashBitLen);
-int  Skein1024_Init  (struct skein1024_ctx *ctx, size_t hashBitLen);
+int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
+int  Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen);
+int  Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen);
 
 int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 
-int  Skein_256_Final (struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Final (struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Final (struct skein1024_ctx *ctx, u8 *hashVal);
+int  Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
 
 /*
 **   Skein APIs for "extended" initialization: MAC keys, tree hashing.
@@ -135,9 +135,9 @@ int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_TREE_HASH (1)
 #endif
 #if  SKEIN_TREE_HASH
-int  Skein_256_Output   (struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Output   (struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
+int  Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #endif
 
 /*****************************************************************
@@ -158,18 +158,18 @@ int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
                                 
 /* tweak word T[1]: flag bit definition(s) */
-#define SKEIN_T1_FLAG_FIRST     (((u64)  1 ) << SKEIN_T1_POS_FIRST)
-#define SKEIN_T1_FLAG_FINAL     (((u64)  1 ) << SKEIN_T1_POS_FINAL)
-#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1 ) << SKEIN_T1_POS_BIT_PAD)
+#define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
+#define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
+#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
                                 
 /* tweak word T[1]: tree level bit field mask */
 #define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
 #define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
 
 /* tweak word T[1]: block type field */
-#define SKEIN_BLK_TYPE_KEY      ( 0)                    /* key, for MAC and KDF */
-#define SKEIN_BLK_TYPE_CFG      ( 4)                    /* configuration block */
-#define SKEIN_BLK_TYPE_PERS     ( 8)                    /* personalization string */
+#define SKEIN_BLK_TYPE_KEY       (0)                    /* key, for MAC and KDF */
+#define SKEIN_BLK_TYPE_CFG       (4)                    /* configuration block */
+#define SKEIN_BLK_TYPE_PERS      (8)                    /* personalization string */
 #define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
 #define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
 #define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
@@ -197,73 +197,73 @@ int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
 #endif
 
-#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64) (hi32)) << 32))
-#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION,SKEIN_ID_STRING_LE)
-#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)
+#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
+#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION, SKEIN_ID_STRING_LE)
+#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)
 
 #define SKEIN_CFG_STR_LEN       (4*8)
 
 /* bit field definitions in config block treeInfo word */
-#define SKEIN_CFG_TREE_LEAF_SIZE_POS  ( 0)
-#define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
+#define SKEIN_CFG_TREE_LEAF_SIZE_POS  (0)
+#define SKEIN_CFG_TREE_NODE_SIZE_POS  (8)
 #define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
 
 #define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
 #define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
 #define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
 
-#define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
-    ( (((u64)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
-      (((u64)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
-      (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )
+#define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
+    ((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+     (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+     (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
 
-#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0,0,0) /* use as treeInfo in InitExt() call for sequential processing */
+#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
 
 /*
 **   Skein macros for getting/setting tweak words, etc.
 **   These are useful for partial input bytes, hash tree init/update, etc.
 **/
-#define Skein_Get_Tweak(ctxPtr,TWK_NUM)         ((ctxPtr)->h.T[TWK_NUM])
-#define Skein_Set_Tweak(ctxPtr,TWK_NUM,tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal);}
+#define Skein_Get_Tweak(ctxPtr, TWK_NUM)          ((ctxPtr)->h.T[TWK_NUM])
+#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal); }
 
-#define Skein_Get_T0(ctxPtr)    Skein_Get_Tweak(ctxPtr,0)
-#define Skein_Get_T1(ctxPtr)    Skein_Get_Tweak(ctxPtr,1)
-#define Skein_Set_T0(ctxPtr,T0) Skein_Set_Tweak(ctxPtr,0,T0)
-#define Skein_Set_T1(ctxPtr,T1) Skein_Set_Tweak(ctxPtr,1,T1)
+#define Skein_Get_T0(ctxPtr)     Skein_Get_Tweak(ctxPtr, 0)
+#define Skein_Get_T1(ctxPtr)     Skein_Get_Tweak(ctxPtr, 1)
+#define Skein_Set_T0(ctxPtr, T0) Skein_Set_Tweak(ctxPtr, 0, T0)
+#define Skein_Set_T1(ctxPtr, T1) Skein_Set_Tweak(ctxPtr, 1, T1)
 
 /* set both tweak words at once */
-#define Skein_Set_T0_T1(ctxPtr,T0,T1)           \
+#define Skein_Set_T0_T1(ctxPtr, T0, T1)           \
     {                                           \
-    Skein_Set_T0(ctxPtr,(T0));                  \
-    Skein_Set_T1(ctxPtr,(T1));                  \
+    Skein_Set_T0(ctxPtr, (T0));                  \
+    Skein_Set_T1(ctxPtr, (T1));                  \
     }
 
-#define Skein_Set_Type(ctxPtr,BLK_TYPE)         \
-    Skein_Set_T1(ctxPtr,SKEIN_T1_BLK_TYPE_##BLK_TYPE)
+#define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
+    Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
 
 /* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
-#define Skein_Start_New_Type(ctxPtr,BLK_TYPE)   \
-    { Skein_Set_T0_T1(ctxPtr,0,SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt=0; }
+#define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
+    { Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
 
 #define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
 #define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
 
-#define Skein_Set_Tree_Level(hdr,height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height);}
+#define Skein_Set_Tree_Level(hdr, height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); }
 
 /*****************************************************************
 ** "Internal" Skein definitions for debugging and error checking
 ******************************************************************/
-#ifdef  SKEIN_DEBUG             /* examine/display intermediate values? */
+#ifdef SKEIN_DEBUG             /* examine/display intermediate values? */
 #include "skein_debug.h"
 #else                           /* default is no callouts */
-#define Skein_Show_Block(bits,ctx,X,blkPtr,wPtr,ksEvenPtr,ksOddPtr)
-#define Skein_Show_Round(bits,ctx,r,X)
-#define Skein_Show_R_Ptr(bits,ctx,r,X_ptr)
-#define Skein_Show_Final(bits,ctx,cnt,outPtr)
-#define Skein_Show_Key(bits,ctx,key,keyBytes)
+#define Skein_Show_Block(bits, ctx, X, blkPtr, wPtr, ksEvenPtr, ksOddPtr)
+#define Skein_Show_Round(bits, ctx, r, X)
+#define Skein_Show_R_Ptr(bits, ctx, r, X_ptr)
+#define Skein_Show_Final(bits, ctx, cnt, outPtr)
+#define Skein_Show_Key(bits, ctx, key, keyBytes)
 #endif
 
-#define Skein_Assert(x,retCode)/* default: ignore all Asserts, for performance */
+#define Skein_Assert(x, retCode)/* default: ignore all Asserts, for performance */
 #define Skein_assert(x)
 
 /*****************************************************************
@@ -272,34 +272,34 @@ int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
 enum    
     {   
         /* Skein_256 round rotation constants */
-    R_256_0_0=14, R_256_0_1=16,
-    R_256_1_0=52, R_256_1_1=57,
-    R_256_2_0=23, R_256_2_1=40,
-    R_256_3_0= 5, R_256_3_1=37,
-    R_256_4_0=25, R_256_4_1=33,
-    R_256_5_0=46, R_256_5_1=12,
-    R_256_6_0=58, R_256_6_1=22,
-    R_256_7_0=32, R_256_7_1=32,
+    R_256_0_0 = 14, R_256_0_1 = 16,
+    R_256_1_0 = 52, R_256_1_1 = 57,
+    R_256_2_0 = 23, R_256_2_1 = 40,
+    R_256_3_0 =  5, R_256_3_1 = 37,
+    R_256_4_0 = 25, R_256_4_1 = 33,
+    R_256_5_0 = 46, R_256_5_1 = 12,
+    R_256_6_0 = 58, R_256_6_1 = 22,
+    R_256_7_0 = 32, R_256_7_1 = 32,
 
         /* Skein_512 round rotation constants */
-    R_512_0_0=46, R_512_0_1=36, R_512_0_2=19, R_512_0_3=37,
-    R_512_1_0=33, R_512_1_1=27, R_512_1_2=14, R_512_1_3=42,
-    R_512_2_0=17, R_512_2_1=49, R_512_2_2=36, R_512_2_3=39,
-    R_512_3_0=44, R_512_3_1= 9, R_512_3_2=54, R_512_3_3=56,
-    R_512_4_0=39, R_512_4_1=30, R_512_4_2=34, R_512_4_3=24,
-    R_512_5_0=13, R_512_5_1=50, R_512_5_2=10, R_512_5_3=17,
-    R_512_6_0=25, R_512_6_1=29, R_512_6_2=39, R_512_6_3=43,
-    R_512_7_0= 8, R_512_7_1=35, R_512_7_2=56, R_512_7_3=22,
+    R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
+    R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
+    R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
+    R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
+    R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
+    R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
+    R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
+    R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
 
         /* Skein1024 round rotation constants */
-    R1024_0_0=24, R1024_0_1=13, R1024_0_2= 8, R1024_0_3=47, R1024_0_4= 8, R1024_0_5=17, R1024_0_6=22, R1024_0_7=37,
-    R1024_1_0=38, R1024_1_1=19, R1024_1_2=10, R1024_1_3=55, R1024_1_4=49, R1024_1_5=18, R1024_1_6=23, R1024_1_7=52,
-    R1024_2_0=33, R1024_2_1= 4, R1024_2_2=51, R1024_2_3=13, R1024_2_4=34, R1024_2_5=41, R1024_2_6=59, R1024_2_7=17,
-    R1024_3_0= 5, R1024_3_1=20, R1024_3_2=48, R1024_3_3=41, R1024_3_4=47, R1024_3_5=28, R1024_3_6=16, R1024_3_7=25,
-    R1024_4_0=41, R1024_4_1= 9, R1024_4_2=37, R1024_4_3=31, R1024_4_4=12, R1024_4_5=47, R1024_4_6=44, R1024_4_7=30,
-    R1024_5_0=16, R1024_5_1=34, R1024_5_2=56, R1024_5_3=51, R1024_5_4= 4, R1024_5_5=53, R1024_5_6=42, R1024_5_7=41,
-    R1024_6_0=31, R1024_6_1=44, R1024_6_2=47, R1024_6_3=46, R1024_6_4=19, R1024_6_5=42, R1024_6_6=44, R1024_6_7=25,
-    R1024_7_0= 9, R1024_7_1=48, R1024_7_2=35, R1024_7_3=52, R1024_7_4=23, R1024_7_5=31, R1024_7_6=37, R1024_7_7=20
+    R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
+    R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
+    R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
+    R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
+    R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
+    R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
+    R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
+    R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
     };
 
 #ifndef SKEIN_ROUNDS
@@ -308,8 +308,8 @@ enum
 #define SKEIN1024_ROUNDS_TOTAL (80)
 #else                                        /* allow command-line define in range 8*(5..14)   */
 #define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
-#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/ 10) + 5) % 10) + 5))
-#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS    ) + 5) % 10) + 5))
+#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/10)  + 5) % 10) + 5))
+#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS)     + 5) % 10) + 5))
 #endif
 
 #endif  /* ifndef _SKEIN_H_ */
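
Aside on the SKEIN_ROUNDS override touched by the last hunk: as far as the macros show, each decimal digit of SKEIN_ROUNDS selects the round count for one variant (hundreds digit for Skein_256, tens for Skein_512, units for Skein1024), and the "+ 5) % 10) + 5" step maps a digit onto 5..14 before multiplying by 8. The function below is only a sketch of that arithmetic; skein_rounds_total() and its divisor parameter are hypothetical names, not part of the patch.

```c
/*
 * Hypothetical helper mirroring the SKEIN_*_ROUNDS_TOTAL arithmetic.
 * 'divisor' picks the digit: 100 for Skein_256, 10 for Skein_512,
 * 1 for Skein1024.
 */
static unsigned int skein_rounds_total(unsigned int skein_rounds,
				       unsigned int divisor)
{
	unsigned int digit_sel = skein_rounds / divisor;

	/* ((digit + 5) % 10) + 5 lies in 5..14, so the result is 40..112 */
	return 8 * (((digit_sel + 5) % 10) + 5);
}
```

Under this reading, SKEIN_ROUNDS=990 would give 72, 72 and 80 rounds for the three variants, and a digit of 5 selects the minimum of 40 rounds.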
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index 94ac2f7cde76..aff9394551a0 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -22,178 +22,178 @@
 /* blkSize =  256 bits. hashSize =  128 bits */
 const u64 SKEIN_256_IV_128[] =
     {
-    MK_64(0xE1111906,0x964D7260),
-    MK_64(0x883DAAA7,0x7C8D811C),
-    MK_64(0x10080DF4,0x91960F7A),
-    MK_64(0xCCF7DDE5,0xB45BC1C2)
+    MK_64(0xE1111906, 0x964D7260),
+    MK_64(0x883DAAA7, 0x7C8D811C),
+    MK_64(0x10080DF4, 0x91960F7A),
+    MK_64(0xCCF7DDE5, 0xB45BC1C2)
     };
 
 /* blkSize =  256 bits. hashSize =  160 bits */
 const u64 SKEIN_256_IV_160[] =
     {
-    MK_64(0x14202314,0x72825E98),
-    MK_64(0x2AC4E9A2,0x5A77E590),
-    MK_64(0xD47A5856,0x8838D63E),
-    MK_64(0x2DD2E496,0x8586AB7D)
+    MK_64(0x14202314, 0x72825E98),
+    MK_64(0x2AC4E9A2, 0x5A77E590),
+    MK_64(0xD47A5856, 0x8838D63E),
+    MK_64(0x2DD2E496, 0x8586AB7D)
     };
 
 /* blkSize =  256 bits. hashSize =  224 bits */
 const u64 SKEIN_256_IV_224[] =
     {
-    MK_64(0xC6098A8C,0x9AE5EA0B),
-    MK_64(0x876D5686,0x08C5191C),
-    MK_64(0x99CB88D7,0xD7F53884),
-    MK_64(0x384BDDB1,0xAEDDB5DE)
+    MK_64(0xC6098A8C, 0x9AE5EA0B),
+    MK_64(0x876D5686, 0x08C5191C),
+    MK_64(0x99CB88D7, 0xD7F53884),
+    MK_64(0x384BDDB1, 0xAEDDB5DE)
     };
 
 /* blkSize =  256 bits. hashSize =  256 bits */
 const u64 SKEIN_256_IV_256[] =
     {
-    MK_64(0xFC9DA860,0xD048B449),
-    MK_64(0x2FCA6647,0x9FA7D833),
-    MK_64(0xB33BC389,0x6656840F),
-    MK_64(0x6A54E920,0xFDE8DA69)
+    MK_64(0xFC9DA860, 0xD048B449),
+    MK_64(0x2FCA6647, 0x9FA7D833),
+    MK_64(0xB33BC389, 0x6656840F),
+    MK_64(0x6A54E920, 0xFDE8DA69)
     };
 
 /* blkSize =  512 bits. hashSize =  128 bits */
 const u64 SKEIN_512_IV_128[] =
     {
-    MK_64(0xA8BC7BF3,0x6FBF9F52),
-    MK_64(0x1E9872CE,0xBD1AF0AA),
-    MK_64(0x309B1790,0xB32190D3),
-    MK_64(0xBCFBB854,0x3F94805C),
-    MK_64(0x0DA61BCD,0x6E31B11B),
-    MK_64(0x1A18EBEA,0xD46A32E3),
-    MK_64(0xA2CC5B18,0xCE84AA82),
-    MK_64(0x6982AB28,0x9D46982D)
+    MK_64(0xA8BC7BF3, 0x6FBF9F52),
+    MK_64(0x1E9872CE, 0xBD1AF0AA),
+    MK_64(0x309B1790, 0xB32190D3),
+    MK_64(0xBCFBB854, 0x3F94805C),
+    MK_64(0x0DA61BCD, 0x6E31B11B),
+    MK_64(0x1A18EBEA, 0xD46A32E3),
+    MK_64(0xA2CC5B18, 0xCE84AA82),
+    MK_64(0x6982AB28, 0x9D46982D)
     };
 
 /* blkSize =  512 bits. hashSize =  160 bits */
 const u64 SKEIN_512_IV_160[] =
     {
-    MK_64(0x28B81A2A,0xE013BD91),
-    MK_64(0xC2F11668,0xB5BDF78F),
-    MK_64(0x1760D8F3,0xF6A56F12),
-    MK_64(0x4FB74758,0x8239904F),
-    MK_64(0x21EDE07F,0x7EAF5056),
-    MK_64(0xD908922E,0x63ED70B8),
-    MK_64(0xB8EC76FF,0xECCB52FA),
-    MK_64(0x01A47BB8,0xA3F27A6E)
+    MK_64(0x28B81A2A, 0xE013BD91),
+    MK_64(0xC2F11668, 0xB5BDF78F),
+    MK_64(0x1760D8F3, 0xF6A56F12),
+    MK_64(0x4FB74758, 0x8239904F),
+    MK_64(0x21EDE07F, 0x7EAF5056),
+    MK_64(0xD908922E, 0x63ED70B8),
+    MK_64(0xB8EC76FF, 0xECCB52FA),
+    MK_64(0x01A47BB8, 0xA3F27A6E)
     };
 
 /* blkSize =  512 bits. hashSize =  224 bits */
 const u64 SKEIN_512_IV_224[] =
     {
-    MK_64(0xCCD06162,0x48677224),
-    MK_64(0xCBA65CF3,0xA92339EF),
-    MK_64(0x8CCD69D6,0x52FF4B64),
-    MK_64(0x398AED7B,0x3AB890B4),
-    MK_64(0x0F59D1B1,0x457D2BD0),
-    MK_64(0x6776FE65,0x75D4EB3D),
-    MK_64(0x99FBC70E,0x997413E9),
-    MK_64(0x9E2CFCCF,0xE1C41EF7)
+    MK_64(0xCCD06162, 0x48677224),
+    MK_64(0xCBA65CF3, 0xA92339EF),
+    MK_64(0x8CCD69D6, 0x52FF4B64),
+    MK_64(0x398AED7B, 0x3AB890B4),
+    MK_64(0x0F59D1B1, 0x457D2BD0),
+    MK_64(0x6776FE65, 0x75D4EB3D),
+    MK_64(0x99FBC70E, 0x997413E9),
+    MK_64(0x9E2CFCCF, 0xE1C41EF7)
     };
 
 /* blkSize =  512 bits. hashSize =  256 bits */
 const u64 SKEIN_512_IV_256[] =
     {
-    MK_64(0xCCD044A1,0x2FDB3E13),
-    MK_64(0xE8359030,0x1A79A9EB),
-    MK_64(0x55AEA061,0x4F816E6F),
-    MK_64(0x2A2767A4,0xAE9B94DB),
-    MK_64(0xEC06025E,0x74DD7683),
-    MK_64(0xE7A436CD,0xC4746251),
-    MK_64(0xC36FBAF9,0x393AD185),
-    MK_64(0x3EEDBA18,0x33EDFC13)
+    MK_64(0xCCD044A1, 0x2FDB3E13),
+    MK_64(0xE8359030, 0x1A79A9EB),
+    MK_64(0x55AEA061, 0x4F816E6F),
+    MK_64(0x2A2767A4, 0xAE9B94DB),
+    MK_64(0xEC06025E, 0x74DD7683),
+    MK_64(0xE7A436CD, 0xC4746251),
+    MK_64(0xC36FBAF9, 0x393AD185),
+    MK_64(0x3EEDBA18, 0x33EDFC13)
     };
 
 /* blkSize =  512 bits. hashSize =  384 bits */
 const u64 SKEIN_512_IV_384[] =
     {
-    MK_64(0xA3F6C6BF,0x3A75EF5F),
-    MK_64(0xB0FEF9CC,0xFD84FAA4),
-    MK_64(0x9D77DD66,0x3D770CFE),
-    MK_64(0xD798CBF3,0xB468FDDA),
-    MK_64(0x1BC4A666,0x8A0E4465),
-    MK_64(0x7ED7D434,0xE5807407),
-    MK_64(0x548FC1AC,0xD4EC44D6),
-    MK_64(0x266E1754,0x6AA18FF8)
+    MK_64(0xA3F6C6BF, 0x3A75EF5F),
+    MK_64(0xB0FEF9CC, 0xFD84FAA4),
+    MK_64(0x9D77DD66, 0x3D770CFE),
+    MK_64(0xD798CBF3, 0xB468FDDA),
+    MK_64(0x1BC4A666, 0x8A0E4465),
+    MK_64(0x7ED7D434, 0xE5807407),
+    MK_64(0x548FC1AC, 0xD4EC44D6),
+    MK_64(0x266E1754, 0x6AA18FF8)
     };
 
 /* blkSize =  512 bits. hashSize =  512 bits */
 const u64 SKEIN_512_IV_512[] =
     {
-    MK_64(0x4903ADFF,0x749C51CE),
-    MK_64(0x0D95DE39,0x9746DF03),
-    MK_64(0x8FD19341,0x27C79BCE),
-    MK_64(0x9A255629,0xFF352CB1),
-    MK_64(0x5DB62599,0xDF6CA7B0),
-    MK_64(0xEABE394C,0xA9D5C3F4),
-    MK_64(0x991112C7,0x1A75B523),
-    MK_64(0xAE18A40B,0x660FCC33)
+    MK_64(0x4903ADFF, 0x749C51CE),
+    MK_64(0x0D95DE39, 0x9746DF03),
+    MK_64(0x8FD19341, 0x27C79BCE),
+    MK_64(0x9A255629, 0xFF352CB1),
+    MK_64(0x5DB62599, 0xDF6CA7B0),
+    MK_64(0xEABE394C, 0xA9D5C3F4),
+    MK_64(0x991112C7, 0x1A75B523),
+    MK_64(0xAE18A40B, 0x660FCC33)
     };
 
 /* blkSize = 1024 bits. hashSize =  384 bits */
 const u64 SKEIN1024_IV_384[] =
     {
-    MK_64(0x5102B6B8,0xC1894A35),
-    MK_64(0xFEEBC9E3,0xFE8AF11A),
-    MK_64(0x0C807F06,0xE32BED71),
-    MK_64(0x60C13A52,0xB41A91F6),
-    MK_64(0x9716D35D,0xD4917C38),
-    MK_64(0xE780DF12,0x6FD31D3A),
-    MK_64(0x797846B6,0xC898303A),
-    MK_64(0xB172C2A8,0xB3572A3B),
-    MK_64(0xC9BC8203,0xA6104A6C),
-    MK_64(0x65909338,0xD75624F4),
-    MK_64(0x94BCC568,0x4B3F81A0),
-    MK_64(0x3EBBF51E,0x10ECFD46),
-    MK_64(0x2DF50F0B,0xEEB08542),
-    MK_64(0x3B5A6530,0x0DBC6516),
-    MK_64(0x484B9CD2,0x167BBCE1),
-    MK_64(0x2D136947,0xD4CBAFEA)
+    MK_64(0x5102B6B8, 0xC1894A35),
+    MK_64(0xFEEBC9E3, 0xFE8AF11A),
+    MK_64(0x0C807F06, 0xE32BED71),
+    MK_64(0x60C13A52, 0xB41A91F6),
+    MK_64(0x9716D35D, 0xD4917C38),
+    MK_64(0xE780DF12, 0x6FD31D3A),
+    MK_64(0x797846B6, 0xC898303A),
+    MK_64(0xB172C2A8, 0xB3572A3B),
+    MK_64(0xC9BC8203, 0xA6104A6C),
+    MK_64(0x65909338, 0xD75624F4),
+    MK_64(0x94BCC568, 0x4B3F81A0),
+    MK_64(0x3EBBF51E, 0x10ECFD46),
+    MK_64(0x2DF50F0B, 0xEEB08542),
+    MK_64(0x3B5A6530, 0x0DBC6516),
+    MK_64(0x484B9CD2, 0x167BBCE1),
+    MK_64(0x2D136947, 0xD4CBAFEA)
     };
 
 /* blkSize = 1024 bits. hashSize =  512 bits */
 const u64 SKEIN1024_IV_512[] =
     {
-    MK_64(0xCAEC0E5D,0x7C1B1B18),
-    MK_64(0xA01B0E04,0x5F03E802),
-    MK_64(0x33840451,0xED912885),
-    MK_64(0x374AFB04,0xEAEC2E1C),
-    MK_64(0xDF25A0E2,0x813581F7),
-    MK_64(0xE4004093,0x8B12F9D2),
-    MK_64(0xA662D539,0xC2ED39B6),
-    MK_64(0xFA8B85CF,0x45D8C75A),
-    MK_64(0x8316ED8E,0x29EDE796),
-    MK_64(0x053289C0,0x2E9F91B8),
-    MK_64(0xC3F8EF1D,0x6D518B73),
-    MK_64(0xBDCEC3C4,0xD5EF332E),
-    MK_64(0x549A7E52,0x22974487),
-    MK_64(0x67070872,0x5B749816),
-    MK_64(0xB9CD28FB,0xF0581BD1),
-    MK_64(0x0E2940B8,0x15804974)
+    MK_64(0xCAEC0E5D, 0x7C1B1B18),
+    MK_64(0xA01B0E04, 0x5F03E802),
+    MK_64(0x33840451, 0xED912885),
+    MK_64(0x374AFB04, 0xEAEC2E1C),
+    MK_64(0xDF25A0E2, 0x813581F7),
+    MK_64(0xE4004093, 0x8B12F9D2),
+    MK_64(0xA662D539, 0xC2ED39B6),
+    MK_64(0xFA8B85CF, 0x45D8C75A),
+    MK_64(0x8316ED8E, 0x29EDE796),
+    MK_64(0x053289C0, 0x2E9F91B8),
+    MK_64(0xC3F8EF1D, 0x6D518B73),
+    MK_64(0xBDCEC3C4, 0xD5EF332E),
+    MK_64(0x549A7E52, 0x22974487),
+    MK_64(0x67070872, 0x5B749816),
+    MK_64(0xB9CD28FB, 0xF0581BD1),
+    MK_64(0x0E2940B8, 0x15804974)
     };
 
 /* blkSize = 1024 bits. hashSize = 1024 bits */
 const u64 SKEIN1024_IV_1024[] =
     {
-    MK_64(0xD593DA07,0x41E72355),
-    MK_64(0x15B5E511,0xAC73E00C),
-    MK_64(0x5180E5AE,0xBAF2C4F0),
-    MK_64(0x03BD41D3,0xFCBCAFAF),
-    MK_64(0x1CAEC6FD,0x1983A898),
-    MK_64(0x6E510B8B,0xCDD0589F),
-    MK_64(0x77E2BDFD,0xC6394ADA),
-    MK_64(0xC11E1DB5,0x24DCB0A3),
-    MK_64(0xD6D14AF9,0xC6329AB5),
-    MK_64(0x6A9B0BFC,0x6EB67E0D),
-    MK_64(0x9243C60D,0xCCFF1332),
-    MK_64(0x1A1F1DDE,0x743F02D4),
-    MK_64(0x0996753C,0x10ED0BB8),
-    MK_64(0x6572DD22,0xF2B4969A),
-    MK_64(0x61FD3062,0xD00A579A),
-    MK_64(0x1DE0536E,0x8682E539)
+    MK_64(0xD593DA07, 0x41E72355),
+    MK_64(0x15B5E511, 0xAC73E00C),
+    MK_64(0x5180E5AE, 0xBAF2C4F0),
+    MK_64(0x03BD41D3, 0xFCBCAFAF),
+    MK_64(0x1CAEC6FD, 0x1983A898),
+    MK_64(0x6E510B8B, 0xCDD0589F),
+    MK_64(0x77E2BDFD, 0xC6394ADA),
+    MK_64(0xC11E1DB5, 0x24DCB0A3),
+    MK_64(0xD6D14AF9, 0xC6329AB5),
+    MK_64(0x6A9B0BFC, 0x6EB67E0D),
+    MK_64(0x9243C60D, 0xCCFF1332),
+    MK_64(0x1A1F1DDE, 0x743F02D4),
+    MK_64(0x0996753C, 0x10ED0BB8),
+    MK_64(0x6572DD22, 0xF2B4969A),
+    MK_64(0x61FD3062, 0xD00A579A),
+    MK_64(0x1DE0536E, 0x8682E539)
     };
 
 #endif /* _SKEIN_IV_H_ */
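
Aside on the IV tables above: each entry packs two 32-bit halves into one 64-bit word. Assuming MK_64 expands to the same expression as SKEIN_MK_64 in skein.h (its definition is not part of this hunk), the packing can be sketched as a plain function; mk_64() is a hypothetical stand-in for the macro.

```c
#include <stdint.h>

/* Hypothetical function form of SKEIN_MK_64(hi32, lo32). */
static uint64_t mk_64(uint32_t hi32, uint32_t lo32)
{
	/* hi32 fills bits 63..32, lo32 fills bits 31..0 */
	return (uint64_t)lo32 + ((uint64_t)hi32 << 32);
}
```

For example, the first word of SKEIN_256_IV_256, MK_64(0xFC9DA860, 0xD048B449), would come out as 0xFC9DA860D048B449. Only whitespace after the comma changes in this patch, so the resulting constants should be unaffected.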
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 2bed7c163316..0ea0a6aeb168 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -16,9 +16,9 @@
 
 /*****************************************************************/
 /* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void    Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
@@ -34,41 +34,41 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
         u64  w[SKEIN_256_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
     ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
     switch (hashBitLen)
     {             /* use pre-computed values, where available */
     case  256:
-        memcpy(ctx->X,SKEIN_256_IV_256,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
         break;
     case  224:
-        memcpy(ctx->X,SKEIN_256_IV_224,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
         break;
     case  160:
-        memcpy(ctx->X,SKEIN_256_IV_160,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
         break;
     case  128:
-        memcpy(ctx->X,SKEIN_256_IV_128,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
         break;
     default:
         /* here if there is no precomputed IV value available */
         /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
 
         cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
         cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
         cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
 
         /* compute the initial chaining values from config block */
-        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_256_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
         break;
     }
     /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
     return SKEIN_SUCCESS;
 }
@@ -76,7 +76,7 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(struct skein_256_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -84,42 +84,42 @@ int Skein_256_InitExt(struct skein_256_ctx *ctx,size_t hashBitLen,u64 treeInfo,
         u64  w[SKEIN_256_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
     /* compute the initial chaining values ctx->X[], based on key */
     if (keyBytes == 0)                          /* is there a key? */
     {
-        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
     }
     else                                        /* here to pre-process a key */
     {
         Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
         /* do a mini-Init right here */
-        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_256_Update(ctx,key,keyBytes);     /* hash the key */
-        Skein_256_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx,CFG_FINAL);
+    Skein_Start_New_Type(ctx, CFG_FINAL);
 
-    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
     cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
     cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
     cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
 
-    Skein_Show_Key(256,&ctx->h,key,keyBytes);
+    Skein_Show_Key(256, &ctx->h, key, keyBytes);
 
     /* compute the initial chaining values from config block */
-    Skein_256_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+    Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
     /* The chaining vars ctx->X are now initialized */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);
+    Skein_Start_New_Type(ctx, MSG);
 
     return SKEIN_SUCCESS;
 }
@@ -130,7 +130,7 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt
 {
     size_t n;
 
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* process full blocks, if any */
     if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
@@ -141,20 +141,20 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt
             if (n)
             {
                 Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
                 msgByteCnt  -= n;
                 msg         += n;
                 ctx->h.bCnt += n;
             }
             Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
-            Skein_256_Process_Block(ctx,ctx->b,1,SKEIN_256_BLOCK_BYTES);
+            Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
             ctx->h.bCnt = 0;
         }
         /* now process any remaining full blocks, directly from input message data */
         if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
         {
             n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_256_Process_Block(ctx,msg,n,SKEIN_256_BLOCK_BYTES);
+            Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
             msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
             msg        += n * SKEIN_256_BLOCK_BYTES;
         }
@@ -165,7 +165,7 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt
     if (msgByteCnt)
     {
         Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
         ctx->h.bCnt += msgByteCnt;
     }
 
@@ -176,33 +176,33 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt
 /* finalize the hash computation and output the result */
 int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
     if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
 
-    Skein_256_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_256_BLOCK_BYTES)
             n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -221,42 +221,42 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
         u64  w[SKEIN_512_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
     ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
     switch (hashBitLen)
     {             /* use pre-computed values, where available */
     case  512:
-        memcpy(ctx->X,SKEIN_512_IV_512,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
         break;
     case  384:
-        memcpy(ctx->X,SKEIN_512_IV_384,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
         break;
     case  256:
-        memcpy(ctx->X,SKEIN_512_IV_256,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
         break;
     case  224:
-        memcpy(ctx->X,SKEIN_512_IV_224,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
         break;
     default:
         /* here if there is no precomputed IV value available */
         /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
 
         cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
         cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
         cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
 
         /* compute the initial chaining values from config block */
-        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_512_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
         break;
     }
 
     /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
     return SKEIN_SUCCESS;
 }
@@ -264,7 +264,7 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(struct skein_512_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -272,42 +272,42 @@ int Skein_512_InitExt(struct skein_512_ctx *ctx,size_t hashBitLen,u64 treeInfo,
         u64  w[SKEIN_512_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
     /* compute the initial chaining values ctx->X[], based on key */
     if (keyBytes == 0)                          /* is there a key? */
     {
-        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
     }
     else                                        /* here to pre-process a key */
     {
         Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
         /* do a mini-Init right here */
-        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_512_Update(ctx,key,keyBytes);     /* hash the key */
-        Skein_512_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx,CFG_FINAL);
+    Skein_Start_New_Type(ctx, CFG_FINAL);
 
-    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
     cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
     cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
     cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
 
-    Skein_Show_Key(512,&ctx->h,key,keyBytes);
+    Skein_Show_Key(512, &ctx->h, key, keyBytes);
 
     /* compute the initial chaining values from config block */
-    Skein_512_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+    Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
     /* The chaining vars ctx->X are now initialized */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);
+    Skein_Start_New_Type(ctx, MSG);
 
     return SKEIN_SUCCESS;
 }
@@ -318,7 +318,7 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt
 {
     size_t n;
 
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* process full blocks, if any */
     if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
@@ -329,20 +329,20 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt
             if (n)
             {
                 Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
                 msgByteCnt  -= n;
                 msg         += n;
                 ctx->h.bCnt += n;
             }
             Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
-            Skein_512_Process_Block(ctx,ctx->b,1,SKEIN_512_BLOCK_BYTES);
+            Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
             ctx->h.bCnt = 0;
         }
         /* now process any remaining full blocks, directly from input message data */
         if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
         {
             n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_512_Process_Block(ctx,msg,n,SKEIN_512_BLOCK_BYTES);
+            Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
             msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
             msg        += n * SKEIN_512_BLOCK_BYTES;
         }
@@ -353,7 +353,7 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt
     if (msgByteCnt)
     {
         Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
         ctx->h.bCnt += msgByteCnt;
     }
 
@@ -364,33 +364,33 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt
 /* finalize the hash computation and output the result */
 int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
     if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
 
-    Skein_512_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_512_BLOCK_BYTES)
             n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(512,&ctx->h,n,hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -409,39 +409,39 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
         u64  w[SKEIN1024_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
     ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
     switch (hashBitLen)
     {              /* use pre-computed values, where available */
     case  512:
-        memcpy(ctx->X,SKEIN1024_IV_512 ,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
         break;
     case  384:
-        memcpy(ctx->X,SKEIN1024_IV_384 ,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
         break;
     case 1024:
-        memcpy(ctx->X,SKEIN1024_IV_1024,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
         break;
     default:
         /* here if there is no precomputed IV value available */
         /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
 
         cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
         cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
         cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
 
         /* compute the initial chaining values from config block */
-        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
-        Skein1024_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
         break;
     }
 
     /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
     return SKEIN_SUCCESS;
 }
@@ -449,7 +449,7 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(struct skein1024_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -457,42 +457,42 @@ int Skein1024_InitExt(struct skein1024_ctx *ctx,size_t hashBitLen,u64 treeInfo,
         u64  w[SKEIN1024_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
     /* compute the initial chaining values ctx->X[], based on key */
     if (keyBytes == 0)                          /* is there a key? */
     {
-        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
     }
     else                                        /* here to pre-process a key */
     {
         Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
         /* do a mini-Init right here */
-        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein1024_Update(ctx,key,keyBytes);     /* hash the key */
-        Skein1024_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx,CFG_FINAL);
+    Skein_Start_New_Type(ctx, CFG_FINAL);
 
-    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
     cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
     cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
     cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
 
-    Skein_Show_Key(1024,&ctx->h,key,keyBytes);
+    Skein_Show_Key(1024, &ctx->h, key, keyBytes);
 
     /* compute the initial chaining values from config block */
-    Skein1024_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+    Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
     /* The chaining vars ctx->X are now initialized */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);
+    Skein_Start_New_Type(ctx, MSG);
 
     return SKEIN_SUCCESS;
 }
@@ -503,7 +503,7 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt
 {
     size_t n;
 
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* process full blocks, if any */
     if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
@@ -514,20 +514,20 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt
             if (n)
             {
                 Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
                 msgByteCnt  -= n;
                 msg         += n;
                 ctx->h.bCnt += n;
             }
             Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
-            Skein1024_Process_Block(ctx,ctx->b,1,SKEIN1024_BLOCK_BYTES);
+            Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
             ctx->h.bCnt = 0;
         }
         /* now process any remaining full blocks, directly from input message data */
         if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
         {
             n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein1024_Process_Block(ctx,msg,n,SKEIN1024_BLOCK_BYTES);
+            Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
             msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
             msg        += n * SKEIN1024_BLOCK_BYTES;
         }
@@ -538,7 +538,7 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt
     if (msgByteCnt)
     {
         Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
         ctx->h.bCnt += msgByteCnt;
     }
 
@@ -549,33 +549,33 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt
 /* finalize the hash computation and output the result */
 int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
     if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
 
-    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN1024_BLOCK_BYTES)
             n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(1024,&ctx->h,n,hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -587,14 +587,14 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
     if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_256_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
 
     return SKEIN_SUCCESS;
 }
@@ -603,14 +603,14 @@ int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
     if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_512_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
 
     return SKEIN_SUCCESS;
 }
@@ -619,14 +619,14 @@ int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
     if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
 
     return SKEIN_SUCCESS;
 }
@@ -636,27 +636,27 @@ int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 /* just do the OUTPUT stage                                       */
 int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_256_BLOCK_BYTES)
             n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -665,27 +665,27 @@ int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 /* just do the OUTPUT stage                                       */
 int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_512_BLOCK_BYTES)
             n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -694,27 +694,27 @@ int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 /* just do the OUTPUT stage                                       */
 int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN1024_BLOCK_BYTES)
             n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index df92806c4ec4..a3f471be8db3 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -31,7 +31,7 @@ int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size)
 {
     Skein_Assert(ctx && size, SKEIN_FAIL);
 
-    memset(ctx ,0, sizeof(struct skein_ctx));
+    memset(ctx , 0, sizeof(struct skein_ctx));
     ctx->skeinSize = size;
 
     return SKEIN_SUCCESS;
@@ -97,18 +97,18 @@ int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
     case Skein256:
         ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
                                 treeInfo,
-                                (const u8*)key, keyLen);
+                                (const u8 *)key, keyLen);
 
         break;
     case Skein512:
         ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
                                 treeInfo,
-                                (const u8*)key, keyLen);
+                                (const u8 *)key, keyLen);
         break;
     case Skein1024:
         ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
                                 treeInfo,
-                                (const u8*)key, keyLen);
+                                (const u8 *)key, keyLen);
 
         break;
     }
@@ -146,13 +146,13 @@ int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
 
     switch (ctx->skeinSize) {
     case Skein256:
-        ret = Skein_256_Update(&ctx->m.s256, (const u8*)msg, msgByteCnt);
+        ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg, msgByteCnt);
         break;
     case Skein512:
-        ret = Skein_512_Update(&ctx->m.s512, (const u8*)msg, msgByteCnt);
+        ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg, msgByteCnt);
         break;
     case Skein1024:
-        ret = Skein1024_Update(&ctx->m.s1024, (const u8*)msg, msgByteCnt);
+        ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg, msgByteCnt);
         break;
     }
     return ret;
@@ -186,7 +186,7 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
      * Skein's real partial block buffer.
      * If this layout ever changes we have to adapt this as well.
      */
-    up = (u8*)ctx->m.s256.X + ctx->skeinSize / 8;
+    up = (u8 *)ctx->m.s256.X + ctx->skeinSize / 8;
 
     Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
 
@@ -206,13 +206,13 @@ int skeinFinal(struct skein_ctx *ctx, u8 *hash)
 
     switch (ctx->skeinSize) {
     case Skein256:
-        ret = Skein_256_Final(&ctx->m.s256, (u8*)hash);
+        ret = Skein_256_Final(&ctx->m.s256, (u8 *)hash);
         break;
     case Skein512:
-        ret = Skein_512_Final(&ctx->m.s512, (u8*)hash);
+        ret = Skein_512_Final(&ctx->m.s512, (u8 *)hash);
         break;
     case Skein1024:
-        ret = Skein1024_Final(&ctx->m.s1024, (u8*)hash);
+        ret = Skein1024_Final(&ctx->m.s1024, (u8 *)hash);
         break;
     }
     return ret;
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 02e68dbab0d4..a4b1ec56ad83 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -147,16 +147,16 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
         blkPtr += SKEIN1024_BLOCK_BYTES;
 
         /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[ 0] = ctx->X[ 0] ^ w[ 0];
-        ctx->X[ 1] = ctx->X[ 1] ^ w[ 1];
-        ctx->X[ 2] = ctx->X[ 2] ^ w[ 2];
-        ctx->X[ 3] = ctx->X[ 3] ^ w[ 3];
-        ctx->X[ 4] = ctx->X[ 4] ^ w[ 4];
-        ctx->X[ 5] = ctx->X[ 5] ^ w[ 5];
-        ctx->X[ 6] = ctx->X[ 6] ^ w[ 6];
-        ctx->X[ 7] = ctx->X[ 7] ^ w[ 7];
-        ctx->X[ 8] = ctx->X[ 8] ^ w[ 8];
-        ctx->X[ 9] = ctx->X[ 9] ^ w[ 9];
+        ctx->X[0]  = ctx->X[0]  ^ w[0];
+        ctx->X[1]  = ctx->X[1]  ^ w[1];
+        ctx->X[2]  = ctx->X[2]  ^ w[2];
+        ctx->X[3]  = ctx->X[3]  ^ w[3];
+        ctx->X[4]  = ctx->X[4]  ^ w[4];
+        ctx->X[5]  = ctx->X[5]  ^ w[5];
+        ctx->X[6]  = ctx->X[6]  ^ w[6];
+        ctx->X[7]  = ctx->X[7]  ^ w[7];
+        ctx->X[8]  = ctx->X[8]  ^ w[8];
+        ctx->X[9]  = ctx->X[9]  ^ w[9];
         ctx->X[10] = ctx->X[10] ^ w[10];
         ctx->X[11] = ctx->X[11] ^ w[11];
         ctx->X[12] = ctx->X[12] ^ w[12];
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 179bde121380..791bacdd3d57 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -39,16 +39,15 @@
 
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
     { /* do it in C */
-    enum
-        {
+    enum {
         WCNT = SKEIN_256_STATE_WORDS
         };
 #undef  RCNT
 #define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
 
-#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
 #else
 #define SKEIN_UNROLL_256 (0)
@@ -63,8 +62,8 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t b
 #else
     u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64  X0,X1,X2,X3;                        /* local copy of context vars, for speed */
-    u64  w [WCNT];                           /* local copy of input block */
+    u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
+    u64  w[WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
     const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
     Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
@@ -85,95 +84,95 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t b
 
         ts[2] = ts[0] ^ ts[1];
 
-        Skein_Get64_LSB_First(w,blkPtr,WCNT);   /* get input block in little-endian format */
+        Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
         DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
         X0 = w[0] + ks[0];                      /* do the first full key injection */
         X1 = w[1] + ks[1] + ts[0];
         X2 = w[2] + ks[2] + ts[1];
         X3 = w[3] + ks[3];
 
-        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);    /* show starting state values */
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
 
         blkPtr += SKEIN_256_BLOCK_BYTES;
 
         /* run the rounds */
 
-#define Round256(p0,p1,p2,p3,ROT,rNum)                              \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2; \
+#define Round256(p0, p1, p2, p3, ROT, rNum)                              \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 
 #if SKEIN_UNROLL_256 == 0                       
-#define R256(p0,p1,p2,p3,ROT,rNum)           /* fully unrolled */   \
-    Round256(p0,p1,p2,p3,ROT,rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rNum,Xptr);
+#define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
+    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
 #define I256(R)                                                     \
     X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
     X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
     X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
     X3   += ks[((R)+4) % 5] +     (R)+1;                            \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
-#define R256(p0,p1,p2,p3,ROT,rNum)                                  \
-    Round256(p0,p1,p2,p3,ROT,rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rNum,Xptr);
+#define R256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
 #define I256(R)                                                     \
     X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
     X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
     X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
-    X3   += ks[r+(R)+3] +    r+(R)   ;                              \
-    ks[r + (R)+4    ]   = ks[r+(R)-1];     /* rotate key schedule */\
-    ts[r + (R)+2    ]   = ts[r+(R)-1];                              \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+    X3   += ks[r+(R)+3] +    r+(R);                              \
+    ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
+    ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 
-    for (r=1;r < 2*RCNT;r+=2*SKEIN_UNROLL_256)  /* loop thru it */
+    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
 #endif  
         {    
 #define R256_8_rounds(R)                  \
-        R256(0,1,2,3,R_256_0,8*(R) + 1);  \
-        R256(0,3,2,1,R_256_1,8*(R) + 2);  \
-        R256(0,1,2,3,R_256_2,8*(R) + 3);  \
-        R256(0,3,2,1,R_256_3,8*(R) + 4);  \
-        I256(2*(R));                      \
-        R256(0,1,2,3,R_256_4,8*(R) + 5);  \
-        R256(0,3,2,1,R_256_5,8*(R) + 6);  \
-        R256(0,1,2,3,R_256_6,8*(R) + 7);  \
-        R256(0,3,2,1,R_256_7,8*(R) + 8);  \
-        I256(2*(R)+1);
-
-        R256_8_rounds( 0);
+        R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
+        R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
+        R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
+        R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
+        I256(2 * (R));                      \
+        R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
+        R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
+        R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
+        R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
+        I256(2 * (R) + 1);
+
+        R256_8_rounds(0);
 
 #define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
 
-  #if   R256_Unroll_R( 1)
-        R256_8_rounds( 1);
+  #if   R256_Unroll_R(1)
+        R256_8_rounds(1);
   #endif
-  #if   R256_Unroll_R( 2)
-        R256_8_rounds( 2);
+  #if   R256_Unroll_R(2)
+        R256_8_rounds(2);
   #endif
-  #if   R256_Unroll_R( 3)
-        R256_8_rounds( 3);
+  #if   R256_Unroll_R(3)
+        R256_8_rounds(3);
   #endif
-  #if   R256_Unroll_R( 4)
-        R256_8_rounds( 4);
+  #if   R256_Unroll_R(4)
+        R256_8_rounds(4);
   #endif
-  #if   R256_Unroll_R( 5)
-        R256_8_rounds( 5);
+  #if   R256_Unroll_R(5)
+        R256_8_rounds(5);
   #endif
-  #if   R256_Unroll_R( 6)
-        R256_8_rounds( 6);
+  #if   R256_Unroll_R(6)
+        R256_8_rounds(6);
   #endif
-  #if   R256_Unroll_R( 7)
-        R256_8_rounds( 7);
+  #if   R256_Unroll_R(7)
+        R256_8_rounds(7);
   #endif
-  #if   R256_Unroll_R( 8)
-        R256_8_rounds( 8);
+  #if   R256_Unroll_R(8)
+        R256_8_rounds(8);
   #endif
-  #if   R256_Unroll_R( 9)
-        R256_8_rounds( 9);
+  #if   R256_Unroll_R(9)
+        R256_8_rounds(9);
   #endif
   #if   R256_Unroll_R(10)
         R256_8_rounds(10);
@@ -200,7 +199,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t b
         ctx->X[2] = X2 ^ w[2];
         ctx->X[3] = X3 ^ w[3];
 
-        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
 
         ts[1] &= ~SKEIN_T1_FLAG_FIRST;
         }
@@ -224,16 +223,15 @@ unsigned int Skein_256_Unroll_Cnt(void)
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
     { /* do it in C */
-    enum
-        {
+    enum {
         WCNT = SKEIN_512_STATE_WORDS
         };
 #undef  RCNT
 #define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
 
-#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
 #else
 #define SKEIN_UNROLL_512 (0)
@@ -248,8 +246,8 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t b
 #else
     u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64  X0,X1,X2,X3,X4,X5,X6,X7;            /* local copy of vars, for speed */
-    u64  w [WCNT];                           /* local copy of input block */
+    u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars, for speed */
+    u64  w[WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
     const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
     Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
@@ -277,9 +275,9 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t b
 
         ts[2] = ts[0] ^ ts[1];
 
-        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */
+        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
         DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
         X0   = w[0] + ks[0];                    /* do the first full key injection */
         X1   = w[1] + ks[1];
@@ -292,92 +290,92 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t b
 
         blkPtr += SKEIN_512_BLOCK_BYTES;
 
-        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
         /* run the rounds */
-#define Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                  \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2; \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4; \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6; \
+#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
 
 #if SKEIN_UNROLL_512 == 0                       
-#define R512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)      /* unrolled */  \
-    Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rNum,Xptr);
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
+    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
 #define I512(R)                                                     \
-    X0   += ks[((R)+1) % 9];   /* inject the key schedule value */  \
-    X1   += ks[((R)+2) % 9];                                        \
-    X2   += ks[((R)+3) % 9];                                        \
-    X3   += ks[((R)+4) % 9];                                        \
-    X4   += ks[((R)+5) % 9];                                        \
-    X5   += ks[((R)+6) % 9] + ts[((R)+1) % 3];                      \
-    X6   += ks[((R)+7) % 9] + ts[((R)+2) % 3];                      \
-    X7   += ks[((R)+8) % 9] +     (R)+1;                            \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+    X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
+    X1   += ks[((R) + 2) % 9];                                        \
+    X2   += ks[((R) + 3) % 9];                                        \
+    X3   += ks[((R) + 4) % 9];                                        \
+    X4   += ks[((R) + 5) % 9];                                        \
+    X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
+    X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
+    X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
-#define R512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
-    Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rNum,Xptr);
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
 #define I512(R)                                                     \
-    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
-    X1   += ks[r+(R)+1];                                            \
-    X2   += ks[r+(R)+2];                                            \
-    X3   += ks[r+(R)+3];                                            \
-    X4   += ks[r+(R)+4];                                            \
-    X5   += ks[r+(R)+5] + ts[r+(R)+0];                              \
-    X6   += ks[r+(R)+6] + ts[r+(R)+1];                              \
-    X7   += ks[r+(R)+7] +    r+(R)   ;                              \
-    ks[r +       (R)+8] = ks[r+(R)-1];  /* rotate key schedule */   \
-    ts[r +       (R)+2] = ts[r+(R)-1];                              \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
-
-    for (r=1;r < 2*RCNT;r+=2*SKEIN_UNROLL_512)   /* loop thru it */
+    X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
+    X1   += ks[r + (R) + 1];                                            \
+    X2   += ks[r + (R) + 2];                                            \
+    X3   += ks[r + (R) + 3];                                            \
+    X4   += ks[r + (R) + 4];                                            \
+    X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
+    X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
+    X7   += ks[r + (R) + 7] +         r + (R);                              \
+    ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
+    ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
 #endif                         /* end of looped code definitions */
         {
 #define R512_8_rounds(R)  /* do 8 full rounds */  \
-        R512(0,1,2,3,4,5,6,7,R_512_0,8*(R)+ 1);   \
-        R512(2,1,4,7,6,5,0,3,R_512_1,8*(R)+ 2);   \
-        R512(4,1,6,3,0,5,2,7,R_512_2,8*(R)+ 3);   \
-        R512(6,1,0,7,2,5,4,3,R_512_3,8*(R)+ 4);   \
-        I512(2*(R));                              \
-        R512(0,1,2,3,4,5,6,7,R_512_4,8*(R)+ 5);   \
-        R512(2,1,4,7,6,5,0,3,R_512_5,8*(R)+ 6);   \
-        R512(4,1,6,3,0,5,2,7,R_512_6,8*(R)+ 7);   \
-        R512(6,1,0,7,2,5,4,3,R_512_7,8*(R)+ 8);   \
-        I512(2*(R)+1);        /* and key injection */
-
-        R512_8_rounds( 0);
+        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
+        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
+        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
+        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
+        I512(2 * (R));                              \
+        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
+        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
+        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
+        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
+        I512(2 * (R) + 1);        /* and key injection */
+
+        R512_8_rounds(0);
 
 #define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
 
-  #if   R512_Unroll_R( 1)
-        R512_8_rounds( 1);
+  #if   R512_Unroll_R(1)
+        R512_8_rounds(1);
   #endif
-  #if   R512_Unroll_R( 2)
-        R512_8_rounds( 2);
+  #if   R512_Unroll_R(2)
+        R512_8_rounds(2);
   #endif
-  #if   R512_Unroll_R( 3)
-        R512_8_rounds( 3);
+  #if   R512_Unroll_R(3)
+        R512_8_rounds(3);
   #endif
-  #if   R512_Unroll_R( 4)
-        R512_8_rounds( 4);
+  #if   R512_Unroll_R(4)
+        R512_8_rounds(4);
   #endif
-  #if   R512_Unroll_R( 5)
-        R512_8_rounds( 5);
+  #if   R512_Unroll_R(5)
+        R512_8_rounds(5);
   #endif
-  #if   R512_Unroll_R( 6)
-        R512_8_rounds( 6);
+  #if   R512_Unroll_R(6)
+        R512_8_rounds(6);
   #endif
-  #if   R512_Unroll_R( 7)
-        R512_8_rounds( 7);
+  #if   R512_Unroll_R(7)
+        R512_8_rounds(7);
   #endif
-  #if   R512_Unroll_R( 8)
-        R512_8_rounds( 8);
+  #if   R512_Unroll_R(8)
+        R512_8_rounds(8);
   #endif
-  #if   R512_Unroll_R( 9)
-        R512_8_rounds( 9);
+  #if   R512_Unroll_R(9)
+        R512_8_rounds(9);
   #endif
   #if   R512_Unroll_R(10)
         R512_8_rounds(10);
@@ -408,7 +406,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t b
         ctx->X[5] = X5 ^ w[5];
         ctx->X[6] = X6 ^ w[6];
         ctx->X[7] = X7 ^ w[7];
-        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
 
         ts[1] &= ~SKEIN_T1_FLAG_FIRST;
         }
@@ -432,16 +430,15 @@ unsigned int Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
     { /* do it in C, always looping (unrolled is bigger AND slower!) */
-    enum
-        {
+    enum {
         WCNT = SKEIN1024_STATE_WORDS
         };
 #undef  RCNT
 #define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
 
-#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
 #else
 #define SKEIN_UNROLL_1024 (0)
@@ -457,14 +454,14 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
     u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
 
-    u64  X00,X01,X02,X03,X04,X05,X06,X07,    /* local copy of vars, for speed */
-            X08,X09,X10,X11,X12,X13,X14,X15;
-    u64  w [WCNT];                           /* local copy of input block */
+    u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
+            X08, X09, X10, X11, X12, X13, X14, X15;
+    u64  w[WCNT];                            /* local copy of input block */
 #ifdef SKEIN_DEBUG
     const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
-    Xptr[ 0] = &X00;  Xptr[ 1] = &X01;  Xptr[ 2] = &X02;  Xptr[ 3] = &X03;
-    Xptr[ 4] = &X04;  Xptr[ 5] = &X05;  Xptr[ 6] = &X06;  Xptr[ 7] = &X07;
-    Xptr[ 8] = &X08;  Xptr[ 9] = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
+    Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
+    Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
+    Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
     Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
 #endif
 
@@ -476,43 +473,43 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
         ts[0] += byteCntAdd;                    /* update processed length */
 
         /* precompute the key schedule for this block */
-        ks[ 0] = ctx->X[ 0];
-        ks[ 1] = ctx->X[ 1];
-        ks[ 2] = ctx->X[ 2];
-        ks[ 3] = ctx->X[ 3];
-        ks[ 4] = ctx->X[ 4];
-        ks[ 5] = ctx->X[ 5];
-        ks[ 6] = ctx->X[ 6];
-        ks[ 7] = ctx->X[ 7];
-        ks[ 8] = ctx->X[ 8];
-        ks[ 9] = ctx->X[ 9];
+        ks[0]  = ctx->X[0];
+        ks[1]  = ctx->X[1];
+        ks[2]  = ctx->X[2];
+        ks[3]  = ctx->X[3];
+        ks[4]  = ctx->X[4];
+        ks[5]  = ctx->X[5];
+        ks[6]  = ctx->X[6];
+        ks[7]  = ctx->X[7];
+        ks[8]  = ctx->X[8];
+        ks[9]  = ctx->X[9];
         ks[10] = ctx->X[10];
         ks[11] = ctx->X[11];
         ks[12] = ctx->X[12];
         ks[13] = ctx->X[13];
         ks[14] = ctx->X[14];
         ks[15] = ctx->X[15];
-        ks[16] = ks[ 0] ^ ks[ 1] ^ ks[ 2] ^ ks[ 3] ^
-                 ks[ 4] ^ ks[ 5] ^ ks[ 6] ^ ks[ 7] ^
-                 ks[ 8] ^ ks[ 9] ^ ks[10] ^ ks[11] ^
+        ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
+                  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
+                  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
                  ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
 
         ts[2]  = ts[0] ^ ts[1];
 
-        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */
+        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
         DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
-
-        X00    = w[ 0] + ks[ 0];                 /* do the first full key injection */
-        X01    = w[ 1] + ks[ 1];
-        X02    = w[ 2] + ks[ 2];
-        X03    = w[ 3] + ks[ 3];
-        X04    = w[ 4] + ks[ 4];
-        X05    = w[ 5] + ks[ 5];
-        X06    = w[ 6] + ks[ 6];
-        X07    = w[ 7] + ks[ 7];
-        X08    = w[ 8] + ks[ 8];
-        X09    = w[ 9] + ks[ 9];
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+        X00    =  w[0] +  ks[0];                 /* do the first full key injection */
+        X01    =  w[1] +  ks[1];
+        X02    =  w[2] +  ks[2];
+        X03    =  w[3] +  ks[3];
+        X04    =  w[4] +  ks[4];
+        X05    =  w[5] +  ks[5];
+        X06    =  w[6] +  ks[6];
+        X07    =  w[7] +  ks[7];
+        X08    =  w[8] +  ks[8];
+        X09    =  w[9] +  ks[9];
         X10    = w[10] + ks[10];
         X11    = w[11] + ks[11];
         X12    = w[12] + ks[12];
@@ -520,112 +517,112 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
         X14    = w[14] + ks[14] + ts[1];
         X15    = w[15] + ks[15];
 
-        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
 
-#define Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rNum) \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0;   \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2;   \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4;   \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6;   \
-    X##p8 += X##p9; X##p9 = RotL_64(X##p9,ROT##_4); X##p9 ^= X##p8;   \
-    X##pA += X##pB; X##pB = RotL_64(X##pB,ROT##_5); X##pB ^= X##pA;   \
-    X##pC += X##pD; X##pD = RotL_64(X##pD,ROT##_6); X##pD ^= X##pC;   \
-    X##pE += X##pF; X##pF = RotL_64(X##pF,ROT##_7); X##pF ^= X##pE;   \
+#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
+    X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
+    X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
+    X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
+    X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
 
 #if SKEIN_UNROLL_1024 == 0                      
-#define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
-    Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rn,Xptr);
-
-#define I1024(R)                                                      \
-    X00   += ks[((R)+ 1) % 17]; /* inject the key schedule value */   \
-    X01   += ks[((R)+ 2) % 17];                                       \
-    X02   += ks[((R)+ 3) % 17];                                       \
-    X03   += ks[((R)+ 4) % 17];                                       \
-    X04   += ks[((R)+ 5) % 17];                                       \
-    X05   += ks[((R)+ 6) % 17];                                       \
-    X06   += ks[((R)+ 7) % 17];                                       \
-    X07   += ks[((R)+ 8) % 17];                                       \
-    X08   += ks[((R)+ 9) % 17];                                       \
-    X09   += ks[((R)+10) % 17];                                       \
-    X10   += ks[((R)+11) % 17];                                       \
-    X11   += ks[((R)+12) % 17];                                       \
-    X12   += ks[((R)+13) % 17];                                       \
-    X13   += ks[((R)+14) % 17] + ts[((R)+1) % 3];                     \
-    X14   += ks[((R)+15) % 17] + ts[((R)+2) % 3];                     \
-    X15   += ks[((R)+16) % 17] +     (R)+1;                           \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr); 
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+
+#define I1024(R)                                                        \
+    X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
+    X01   += ks[((R) +  2) % 17];                                       \
+    X02   += ks[((R) +  3) % 17];                                       \
+    X03   += ks[((R) +  4) % 17];                                       \
+    X04   += ks[((R) +  5) % 17];                                       \
+    X05   += ks[((R) +  6) % 17];                                       \
+    X06   += ks[((R) +  7) % 17];                                       \
+    X07   += ks[((R) +  8) % 17];                                       \
+    X08   += ks[((R) +  9) % 17];                                       \
+    X09   += ks[((R) + 10) % 17];                                       \
+    X10   += ks[((R) + 11) % 17];                                       \
+    X11   += ks[((R) + 12) % 17];                                       \
+    X12   += ks[((R) + 13) % 17];                                       \
+    X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
+    X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
+    X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
-#define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
-    Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rn,Xptr);
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
 
 #define I1024(R)                                                      \
-    X00   += ks[r+(R)+ 0];    /* inject the key schedule value */     \
-    X01   += ks[r+(R)+ 1];                                            \
-    X02   += ks[r+(R)+ 2];                                            \
-    X03   += ks[r+(R)+ 3];                                            \
-    X04   += ks[r+(R)+ 4];                                            \
-    X05   += ks[r+(R)+ 5];                                            \
-    X06   += ks[r+(R)+ 6];                                            \
-    X07   += ks[r+(R)+ 7];                                            \
-    X08   += ks[r+(R)+ 8];                                            \
-    X09   += ks[r+(R)+ 9];                                            \
-    X10   += ks[r+(R)+10];                                            \
-    X11   += ks[r+(R)+11];                                            \
-    X12   += ks[r+(R)+12];                                            \
-    X13   += ks[r+(R)+13] + ts[r+(R)+0];                              \
-    X14   += ks[r+(R)+14] + ts[r+(R)+1];                              \
-    X15   += ks[r+(R)+15] +    r+(R)   ;                              \
-    ks[r  +       (R)+16] = ks[r+(R)-1];  /* rotate key schedule */   \
-    ts[r  +       (R)+ 2] = ts[r+(R)-1];                              \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
-
-    for (r=1;r <= 2*RCNT;r+=2*SKEIN_UNROLL_1024)    /* loop thru it */
+    X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
+    X01   += ks[r + (R) +  1];                                            \
+    X02   += ks[r + (R) +  2];                                            \
+    X03   += ks[r + (R) +  3];                                            \
+    X04   += ks[r + (R) +  4];                                            \
+    X05   += ks[r + (R) +  5];                                            \
+    X06   += ks[r + (R) +  6];                                            \
+    X07   += ks[r + (R) +  7];                                            \
+    X08   += ks[r + (R) +  8];                                            \
+    X09   += ks[r + (R) +  9];                                            \
+    X10   += ks[r + (R) + 10];                                            \
+    X11   += ks[r + (R) + 11];                                            \
+    X12   += ks[r + (R) + 12];                                            \
+    X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
+    X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
+    X15   += ks[r + (R) + 15] +         r + (R);                          \
+    ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
+    ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
 #endif  
         {
 #define R1024_8_rounds(R)    /* do 8 full rounds */                               \
-        R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_0,8*(R) + 1); \
-        R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_1,8*(R) + 2); \
-        R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_2,8*(R) + 3); \
-        R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_3,8*(R) + 4); \
+        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
+        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
+        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
+        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
         I1024(2*(R));                                                             \
-        R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_4,8*(R) + 5); \
-        R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_5,8*(R) + 6); \
-        R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_6,8*(R) + 7); \
-        R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_7,8*(R) + 8); \
+        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
+        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
+        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
+        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
         I1024(2*(R)+1);
 
-        R1024_8_rounds( 0);
+        R1024_8_rounds(0);
 
 #define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
 
-  #if   R1024_Unroll_R( 1)
-        R1024_8_rounds( 1);
+  #if   R1024_Unroll_R(1)
+        R1024_8_rounds(1);
   #endif
-  #if   R1024_Unroll_R( 2)
-        R1024_8_rounds( 2);
+  #if   R1024_Unroll_R(2)
+        R1024_8_rounds(2);
   #endif
-  #if   R1024_Unroll_R( 3)
-        R1024_8_rounds( 3);
+  #if   R1024_Unroll_R(3)
+        R1024_8_rounds(3);
   #endif
-  #if   R1024_Unroll_R( 4)
-        R1024_8_rounds( 4);
+  #if   R1024_Unroll_R(4)
+        R1024_8_rounds(4);
   #endif
-  #if   R1024_Unroll_R( 5)
-        R1024_8_rounds( 5);
+  #if   R1024_Unroll_R(5)
+        R1024_8_rounds(5);
   #endif
-  #if   R1024_Unroll_R( 6)
-        R1024_8_rounds( 6);
+  #if   R1024_Unroll_R(6)
+        R1024_8_rounds(6);
   #endif
-  #if   R1024_Unroll_R( 7)
-        R1024_8_rounds( 7);
+  #if   R1024_Unroll_R(7)
+        R1024_8_rounds(7);
   #endif
-  #if   R1024_Unroll_R( 8)
-        R1024_8_rounds( 8);
+  #if   R1024_Unroll_R(8)
+        R1024_8_rounds(8);
   #endif
-  #if   R1024_Unroll_R( 9)
-        R1024_8_rounds( 9);
+  #if   R1024_Unroll_R(9)
+        R1024_8_rounds(9);
   #endif
   #if   R1024_Unroll_R(10)
         R1024_8_rounds(10);
@@ -648,16 +645,16 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
         }
         /* do the final "feedforward" xor, update context chaining vars */
 
-        ctx->X[ 0] = X00 ^ w[ 0];
-        ctx->X[ 1] = X01 ^ w[ 1];
-        ctx->X[ 2] = X02 ^ w[ 2];
-        ctx->X[ 3] = X03 ^ w[ 3];
-        ctx->X[ 4] = X04 ^ w[ 4];
-        ctx->X[ 5] = X05 ^ w[ 5];
-        ctx->X[ 6] = X06 ^ w[ 6];
-        ctx->X[ 7] = X07 ^ w[ 7];
-        ctx->X[ 8] = X08 ^ w[ 8];
-        ctx->X[ 9] = X09 ^ w[ 9];
+        ctx->X[0] = X00 ^ w[0];
+        ctx->X[1] = X01 ^ w[1];
+        ctx->X[2] = X02 ^ w[2];
+        ctx->X[3] = X03 ^ w[3];
+        ctx->X[4] = X04 ^ w[4];
+        ctx->X[5] = X05 ^ w[5];
+        ctx->X[6] = X06 ^ w[6];
+        ctx->X[7] = X07 ^ w[7];
+        ctx->X[8] = X08 ^ w[8];
+        ctx->X[9] = X09 ^ w[9];
         ctx->X[10] = X10 ^ w[10];
         ctx->X[11] = X11 ^ w[11];
         ctx->X[12] = X12 ^ w[12];
@@ -665,7 +662,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
         ctx->X[14] = X14 ^ w[14];
         ctx->X[15] = X15 ^ w[15];
 
-        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
         
         ts[1] &= ~SKEIN_T1_FLAG_FIRST;
         blkPtr += SKEIN1024_BLOCK_BYTES;
-- 
1.9.0


* [RFC PATCH 12/22] staging: crypto: skein: dos2unix, remove executable perms
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (10 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 11/22] staging: crypto: skein: cleanup whitespace around operators/punc Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 13/22] staging: crypto: skein: fix leading whitespace Jason Cooper
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

$ find drivers/staging/skein -type f | xargs todos -d
$ chmod -x drivers/staging/skein/skeinApi.c
$ chmod -x drivers/staging/skein/include/skeinApi.h

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h    |  630 ++++++-------
 drivers/staging/skein/include/skeinApi.h |    0
 drivers/staging/skein/include/skein_iv.h |  398 ++++-----
 drivers/staging/skein/skein.c            | 1442 +++++++++++++++---------------
 drivers/staging/skein/skeinApi.c         |    0
 drivers/staging/skein/skeinBlockNo3F.c   |  344 +++----
 drivers/staging/skein/skein_block.c      | 1372 ++++++++++++++--------------
 7 files changed, 2093 insertions(+), 2093 deletions(-)
 mode change 100755 => 100644 drivers/staging/skein/include/skeinApi.h
 mode change 100755 => 100644 drivers/staging/skein/skeinApi.c

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index fef29ad64c93..18bb15824e41 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -1,315 +1,315 @@
-#ifndef _SKEIN_H_
-#define _SKEIN_H_     1
-/**************************************************************************
-**
-** Interface declarations and internal definitions for Skein hashing.
-**
-** Source code author: Doug Whiting, 2008.
-**
-** This algorithm and source code is released to the public domain.
-**
-***************************************************************************
-** 
-** The following compile-time switches may be defined to control some
-** tradeoffs between speed, code size, error checking, and security.
-**
-** The "default" note explains what happens when the switch is not defined.
-**
-**  SKEIN_DEBUG            -- make callouts from inside Skein code
-**                            to examine/display intermediate values.
-**                            [default: no callouts (no overhead)]
-**
-**  SKEIN_ERR_CHECK        -- how error checking is handled inside Skein
-**                            code. If not defined, most error checking 
-**                            is disabled (for performance). Otherwise, 
-**                            the switch value is interpreted as:
-**                                0: use assert()      to flag errors
-**                                1: return SKEIN_FAIL to flag errors
-**
-***************************************************************************/
-
-#ifndef RotL_64
-#define RotL_64(x, N)    (((x) << (N)) | ((x) >> (64-(N))))
-#endif
-
-/* below two prototype assume we are handed aligned data */
-#define Skein_Put64_LSB_First(dst08, src64, bCnt) memcpy(dst08, src64, bCnt)
-#define Skein_Get64_LSB_First(dst64, src08, wCnt) memcpy(dst64, src08, 8*(wCnt))
-#define Skein_Swap64(w64)  (w64)
-
-enum
-    {
-    SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
-    SKEIN_FAIL            =      1,
-    SKEIN_BAD_HASHLEN     =      2
-    };
-
-#define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
-
-#define  SKEIN_256_STATE_WORDS  (4)
-#define  SKEIN_512_STATE_WORDS  (8)
-#define  SKEIN1024_STATE_WORDS (16)
-#define  SKEIN_MAX_STATE_WORDS (16)
-
-#define  SKEIN_256_STATE_BYTES  (8*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_STATE_BYTES  (8*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_STATE_BYTES  (8*SKEIN1024_STATE_WORDS)
-
-#define  SKEIN_256_STATE_BITS  (64*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_STATE_BITS  (64*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_STATE_BITS  (64*SKEIN1024_STATE_WORDS)
-
-#define  SKEIN_256_BLOCK_BYTES  (8*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_BLOCK_BYTES  (8*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
-
-struct skein_ctx_hdr
-    {
-    size_t  hashBitLen;                      /* size of hash result, in bits */
-    size_t  bCnt;                            /* current byte count in buffer b[] */
-    u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
-    };
-
-struct skein_256_ctx                               /*  256-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
-
-struct skein_512_ctx                             /*  512-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
-
-struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
-
-/*   Skein APIs for (incremental) "straight hashing" */
-int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
-int  Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen);
-int  Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen);
-
-int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-
-int  Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
-
-/*
-**   Skein APIs for "extended" initialization: MAC keys, tree hashing.
-**   After an InitExt() call, just use Update/Final calls as with Init().
-**
-**   Notes: Same parameters as _Init() calls, plus treeInfo/key/keyBytes.
-**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL, 
-**              the results of InitExt() are identical to calling Init().
-**          The function Init() may be called once to "precompute" the IV for
-**              a given hashBitLen value, then by saving a copy of the context
-**              the IV computation may be avoided in later calls.
-**          Similarly, the function InitExt() may be called once per MAC key 
-**              to precompute the MAC IV, then a copy of the context saved and
-**              reused for each new MAC computation.
-**/
-int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-
-/*
-**   Skein APIs for MAC and tree hash:
-**      Final_Pad:  pad, do final block, but no OUTPUT type
-**      Output:     do just the output stage
-*/
-int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal);
-
-#ifndef SKEIN_TREE_HASH
-#define SKEIN_TREE_HASH (1)
-#endif
-#if  SKEIN_TREE_HASH
-int  Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
-#endif
-
-/*****************************************************************
-** "Internal" Skein definitions
-**    -- not needed for sequential hashing API, but will be 
-**           helpful for other uses of Skein (e.g., tree hash mode).
-**    -- included here so that they can be shared between
-**           reference and optimized code.
-******************************************************************/
-
-/* tweak word T[1]: bit field starting positions */
-#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
-                                
-#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
-#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
-#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
-#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
-#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
-                                
-/* tweak word T[1]: flag bit definition(s) */
-#define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
-#define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
-#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
-                                
-/* tweak word T[1]: tree level bit field mask */
-#define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
-#define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
-
-/* tweak word T[1]: block type field */
-#define SKEIN_BLK_TYPE_KEY       (0)                    /* key, for MAC and KDF */
-#define SKEIN_BLK_TYPE_CFG       (4)                    /* configuration block */
-#define SKEIN_BLK_TYPE_PERS      (8)                    /* personalization string */
-#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
-#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
-#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
-#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
-#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
-#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
-
-#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
-#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
-#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
-#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
-#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
-#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
-#define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
-#define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
-#define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
-#define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */
-
-#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
-#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)
-
-#define SKEIN_VERSION           (1)
-
-#ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
-#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
-#endif
-
-#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
-#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION, SKEIN_ID_STRING_LE)
-#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)
-
-#define SKEIN_CFG_STR_LEN       (4*8)
-
-/* bit field definitions in config block treeInfo word */
-#define SKEIN_CFG_TREE_LEAF_SIZE_POS  (0)
-#define SKEIN_CFG_TREE_NODE_SIZE_POS  (8)
-#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
-
-#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
-#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
-#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
-
-#define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
-    ((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
-     (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
-     (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
-
-#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
-
-/*
-**   Skein macros for getting/setting tweak words, etc.
-**   These are useful for partial input bytes, hash tree init/update, etc.
-**/
-#define Skein_Get_Tweak(ctxPtr, TWK_NUM)          ((ctxPtr)->h.T[TWK_NUM])
-#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal); }
-
-#define Skein_Get_T0(ctxPtr)     Skein_Get_Tweak(ctxPtr, 0)
-#define Skein_Get_T1(ctxPtr)     Skein_Get_Tweak(ctxPtr, 1)
-#define Skein_Set_T0(ctxPtr, T0) Skein_Set_Tweak(ctxPtr, 0, T0)
-#define Skein_Set_T1(ctxPtr, T1) Skein_Set_Tweak(ctxPtr, 1, T1)
-
-/* set both tweak words at once */
-#define Skein_Set_T0_T1(ctxPtr, T0, T1)           \
-    {                                           \
-    Skein_Set_T0(ctxPtr, (T0));                  \
-    Skein_Set_T1(ctxPtr, (T1));                  \
-    }
-
-#define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
-    Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
-
-/* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
-#define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
-    { Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
-
-#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
-#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
-
-#define Skein_Set_Tree_Level(hdr, height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); }
-
-/*****************************************************************
-** "Internal" Skein definitions for debugging and error checking
-******************************************************************/
-#ifdef SKEIN_DEBUG             /* examine/display intermediate values? */
-#include "skein_debug.h"
-#else                           /* default is no callouts */
-#define Skein_Show_Block(bits, ctx, X, blkPtr, wPtr, ksEvenPtr, ksOddPtr)
-#define Skein_Show_Round(bits, ctx, r, X)
-#define Skein_Show_R_Ptr(bits, ctx, r, X_ptr)
-#define Skein_Show_Final(bits, ctx, cnt, outPtr)
-#define Skein_Show_Key(bits, ctx, key, keyBytes)
-#endif
-
-#define Skein_Assert(x, retCode)/* default: ignore all Asserts, for performance */
-#define Skein_assert(x)
-
-/*****************************************************************
-** Skein block function constants (shared across Ref and Opt code)
-******************************************************************/
-enum    
-    {   
-        /* Skein_256 round rotation constants */
-    R_256_0_0 = 14, R_256_0_1 = 16,
-    R_256_1_0 = 52, R_256_1_1 = 57,
-    R_256_2_0 = 23, R_256_2_1 = 40,
-    R_256_3_0 =  5, R_256_3_1 = 37,
-    R_256_4_0 = 25, R_256_4_1 = 33,
-    R_256_5_0 = 46, R_256_5_1 = 12,
-    R_256_6_0 = 58, R_256_6_1 = 22,
-    R_256_7_0 = 32, R_256_7_1 = 32,
-
-        /* Skein_512 round rotation constants */
-    R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
-    R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
-    R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
-    R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
-    R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
-    R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
-    R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
-    R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
-
-        /* Skein1024 round rotation constants */
-    R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
-    R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
-    R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
-    R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
-    R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
-    R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
-    R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
-    R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
-    };
-
-#ifndef SKEIN_ROUNDS
-#define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
-#define SKEIN_512_ROUNDS_TOTAL (72)
-#define SKEIN1024_ROUNDS_TOTAL (80)
-#else                                        /* allow command-line define in range 8*(5..14)   */
-#define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
-#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/10)  + 5) % 10) + 5))
-#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS)     + 5) % 10) + 5))
-#endif
-
-#endif  /* ifndef _SKEIN_H_ */
+#ifndef _SKEIN_H_
+#define _SKEIN_H_     1
+/**************************************************************************
+**
+** Interface declarations and internal definitions for Skein hashing.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+***************************************************************************
+** 
+** The following compile-time switches may be defined to control some
+** tradeoffs between speed, code size, error checking, and security.
+**
+** The "default" note explains what happens when the switch is not defined.
+**
+**  SKEIN_DEBUG            -- make callouts from inside Skein code
+**                            to examine/display intermediate values.
+**                            [default: no callouts (no overhead)]
+**
+**  SKEIN_ERR_CHECK        -- how error checking is handled inside Skein
+**                            code. If not defined, most error checking 
+**                            is disabled (for performance). Otherwise, 
+**                            the switch value is interpreted as:
+**                                0: use assert()      to flag errors
+**                                1: return SKEIN_FAIL to flag errors
+**
+***************************************************************************/
+
+#ifndef RotL_64
+#define RotL_64(x, N)    (((x) << (N)) | ((x) >> (64-(N))))
+#endif
+
+/* below two prototype assume we are handed aligned data */
+#define Skein_Put64_LSB_First(dst08, src64, bCnt) memcpy(dst08, src64, bCnt)
+#define Skein_Get64_LSB_First(dst64, src08, wCnt) memcpy(dst64, src08, 8*(wCnt))
+#define Skein_Swap64(w64)  (w64)
+
+enum
+    {
+    SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
+    SKEIN_FAIL            =      1,
+    SKEIN_BAD_HASHLEN     =      2
+    };
+
+#define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
+
+#define  SKEIN_256_STATE_WORDS  (4)
+#define  SKEIN_512_STATE_WORDS  (8)
+#define  SKEIN1024_STATE_WORDS (16)
+#define  SKEIN_MAX_STATE_WORDS (16)
+
+#define  SKEIN_256_STATE_BYTES  (8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BYTES  (8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BYTES  (8*SKEIN1024_STATE_WORDS)
+
+#define  SKEIN_256_STATE_BITS  (64*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BITS  (64*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BITS  (64*SKEIN1024_STATE_WORDS)
+
+#define  SKEIN_256_BLOCK_BYTES  (8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_BLOCK_BYTES  (8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
+
+struct skein_ctx_hdr
+    {
+    size_t  hashBitLen;                      /* size of hash result, in bits */
+    size_t  bCnt;                            /* current byte count in buffer b[] */
+    u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+    };
+
+struct skein_256_ctx                               /*  256-bit Skein hash context structure */
+    {
+    struct skein_ctx_hdr h;                      /* common header context variables */
+    u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    };
+
+struct skein_512_ctx                             /*  512-bit Skein hash context structure */
+    {
+    struct skein_ctx_hdr h;                      /* common header context variables */
+    u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    };
+
+struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
+    {
+    struct skein_ctx_hdr h;                      /* common header context variables */
+    u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    };
+
+/*   Skein APIs for (incremental) "straight hashing" */
+int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
+int  Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen);
+int  Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen);
+
+int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+
+int  Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
+
+/*
+**   Skein APIs for "extended" initialization: MAC keys, tree hashing.
+**   After an InitExt() call, just use Update/Final calls as with Init().
+**
+**   Notes: Same parameters as _Init() calls, plus treeInfo/key/keyBytes.
+**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL, 
+**              the results of InitExt() are identical to calling Init().
+**          The function Init() may be called once to "precompute" the IV for
+**              a given hashBitLen value, then by saving a copy of the context
+**              the IV computation may be avoided in later calls.
+**          Similarly, the function InitExt() may be called once per MAC key 
+**              to precompute the MAC IV, then a copy of the context saved and
+**              reused for each new MAC computation.
+**/
+int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+
+/*
+**   Skein APIs for MAC and tree hash:
+**      Final_Pad:  pad, do final block, but no OUTPUT type
+**      Output:     do just the output stage
+*/
+int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal);
+
+#ifndef SKEIN_TREE_HASH
+#define SKEIN_TREE_HASH (1)
+#endif
+#if  SKEIN_TREE_HASH
+int  Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
+#endif
+
+/*****************************************************************
+** "Internal" Skein definitions
+**    -- not needed for sequential hashing API, but will be 
+**           helpful for other uses of Skein (e.g., tree hash mode).
+**    -- included here so that they can be shared between
+**           reference and optimized code.
+******************************************************************/
+
+/* tweak word T[1]: bit field starting positions */
+#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
+                                
+#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
+#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
+#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
+#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bit  126     : first block flag         */
+#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
+                                
+/* tweak word T[1]: flag bit definition(s) */
+#define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
+#define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
+#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
+                                
+/* tweak word T[1]: tree level bit field mask */
+#define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
+#define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
+
+/* tweak word T[1]: block type field */
+#define SKEIN_BLK_TYPE_KEY       (0)                    /* key, for MAC and KDF */
+#define SKEIN_BLK_TYPE_CFG       (4)                    /* configuration block */
+#define SKEIN_BLK_TYPE_PERS      (8)                    /* personalization string */
+#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
+#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
+#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
+#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
+#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
+#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
+
+#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
+#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
+#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
+#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
+#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
+#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
+#define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
+#define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
+#define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
+#define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */
+
+#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
+#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)
+
+#define SKEIN_VERSION           (1)
+
+#ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
+#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
+#endif
+
+#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
+#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION, SKEIN_ID_STRING_LE)
+#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)
+
+#define SKEIN_CFG_STR_LEN       (4*8)
+
+/* bit field definitions in config block treeInfo word */
+#define SKEIN_CFG_TREE_LEAF_SIZE_POS  (0)
+#define SKEIN_CFG_TREE_NODE_SIZE_POS  (8)
+#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
+
+#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
+#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
+#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
+
+#define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
+    ((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+     (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+     (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
+
+#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
+
+/*
+**   Skein macros for getting/setting tweak words, etc.
+**   These are useful for partial input bytes, hash tree init/update, etc.
+**/
+#define Skein_Get_Tweak(ctxPtr, TWK_NUM)          ((ctxPtr)->h.T[TWK_NUM])
+#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal); }
+
+#define Skein_Get_T0(ctxPtr)     Skein_Get_Tweak(ctxPtr, 0)
+#define Skein_Get_T1(ctxPtr)     Skein_Get_Tweak(ctxPtr, 1)
+#define Skein_Set_T0(ctxPtr, T0) Skein_Set_Tweak(ctxPtr, 0, T0)
+#define Skein_Set_T1(ctxPtr, T1) Skein_Set_Tweak(ctxPtr, 1, T1)
+
+/* set both tweak words at once */
+#define Skein_Set_T0_T1(ctxPtr, T0, T1)           \
+    {                                           \
+    Skein_Set_T0(ctxPtr, (T0));                  \
+    Skein_Set_T1(ctxPtr, (T1));                  \
+    }
+
+#define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
+    Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
+
+/* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
+#define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
+    { Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
+
+#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
+#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
+
+#define Skein_Set_Tree_Level(hdr, height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); }
+
+/*****************************************************************
+** "Internal" Skein definitions for debugging and error checking
+******************************************************************/
+#ifdef SKEIN_DEBUG             /* examine/display intermediate values? */
+#include "skein_debug.h"
+#else                           /* default is no callouts */
+#define Skein_Show_Block(bits, ctx, X, blkPtr, wPtr, ksEvenPtr, ksOddPtr)
+#define Skein_Show_Round(bits, ctx, r, X)
+#define Skein_Show_R_Ptr(bits, ctx, r, X_ptr)
+#define Skein_Show_Final(bits, ctx, cnt, outPtr)
+#define Skein_Show_Key(bits, ctx, key, keyBytes)
+#endif
+
+#define Skein_Assert(x, retCode)/* default: ignore all Asserts, for performance */
+#define Skein_assert(x)
+
+/*****************************************************************
+** Skein block function constants (shared across Ref and Opt code)
+******************************************************************/
+enum    
+    {   
+        /* Skein_256 round rotation constants */
+    R_256_0_0 = 14, R_256_0_1 = 16,
+    R_256_1_0 = 52, R_256_1_1 = 57,
+    R_256_2_0 = 23, R_256_2_1 = 40,
+    R_256_3_0 =  5, R_256_3_1 = 37,
+    R_256_4_0 = 25, R_256_4_1 = 33,
+    R_256_5_0 = 46, R_256_5_1 = 12,
+    R_256_6_0 = 58, R_256_6_1 = 22,
+    R_256_7_0 = 32, R_256_7_1 = 32,
+
+        /* Skein_512 round rotation constants */
+    R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
+    R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
+    R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
+    R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
+    R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
+    R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
+    R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
+    R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
+
+        /* Skein1024 round rotation constants */
+    R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
+    R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
+    R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
+    R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
+    R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
+    R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
+    R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
+    R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
+    };
+
+#ifndef SKEIN_ROUNDS
+#define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
+#define SKEIN_512_ROUNDS_TOTAL (72)
+#define SKEIN1024_ROUNDS_TOTAL (80)
+#else                                        /* allow command-line define in range 8*(5..14)   */
+#define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
+#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/10)  + 5) % 10) + 5))
+#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS)     + 5) % 10) + 5))
+#endif
+
+#endif  /* ifndef _SKEIN_H_ */
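Reading aid for the SKEIN_ROUNDS override at the end of skein.h (not part of the patch): the value appears to be read as three decimal digits, one per block size (hundreds for Skein_256, tens for Skein_512, units for Skein1024), and each digit d selects 8*(((d + 5) % 10) + 5) rounds. The userspace sketch below restates the three macros as functions so the mapping can be checked by hand. The names rounds_from_field, skein_rounds_total and skein_rounds_selfcheck are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical helper: restates 8*((((x) + 5) % 10) + 5) from skein.h. */
static int rounds_from_field(int field)
{
	return 8 * (((field + 5) % 10) + 5);
}

/* Round count that a given SKEIN_ROUNDS value selects for one block size. */
static int skein_rounds_total(int skein_rounds, int block_bits)
{
	switch (block_bits) {
	case 256:
		return rounds_from_field(skein_rounds / 100);
	case 512:
		return rounds_from_field(skein_rounds / 10);
	case 1024:
		return rounds_from_field(skein_rounds);
	default:
		return -1;	/* not a Skein block size */
	}
}

/* 990 reproduces the built-in defaults of 72/72/80 rounds. */
static void skein_rounds_selfcheck(void)
{
	assert(skein_rounds_total(990, 256) == 72);
	assert(skein_rounds_total(990, 512) == 72);
	assert(skein_rounds_total(990, 1024) == 80);
}
```

Under this reading, a digit of 5 gives the minimum (40 rounds) and a digit of 4 gives the maximum (112 rounds), which matches the "8*(5..14)" range stated in the header comment.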
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
old mode 100755
new mode 100644
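Reading aid for the tweak word T[1] definitions in skein.h (not part of the patch): T[1] carries tweak bits 64..127, so each field position is the spec bit number minus 64. The userspace sketch below packs a block type with the FIRST/FINAL flags and unpacks the type again. uint64_t stands in for the kernel's u64, the macro names are shortened local copies, and t1_pack / t1_blk_type are hypothetical helpers used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local restatement of the T[1] layout from skein.h. */
#define T1_BIT(bit)      ((bit) - 64)	/* T[1] holds tweak bits 64..127 */
#define T1_POS_BLK_TYPE  T1_BIT(120)	/* bits 120..125: type field */
#define T1_POS_FIRST     T1_BIT(126)	/* bit 126: first block flag */
#define T1_POS_FINAL     T1_BIT(127)	/* bit 127: final block flag */

#define BLK_TYPE_CFG     4
#define BLK_TYPE_MSG     48
#define BLK_TYPE_OUT     63
#define BLK_TYPE_MASK    63

/* Hypothetical helper: build a T[1] value from a block type and two flags. */
static uint64_t t1_pack(unsigned int blk_type, int first, int final)
{
	uint64_t t1 = (uint64_t)(blk_type & BLK_TYPE_MASK) << T1_POS_BLK_TYPE;

	if (first)
		t1 |= (uint64_t)1 << T1_POS_FIRST;
	if (final)
		t1 |= (uint64_t)1 << T1_POS_FINAL;
	return t1;
}

/* Hypothetical helper: extract the block type field from a T[1] value. */
static unsigned int t1_blk_type(uint64_t t1)
{
	return (unsigned int)((t1 >> T1_POS_BLK_TYPE) & BLK_TYPE_MASK);
}
```

For example, t1_pack(BLK_TYPE_CFG, 1, 1) corresponds to what Skein_Start_New_Type(ctx, CFG_FINAL) stores in T[1], since that macro ORs SKEIN_T1_FLAG_FIRST into SKEIN_T1_BLK_TYPE_CFG_FINAL.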
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index aff9394551a0..813bad528e3c 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -1,199 +1,199 @@
-#ifndef _SKEIN_IV_H_
-#define _SKEIN_IV_H_
-
-#include <skein.h>    /* get Skein macros and types */
-
-/*
-***************** Pre-computed Skein IVs *******************
-**
-** NOTE: these values are not "magic" constants, but
-** are generated using the Threefish block function.
-** They are pre-computed here only for speed; i.e., to
-** avoid the need for a Threefish call during Init().
-**
-** The IV for any fixed hash length may be pre-computed.
-** Only the most common values are included here.
-**
-************************************************************
-**/
-
-#define MK_64 SKEIN_MK_64
-
-/* blkSize =  256 bits. hashSize =  128 bits */
-const u64 SKEIN_256_IV_128[] =
-    {
-    MK_64(0xE1111906, 0x964D7260),
-    MK_64(0x883DAAA7, 0x7C8D811C),
-    MK_64(0x10080DF4, 0x91960F7A),
-    MK_64(0xCCF7DDE5, 0xB45BC1C2)
-    };
-
-/* blkSize =  256 bits. hashSize =  160 bits */
-const u64 SKEIN_256_IV_160[] =
-    {
-    MK_64(0x14202314, 0x72825E98),
-    MK_64(0x2AC4E9A2, 0x5A77E590),
-    MK_64(0xD47A5856, 0x8838D63E),
-    MK_64(0x2DD2E496, 0x8586AB7D)
-    };
-
-/* blkSize =  256 bits. hashSize =  224 bits */
-const u64 SKEIN_256_IV_224[] =
-    {
-    MK_64(0xC6098A8C, 0x9AE5EA0B),
-    MK_64(0x876D5686, 0x08C5191C),
-    MK_64(0x99CB88D7, 0xD7F53884),
-    MK_64(0x384BDDB1, 0xAEDDB5DE)
-    };
-
-/* blkSize =  256 bits. hashSize =  256 bits */
-const u64 SKEIN_256_IV_256[] =
-    {
-    MK_64(0xFC9DA860, 0xD048B449),
-    MK_64(0x2FCA6647, 0x9FA7D833),
-    MK_64(0xB33BC389, 0x6656840F),
-    MK_64(0x6A54E920, 0xFDE8DA69)
-    };
-
-/* blkSize =  512 bits. hashSize =  128 bits */
-const u64 SKEIN_512_IV_128[] =
-    {
-    MK_64(0xA8BC7BF3, 0x6FBF9F52),
-    MK_64(0x1E9872CE, 0xBD1AF0AA),
-    MK_64(0x309B1790, 0xB32190D3),
-    MK_64(0xBCFBB854, 0x3F94805C),
-    MK_64(0x0DA61BCD, 0x6E31B11B),
-    MK_64(0x1A18EBEA, 0xD46A32E3),
-    MK_64(0xA2CC5B18, 0xCE84AA82),
-    MK_64(0x6982AB28, 0x9D46982D)
-    };
-
-/* blkSize =  512 bits. hashSize =  160 bits */
-const u64 SKEIN_512_IV_160[] =
-    {
-    MK_64(0x28B81A2A, 0xE013BD91),
-    MK_64(0xC2F11668, 0xB5BDF78F),
-    MK_64(0x1760D8F3, 0xF6A56F12),
-    MK_64(0x4FB74758, 0x8239904F),
-    MK_64(0x21EDE07F, 0x7EAF5056),
-    MK_64(0xD908922E, 0x63ED70B8),
-    MK_64(0xB8EC76FF, 0xECCB52FA),
-    MK_64(0x01A47BB8, 0xA3F27A6E)
-    };
-
-/* blkSize =  512 bits. hashSize =  224 bits */
-const u64 SKEIN_512_IV_224[] =
-    {
-    MK_64(0xCCD06162, 0x48677224),
-    MK_64(0xCBA65CF3, 0xA92339EF),
-    MK_64(0x8CCD69D6, 0x52FF4B64),
-    MK_64(0x398AED7B, 0x3AB890B4),
-    MK_64(0x0F59D1B1, 0x457D2BD0),
-    MK_64(0x6776FE65, 0x75D4EB3D),
-    MK_64(0x99FBC70E, 0x997413E9),
-    MK_64(0x9E2CFCCF, 0xE1C41EF7)
-    };
-
-/* blkSize =  512 bits. hashSize =  256 bits */
-const u64 SKEIN_512_IV_256[] =
-    {
-    MK_64(0xCCD044A1, 0x2FDB3E13),
-    MK_64(0xE8359030, 0x1A79A9EB),
-    MK_64(0x55AEA061, 0x4F816E6F),
-    MK_64(0x2A2767A4, 0xAE9B94DB),
-    MK_64(0xEC06025E, 0x74DD7683),
-    MK_64(0xE7A436CD, 0xC4746251),
-    MK_64(0xC36FBAF9, 0x393AD185),
-    MK_64(0x3EEDBA18, 0x33EDFC13)
-    };
-
-/* blkSize =  512 bits. hashSize =  384 bits */
-const u64 SKEIN_512_IV_384[] =
-    {
-    MK_64(0xA3F6C6BF, 0x3A75EF5F),
-    MK_64(0xB0FEF9CC, 0xFD84FAA4),
-    MK_64(0x9D77DD66, 0x3D770CFE),
-    MK_64(0xD798CBF3, 0xB468FDDA),
-    MK_64(0x1BC4A666, 0x8A0E4465),
-    MK_64(0x7ED7D434, 0xE5807407),
-    MK_64(0x548FC1AC, 0xD4EC44D6),
-    MK_64(0x266E1754, 0x6AA18FF8)
-    };
-
-/* blkSize =  512 bits. hashSize =  512 bits */
-const u64 SKEIN_512_IV_512[] =
-    {
-    MK_64(0x4903ADFF, 0x749C51CE),
-    MK_64(0x0D95DE39, 0x9746DF03),
-    MK_64(0x8FD19341, 0x27C79BCE),
-    MK_64(0x9A255629, 0xFF352CB1),
-    MK_64(0x5DB62599, 0xDF6CA7B0),
-    MK_64(0xEABE394C, 0xA9D5C3F4),
-    MK_64(0x991112C7, 0x1A75B523),
-    MK_64(0xAE18A40B, 0x660FCC33)
-    };
-
-/* blkSize = 1024 bits. hashSize =  384 bits */
-const u64 SKEIN1024_IV_384[] =
-    {
-    MK_64(0x5102B6B8, 0xC1894A35),
-    MK_64(0xFEEBC9E3, 0xFE8AF11A),
-    MK_64(0x0C807F06, 0xE32BED71),
-    MK_64(0x60C13A52, 0xB41A91F6),
-    MK_64(0x9716D35D, 0xD4917C38),
-    MK_64(0xE780DF12, 0x6FD31D3A),
-    MK_64(0x797846B6, 0xC898303A),
-    MK_64(0xB172C2A8, 0xB3572A3B),
-    MK_64(0xC9BC8203, 0xA6104A6C),
-    MK_64(0x65909338, 0xD75624F4),
-    MK_64(0x94BCC568, 0x4B3F81A0),
-    MK_64(0x3EBBF51E, 0x10ECFD46),
-    MK_64(0x2DF50F0B, 0xEEB08542),
-    MK_64(0x3B5A6530, 0x0DBC6516),
-    MK_64(0x484B9CD2, 0x167BBCE1),
-    MK_64(0x2D136947, 0xD4CBAFEA)
-    };
-
-/* blkSize = 1024 bits. hashSize =  512 bits */
-const u64 SKEIN1024_IV_512[] =
-    {
-    MK_64(0xCAEC0E5D, 0x7C1B1B18),
-    MK_64(0xA01B0E04, 0x5F03E802),
-    MK_64(0x33840451, 0xED912885),
-    MK_64(0x374AFB04, 0xEAEC2E1C),
-    MK_64(0xDF25A0E2, 0x813581F7),
-    MK_64(0xE4004093, 0x8B12F9D2),
-    MK_64(0xA662D539, 0xC2ED39B6),
-    MK_64(0xFA8B85CF, 0x45D8C75A),
-    MK_64(0x8316ED8E, 0x29EDE796),
-    MK_64(0x053289C0, 0x2E9F91B8),
-    MK_64(0xC3F8EF1D, 0x6D518B73),
-    MK_64(0xBDCEC3C4, 0xD5EF332E),
-    MK_64(0x549A7E52, 0x22974487),
-    MK_64(0x67070872, 0x5B749816),
-    MK_64(0xB9CD28FB, 0xF0581BD1),
-    MK_64(0x0E2940B8, 0x15804974)
-    };
-
-/* blkSize = 1024 bits. hashSize = 1024 bits */
-const u64 SKEIN1024_IV_1024[] =
-    {
-    MK_64(0xD593DA07, 0x41E72355),
-    MK_64(0x15B5E511, 0xAC73E00C),
-    MK_64(0x5180E5AE, 0xBAF2C4F0),
-    MK_64(0x03BD41D3, 0xFCBCAFAF),
-    MK_64(0x1CAEC6FD, 0x1983A898),
-    MK_64(0x6E510B8B, 0xCDD0589F),
-    MK_64(0x77E2BDFD, 0xC6394ADA),
-    MK_64(0xC11E1DB5, 0x24DCB0A3),
-    MK_64(0xD6D14AF9, 0xC6329AB5),
-    MK_64(0x6A9B0BFC, 0x6EB67E0D),
-    MK_64(0x9243C60D, 0xCCFF1332),
-    MK_64(0x1A1F1DDE, 0x743F02D4),
-    MK_64(0x0996753C, 0x10ED0BB8),
-    MK_64(0x6572DD22, 0xF2B4969A),
-    MK_64(0x61FD3062, 0xD00A579A),
-    MK_64(0x1DE0536E, 0x8682E539)
-    };
-
-#endif /* _SKEIN_IV_H_ */
+#ifndef _SKEIN_IV_H_
+#define _SKEIN_IV_H_
+
+#include <skein.h>    /* get Skein macros and types */
+
+/*
+***************** Pre-computed Skein IVs *******************
+**
+** NOTE: these values are not "magic" constants, but
+** are generated using the Threefish block function.
+** They are pre-computed here only for speed; i.e., to
+** avoid the need for a Threefish call during Init().
+**
+** The IV for any fixed hash length may be pre-computed.
+** Only the most common values are included here.
+**
+************************************************************
+**/
+
+#define MK_64 SKEIN_MK_64
+
+/* blkSize =  256 bits. hashSize =  128 bits */
+const u64 SKEIN_256_IV_128[] =
+    {
+    MK_64(0xE1111906, 0x964D7260),
+    MK_64(0x883DAAA7, 0x7C8D811C),
+    MK_64(0x10080DF4, 0x91960F7A),
+    MK_64(0xCCF7DDE5, 0xB45BC1C2)
+    };
+
+/* blkSize =  256 bits. hashSize =  160 bits */
+const u64 SKEIN_256_IV_160[] =
+    {
+    MK_64(0x14202314, 0x72825E98),
+    MK_64(0x2AC4E9A2, 0x5A77E590),
+    MK_64(0xD47A5856, 0x8838D63E),
+    MK_64(0x2DD2E496, 0x8586AB7D)
+    };
+
+/* blkSize =  256 bits. hashSize =  224 bits */
+const u64 SKEIN_256_IV_224[] =
+    {
+    MK_64(0xC6098A8C, 0x9AE5EA0B),
+    MK_64(0x876D5686, 0x08C5191C),
+    MK_64(0x99CB88D7, 0xD7F53884),
+    MK_64(0x384BDDB1, 0xAEDDB5DE)
+    };
+
+/* blkSize =  256 bits. hashSize =  256 bits */
+const u64 SKEIN_256_IV_256[] =
+    {
+    MK_64(0xFC9DA860, 0xD048B449),
+    MK_64(0x2FCA6647, 0x9FA7D833),
+    MK_64(0xB33BC389, 0x6656840F),
+    MK_64(0x6A54E920, 0xFDE8DA69)
+    };
+
+/* blkSize =  512 bits. hashSize =  128 bits */
+const u64 SKEIN_512_IV_128[] =
+    {
+    MK_64(0xA8BC7BF3, 0x6FBF9F52),
+    MK_64(0x1E9872CE, 0xBD1AF0AA),
+    MK_64(0x309B1790, 0xB32190D3),
+    MK_64(0xBCFBB854, 0x3F94805C),
+    MK_64(0x0DA61BCD, 0x6E31B11B),
+    MK_64(0x1A18EBEA, 0xD46A32E3),
+    MK_64(0xA2CC5B18, 0xCE84AA82),
+    MK_64(0x6982AB28, 0x9D46982D)
+    };
+
+/* blkSize =  512 bits. hashSize =  160 bits */
+const u64 SKEIN_512_IV_160[] =
+    {
+    MK_64(0x28B81A2A, 0xE013BD91),
+    MK_64(0xC2F11668, 0xB5BDF78F),
+    MK_64(0x1760D8F3, 0xF6A56F12),
+    MK_64(0x4FB74758, 0x8239904F),
+    MK_64(0x21EDE07F, 0x7EAF5056),
+    MK_64(0xD908922E, 0x63ED70B8),
+    MK_64(0xB8EC76FF, 0xECCB52FA),
+    MK_64(0x01A47BB8, 0xA3F27A6E)
+    };
+
+/* blkSize =  512 bits. hashSize =  224 bits */
+const u64 SKEIN_512_IV_224[] =
+    {
+    MK_64(0xCCD06162, 0x48677224),
+    MK_64(0xCBA65CF3, 0xA92339EF),
+    MK_64(0x8CCD69D6, 0x52FF4B64),
+    MK_64(0x398AED7B, 0x3AB890B4),
+    MK_64(0x0F59D1B1, 0x457D2BD0),
+    MK_64(0x6776FE65, 0x75D4EB3D),
+    MK_64(0x99FBC70E, 0x997413E9),
+    MK_64(0x9E2CFCCF, 0xE1C41EF7)
+    };
+
+/* blkSize =  512 bits. hashSize =  256 bits */
+const u64 SKEIN_512_IV_256[] =
+    {
+    MK_64(0xCCD044A1, 0x2FDB3E13),
+    MK_64(0xE8359030, 0x1A79A9EB),
+    MK_64(0x55AEA061, 0x4F816E6F),
+    MK_64(0x2A2767A4, 0xAE9B94DB),
+    MK_64(0xEC06025E, 0x74DD7683),
+    MK_64(0xE7A436CD, 0xC4746251),
+    MK_64(0xC36FBAF9, 0x393AD185),
+    MK_64(0x3EEDBA18, 0x33EDFC13)
+    };
+
+/* blkSize =  512 bits. hashSize =  384 bits */
+const u64 SKEIN_512_IV_384[] =
+    {
+    MK_64(0xA3F6C6BF, 0x3A75EF5F),
+    MK_64(0xB0FEF9CC, 0xFD84FAA4),
+    MK_64(0x9D77DD66, 0x3D770CFE),
+    MK_64(0xD798CBF3, 0xB468FDDA),
+    MK_64(0x1BC4A666, 0x8A0E4465),
+    MK_64(0x7ED7D434, 0xE5807407),
+    MK_64(0x548FC1AC, 0xD4EC44D6),
+    MK_64(0x266E1754, 0x6AA18FF8)
+    };
+
+/* blkSize =  512 bits. hashSize =  512 bits */
+const u64 SKEIN_512_IV_512[] =
+    {
+    MK_64(0x4903ADFF, 0x749C51CE),
+    MK_64(0x0D95DE39, 0x9746DF03),
+    MK_64(0x8FD19341, 0x27C79BCE),
+    MK_64(0x9A255629, 0xFF352CB1),
+    MK_64(0x5DB62599, 0xDF6CA7B0),
+    MK_64(0xEABE394C, 0xA9D5C3F4),
+    MK_64(0x991112C7, 0x1A75B523),
+    MK_64(0xAE18A40B, 0x660FCC33)
+    };
+
+/* blkSize = 1024 bits. hashSize =  384 bits */
+const u64 SKEIN1024_IV_384[] =
+    {
+    MK_64(0x5102B6B8, 0xC1894A35),
+    MK_64(0xFEEBC9E3, 0xFE8AF11A),
+    MK_64(0x0C807F06, 0xE32BED71),
+    MK_64(0x60C13A52, 0xB41A91F6),
+    MK_64(0x9716D35D, 0xD4917C38),
+    MK_64(0xE780DF12, 0x6FD31D3A),
+    MK_64(0x797846B6, 0xC898303A),
+    MK_64(0xB172C2A8, 0xB3572A3B),
+    MK_64(0xC9BC8203, 0xA6104A6C),
+    MK_64(0x65909338, 0xD75624F4),
+    MK_64(0x94BCC568, 0x4B3F81A0),
+    MK_64(0x3EBBF51E, 0x10ECFD46),
+    MK_64(0x2DF50F0B, 0xEEB08542),
+    MK_64(0x3B5A6530, 0x0DBC6516),
+    MK_64(0x484B9CD2, 0x167BBCE1),
+    MK_64(0x2D136947, 0xD4CBAFEA)
+    };
+
+/* blkSize = 1024 bits. hashSize =  512 bits */
+const u64 SKEIN1024_IV_512[] =
+    {
+    MK_64(0xCAEC0E5D, 0x7C1B1B18),
+    MK_64(0xA01B0E04, 0x5F03E802),
+    MK_64(0x33840451, 0xED912885),
+    MK_64(0x374AFB04, 0xEAEC2E1C),
+    MK_64(0xDF25A0E2, 0x813581F7),
+    MK_64(0xE4004093, 0x8B12F9D2),
+    MK_64(0xA662D539, 0xC2ED39B6),
+    MK_64(0xFA8B85CF, 0x45D8C75A),
+    MK_64(0x8316ED8E, 0x29EDE796),
+    MK_64(0x053289C0, 0x2E9F91B8),
+    MK_64(0xC3F8EF1D, 0x6D518B73),
+    MK_64(0xBDCEC3C4, 0xD5EF332E),
+    MK_64(0x549A7E52, 0x22974487),
+    MK_64(0x67070872, 0x5B749816),
+    MK_64(0xB9CD28FB, 0xF0581BD1),
+    MK_64(0x0E2940B8, 0x15804974)
+    };
+
+/* blkSize = 1024 bits. hashSize = 1024 bits */
+const u64 SKEIN1024_IV_1024[] =
+    {
+    MK_64(0xD593DA07, 0x41E72355),
+    MK_64(0x15B5E511, 0xAC73E00C),
+    MK_64(0x5180E5AE, 0xBAF2C4F0),
+    MK_64(0x03BD41D3, 0xFCBCAFAF),
+    MK_64(0x1CAEC6FD, 0x1983A898),
+    MK_64(0x6E510B8B, 0xCDD0589F),
+    MK_64(0x77E2BDFD, 0xC6394ADA),
+    MK_64(0xC11E1DB5, 0x24DCB0A3),
+    MK_64(0xD6D14AF9, 0xC6329AB5),
+    MK_64(0x6A9B0BFC, 0x6EB67E0D),
+    MK_64(0x9243C60D, 0xCCFF1332),
+    MK_64(0x1A1F1DDE, 0x743F02D4),
+    MK_64(0x0996753C, 0x10ED0BB8),
+    MK_64(0x6572DD22, 0xF2B4969A),
+    MK_64(0x61FD3062, 0xD00A579A),
+    MK_64(0x1DE0536E, 0x8682E539)
+    };
+
+#endif /* _SKEIN_IV_H_ */
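Reading aid for the MK_64 constants in skein_iv.h and SKEIN_SCHEMA_VER in skein.h (not part of the patch): MK_64(hi32, lo32) simply glues two 32-bit halves into one 64-bit word, and the schema word places the "SHA3" identifier in the low half and the version in the high half. The userspace sketch below restates the macro with uint64_t in place of u64 and serializes a word least-significant byte first, which is the byte order the source's Skein_Put64_LSB_First name suggests. put64_lsb_first is a hypothetical helper written for this illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same shape as SKEIN_MK_64 in skein.h, with uint64_t standing in for u64. */
#define MK_64(hi32, lo32)  ((lo32) + (((uint64_t)(hi32)) << 32))

/* Hypothetical helper: store a 64-bit word least-significant byte first. */
static void put64_lsb_first(uint8_t *dst, uint64_t w)
{
	int i;

	for (i = 0; i < 8; i++)
		dst[i] = (uint8_t)(w >> (8 * i));
}
```

With these definitions, the first word of SKEIN_256_IV_256 is 0xFC9DA860D048B449, and serializing MK_64(1, 0x33414853) yields the bytes 'S' 'H' 'A' '3' followed by the version byte 0x01 and three zero bytes.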
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 0ea0a6aeb168..e2e5685157a0 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -1,721 +1,721 @@
-/***********************************************************************
-**
-** Implementation of the Skein hash function.
-**
-** Source code author: Doug Whiting, 2008.
-**
-** This algorithm and source code is released to the public domain.
-**
-************************************************************************/
-
-#define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
-
-#include <linux/string.h>       /* get the memcpy/memset functions */
-#include <skein.h> /* get the Skein API definitions   */
-#include <skein_iv.h>    /* get precomputed IVs */
-
-/*****************************************************************/
-/* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-void    Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-
-/*****************************************************************/
-/*     256-bit Skein                                             */
-/*****************************************************************/
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a straight hashing operation  */
-int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
-{
-    union
-    {
-        u8  b[SKEIN_256_STATE_BYTES];
-        u64  w[SKEIN_256_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {             /* use pre-computed values, where available */
-    case  256:
-        memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
-        break;
-    case  224:
-        memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
-        break;
-    case  160:
-        memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
-        break;
-    case  128:
-        memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
-{
-    union
-    {
-        u8  b[SKEIN_256_STATE_BYTES];
-        u64  w[SKEIN_256_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(256, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* process the input bytes */
-int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
-{
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
-            Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
-            msg        += n * SKEIN_256_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the result */
-int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_256_BLOCK_BYTES)
-            n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/*****************************************************************/
-/*     512-bit Skein                                             */
-/*****************************************************************/
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a straight hashing operation  */
-int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
-{
-    union
-    {
-        u8  b[SKEIN_512_STATE_BYTES];
-        u64  w[SKEIN_512_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {             /* use pre-computed values, where available */
-    case  512:
-        memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
-        break;
-    case  384:
-        memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
-        break;
-    case  256:
-        memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
-        break;
-    case  224:
-        memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
-{
-    union
-    {
-        u8  b[SKEIN_512_STATE_BYTES];
-        u64  w[SKEIN_512_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(512, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* process the input bytes */
-int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
-{
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
-            Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
-            msg        += n * SKEIN_512_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the result */
-int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_512_BLOCK_BYTES)
-            n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/*****************************************************************/
-/*    1024-bit Skein                                             */
-/*****************************************************************/
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a straight hashing operation  */
-int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
-{
-    union
-    {
-        u8  b[SKEIN1024_STATE_BYTES];
-        u64  w[SKEIN1024_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {              /* use pre-computed values, where available */
-    case  512:
-        memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
-        break;
-    case  384:
-        memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
-        break;
-    case 1024:
-        memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
-{
-    union
-    {
-        u8  b[SKEIN1024_STATE_BYTES];
-        u64  w[SKEIN1024_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(1024, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* process the input bytes */
-int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
-{
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
-            Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
-            msg        += n * SKEIN1024_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the result */
-int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN1024_BLOCK_BYTES)
-            n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/**************** Functions to support MAC/tree hashing ***************/
-/*   (this code is identical for Optimized and Reference versions)    */
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
-{
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
-
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
-{
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
-
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
-{
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
-
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
-
-    return SKEIN_SUCCESS;
-}
-
-#if SKEIN_TREE_HASH
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* just do the OUTPUT stage                                       */
-int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_256_BLOCK_BYTES)
-            n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* just do the OUTPUT stage                                       */
-int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_512_BLOCK_BYTES)
-            n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* just do the OUTPUT stage                                       */
-int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN1024_BLOCK_BYTES)
-            n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-#endif
+/***********************************************************************
+**
+** Implementation of the Skein hash function.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+************************************************************************/
+
+#define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
+
+#include <linux/string.h>       /* get the memcpy/memset functions */
+#include <skein.h> /* get the Skein API definitions   */
+#include <skein_iv.h>    /* get precomputed IVs */
+
+/*****************************************************************/
+/* External function to process blkCnt (nonzero) full block(s) of data. */
+void    Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void    Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+
+/*****************************************************************/
+/*     256-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u8  b[SKEIN_256_STATE_BYTES];
+        u64  w[SKEIN_256_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {             /* use pre-computed values, where available */
+    case  256:
+        memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
+        break;
+    case  224:
+        memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
+        break;
+    case  160:
+        memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
+        break;
+    case  128:
+        memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+        break;
+    }
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+{
+    union
+    {
+        u8  b[SKEIN_256_STATE_BYTES];
+        u64  w[SKEIN_256_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx, CFG_FINAL);
+
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(256, &ctx->h, key, keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
+            Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
+            msg        += n * SKEIN_256_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN_256_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_256_BLOCK_BYTES)
+            n  = SKEIN_256_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*****************************************************************/
+/*     512-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u8  b[SKEIN_512_STATE_BYTES];
+        u64  w[SKEIN_512_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {             /* use pre-computed values, where available */
+    case  512:
+        memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
+        break;
+    case  384:
+        memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
+        break;
+    case  256:
+        memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
+        break;
+    case  224:
+        memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+        break;
+    }
+
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+{
+    union
+    {
+        u8  b[SKEIN_512_STATE_BYTES];
+        u64  w[SKEIN_512_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx, CFG_FINAL);
+
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(512, &ctx->h, key, keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
+            Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
+            msg        += n * SKEIN_512_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN_512_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_512_BLOCK_BYTES)
+            n  = SKEIN_512_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*****************************************************************/
+/*    1024-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u8  b[SKEIN1024_STATE_BYTES];
+        u64  w[SKEIN1024_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {              /* use pre-computed values, where available */
+    case  512:
+        memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
+        break;
+    case  384:
+        memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
+        break;
+    case 1024:
+        memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+        break;
+    }
+
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+{
+    union
+    {
+        u8  b[SKEIN1024_STATE_BYTES];
+        u64  w[SKEIN1024_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx, CFG_FINAL);
+
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(1024, &ctx->h, key, keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
+            Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
+            msg        += n * SKEIN1024_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN1024_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN1024_BLOCK_BYTES)
+            n  = SKEIN1024_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/**************** Functions to support MAC/tree hashing ***************/
+/*   (this code is identical for Optimized and Reference versions)    */
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+#if SKEIN_TREE_HASH
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN_256_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_256_BLOCK_BYTES)
+            n  = SKEIN_256_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN_512_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_512_BLOCK_BYTES)
+            n  = SKEIN_512_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN1024_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN1024_BLOCK_BYTES)
+            n  = SKEIN1024_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+#endif
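
The buffering in the three *_Update() routines above is easy to misread, so
here is a minimal sketch of it, not the code from this patch. It assumes a
32-byte block (the Skein-256 size) and replaces the compression function with
a counter; every toy_* name is hypothetical. It shows that Update never
consumes the last full block: the block stays in b[] so that *_Final() can tag
it with SKEIN_T1_FLAG_FINAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_BYTES 32	/* hypothetical; matches the Skein-256 block size */

struct toy_ctx {
	size_t bCnt;			/* bytes currently buffered in b[] */
	size_t blocks_done;		/* stand-in for real block processing */
	unsigned char b[TOY_BLOCK_BYTES];
};

static void toy_init(struct toy_ctx *ctx)
{
	memset(ctx, 0, sizeof(*ctx));
}

/* Stand-in for Skein_*_Process_Block(): only counts blocks. */
static void toy_process(struct toy_ctx *ctx, const unsigned char *blk,
			size_t blkCnt)
{
	(void)blk;
	ctx->blocks_done += blkCnt;
}

static void toy_update(struct toy_ctx *ctx, const unsigned char *msg,
		       size_t len)
{
	size_t n;

	/* Strictly greater: exactly one full block is NOT processed yet. */
	if (len + ctx->bCnt > TOY_BLOCK_BYTES) {
		if (ctx->bCnt) {
			/* top up the partial buffer and flush it */
			n = TOY_BLOCK_BYTES - ctx->bCnt;
			if (n) {
				assert(n < len);
				memcpy(&ctx->b[ctx->bCnt], msg, n);
				len -= n;
				msg += n;
				ctx->bCnt += n;
			}
			toy_process(ctx, ctx->b, 1);
			ctx->bCnt = 0;
		}
		if (len > TOY_BLOCK_BYTES) {
			/* (len - 1) leaves between 1 and 32 bytes behind */
			n = (len - 1) / TOY_BLOCK_BYTES;
			toy_process(ctx, msg, n);
			len -= n * TOY_BLOCK_BYTES;
			msg += n * TOY_BLOCK_BYTES;
		}
	}

	/* whatever is left (at most one block) waits in b[] */
	if (len) {
		memcpy(&ctx->b[ctx->bCnt], msg, len);
		ctx->bCnt += len;
	}
}
```

Feeding 64 bytes into a fresh context processes one block and leaves the
second one buffered (bCnt == 32). That buffered block is only flushed when
more data arrives or when the final step runs.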
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
old mode 100755
new mode 100644
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index a4b1ec56ad83..d98933eeb0bf 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -1,172 +1,172 @@
-
-#include <linux/string.h>
-#include <skein.h>
-#include <threefishApi.h>
-
-
-/*****************************  Skein_256 ******************************/
-void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
-                             size_t blkCnt, size_t byteCntAdd)
-{
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
-    u64 words[3];
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish256, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN_256_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = ctx->X[0] ^ w[0];
-        ctx->X[1] = ctx->X[1] ^ w[1];
-        ctx->X[2] = ctx->X[2] ^ w[2];
-        ctx->X[3] = ctx->X[3] ^ w[3];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
-}
-
-void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
-                             size_t blkCnt, size_t byteCntAdd)
-{
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64 words[3];
-    u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish512, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN_512_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = ctx->X[0] ^ w[0];
-        ctx->X[1] = ctx->X[1] ^ w[1];
-        ctx->X[2] = ctx->X[2] ^ w[2];
-        ctx->X[3] = ctx->X[3] ^ w[3];
-        ctx->X[4] = ctx->X[4] ^ w[4];
-        ctx->X[5] = ctx->X[5] ^ w[5];
-        ctx->X[6] = ctx->X[6] ^ w[6];
-        ctx->X[7] = ctx->X[7] ^ w[7];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
-}
-
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
-                              size_t blkCnt, size_t byteCntAdd)
-{
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64 words[3];
-    u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish1024, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN1024_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0]  = ctx->X[0]  ^ w[0];
-        ctx->X[1]  = ctx->X[1]  ^ w[1];
-        ctx->X[2]  = ctx->X[2]  ^ w[2];
-        ctx->X[3]  = ctx->X[3]  ^ w[3];
-        ctx->X[4]  = ctx->X[4]  ^ w[4];
-        ctx->X[5]  = ctx->X[5]  ^ w[5];
-        ctx->X[6]  = ctx->X[6]  ^ w[6];
-        ctx->X[7]  = ctx->X[7]  ^ w[7];
-        ctx->X[8]  = ctx->X[8]  ^ w[8];
-        ctx->X[9]  = ctx->X[9]  ^ w[9];
-        ctx->X[10] = ctx->X[10] ^ w[10];
-        ctx->X[11] = ctx->X[11] ^ w[11];
-        ctx->X[12] = ctx->X[12] ^ w[12];
-        ctx->X[13] = ctx->X[13] ^ w[13];
-        ctx->X[14] = ctx->X[14] ^ w[14];
-        ctx->X[15] = ctx->X[15] ^ w[15];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
-}
+
+#include <linux/string.h>
+#include <skein.h>
+#include <threefishApi.h>
+
+
+/*****************************  Skein_256 ******************************/
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
+                             size_t blkCnt, size_t byteCntAdd)
+{
+    struct threefish_key key;
+    u64 tweak[2];
+    int i;
+    u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+    u64 words[3];
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64 carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish256, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN_256_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = ctx->X[0] ^ w[0];
+        ctx->X[1] = ctx->X[1] ^ w[1];
+        ctx->X[2] = ctx->X[2] ^ w[2];
+        ctx->X[3] = ctx->X[3] ^ w[3];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
+
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
+                             size_t blkCnt, size_t byteCntAdd)
+{
+    struct threefish_key key;
+    u64 tweak[2];
+    int i;
+    u64 words[3];
+    u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64 carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish512, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN_512_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = ctx->X[0] ^ w[0];
+        ctx->X[1] = ctx->X[1] ^ w[1];
+        ctx->X[2] = ctx->X[2] ^ w[2];
+        ctx->X[3] = ctx->X[3] ^ w[3];
+        ctx->X[4] = ctx->X[4] ^ w[4];
+        ctx->X[5] = ctx->X[5] ^ w[5];
+        ctx->X[6] = ctx->X[6] ^ w[6];
+        ctx->X[7] = ctx->X[7] ^ w[7];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
+
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
+                              size_t blkCnt, size_t byteCntAdd)
+{
+    struct threefish_key key;
+    u64 tweak[2];
+    int i;
+    u64 words[3];
+    u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64 carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish1024, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN1024_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0]  = ctx->X[0]  ^ w[0];
+        ctx->X[1]  = ctx->X[1]  ^ w[1];
+        ctx->X[2]  = ctx->X[2]  ^ w[2];
+        ctx->X[3]  = ctx->X[3]  ^ w[3];
+        ctx->X[4]  = ctx->X[4]  ^ w[4];
+        ctx->X[5]  = ctx->X[5]  ^ w[5];
+        ctx->X[6]  = ctx->X[6]  ^ w[6];
+        ctx->X[7]  = ctx->X[7]  ^ w[7];
+        ctx->X[8]  = ctx->X[8]  ^ w[8];
+        ctx->X[9]  = ctx->X[9]  ^ w[9];
+        ctx->X[10] = ctx->X[10] ^ w[10];
+        ctx->X[11] = ctx->X[11] ^ w[11];
+        ctx->X[12] = ctx->X[12] ^ w[12];
+        ctx->X[13] = ctx->X[13] ^ w[13];
+        ctx->X[14] = ctx->X[14] ^ w[14];
+        ctx->X[15] = ctx->X[15] ^ w[15];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
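
Note on the tweak update used in each of the three functions above: the byte count is added to a 96-bit position counter held in tweak[0] plus the low 32 bits of tweak[1], by splitting it into three 32-bit words and propagating the carry. The following is a minimal standalone sketch of that step, not code from this series. The helper name tweak_add_bytes and the use of <stdint.h> types instead of the kernel's u64 are assumptions made for illustration. It also clears the old low 32 bits of tweak[1] before storing the new word, whereas the patch uses "|=", so the sketch reflects one reading of the intended behaviour rather than what the patch does.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): add byte_cnt to the 96-bit
 * position counter stored in tweak[0] (low 64 bits) and the low 32 bits of
 * tweak[1].  The upper 32 bits of tweak[1] (flag and type bits) are kept.
 *
 * Assumes byte_cnt is small enough that byte_cnt + 0xffffffff cannot wrap
 * a 64-bit value, which holds for block-sized increments.
 */
static void tweak_add_bytes(uint64_t tweak[2], uint64_t byte_cnt)
{
    uint64_t words[3];
    uint64_t carry = byte_cnt;
    int i;

    /* split the 96-bit counter into three 32-bit words */
    words[0] = tweak[0] & 0xffffffffULL;
    words[1] = (tweak[0] >> 32) & 0xffffffffULL;
    words[2] = tweak[1] & 0xffffffffULL;

    /* add with carry, least significant word first */
    for (i = 0; i < 3; i++) {
        carry += words[i];
        words[i] = carry & 0xffffffffULL;
        carry >>= 32;
    }

    /* reassemble; mask out the old low word of tweak[1] before merging */
    tweak[0] = words[0] | (words[1] << 32);
    tweak[1] = (tweak[1] & ~0xffffffffULL) | words[2];
}
```

With tweak[0] = 0xffffffffffffffff and the low word of tweak[1] equal to 1, adding one byte carries through both halves of tweak[0] and leaves the low word of tweak[1] at 2. With the "|=" form the old value 1 would be OR-ed with the new value 2, giving 3. The difference only shows up once the low 64 bits of the counter have overflowed, so it is not reachable with realistic input lengths.
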
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 791bacdd3d57..e62b6442783e 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -1,686 +1,686 @@
-/***********************************************************************
-**
-** Implementation of the Skein block functions.
-**
-** Source code author: Doug Whiting, 2008.
-**
-** This algorithm and source code is released to the public domain.
-**
-** Compile-time switches:
-**
-**  SKEIN_USE_ASM  -- set bits (256/512/1024) to select which
-**                    versions use ASM code for block processing
-**                    [default: use C for all block sizes]
-**
-************************************************************************/
-
-#include <linux/string.h>
-#include <skein.h>
-
-#ifndef SKEIN_USE_ASM
-#define SKEIN_USE_ASM   (0)                     /* default is all C code (no ASM) */
-#endif
-
-#ifndef SKEIN_LOOP
-#define SKEIN_LOOP 001                          /* default: unroll 256 and 512, but not 1024 */
-#endif
-
-#define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
-#define KW_TWK_BASE     (0)
-#define KW_KEY_BASE     (3)
-#define ks              (kw + KW_KEY_BASE)                
-#define ts              (kw + KW_TWK_BASE)
-
-#ifdef SKEIN_DEBUG
-#define DebugSaveTweak(ctx) { ctx->h.T[0] = ts[0]; ctx->h.T[1] = ts[1]; }
-#else
-#define DebugSaveTweak(ctx)
-#endif
-
-/*****************************  Skein_256 ******************************/
-#if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C */
-    enum {
-        WCNT = SKEIN_256_STATE_WORDS
-        };
-#undef  RCNT
-#define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
-
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
-#define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
-#else
-#define SKEIN_UNROLL_256 (0)
-#endif
-
-#if SKEIN_UNROLL_256
-#if (RCNT % SKEIN_UNROLL_256)
-#error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
-#endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
-#else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
-#endif
-    u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
-    u64  w[WCNT];                           /* local copy of input block */
-#ifdef SKEIN_DEBUG
-    const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
-#endif
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0] = ctx->X[0];     
-        ks[1] = ctx->X[1];
-        ks[2] = ctx->X[2];
-        ks[3] = ctx->X[3];
-        ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
-
-        ts[2] = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X0 = w[0] + ks[0];                      /* do the first full key injection */
-        X1 = w[1] + ks[1] + ts[0];
-        X2 = w[2] + ks[2] + ts[1];
-        X3 = w[3] + ks[3];
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
-
-        blkPtr += SKEIN_256_BLOCK_BYTES;
-
-        /* run the rounds */
-
-#define Round256(p0, p1, p2, p3, ROT, rNum)                              \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
-
-#if SKEIN_UNROLL_256 == 0                       
-#define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
-    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
-
-#define I256(R)                                                     \
-    X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
-    X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
-    X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
-    X3   += ks[((R)+4) % 5] +     (R)+1;                            \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
-
-#define I256(R)                                                     \
-    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
-    X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
-    X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
-    X3   += ks[r+(R)+3] +    r+(R);                              \
-    ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
-    ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
-#endif  
-        {    
-#define R256_8_rounds(R)                  \
-        R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
-        R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
-        R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
-        R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
-        I256(2 * (R));                      \
-        R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
-        R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
-        R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
-        R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
-        I256(2 * (R) + 1);
-
-        R256_8_rounds(0);
-
-#define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
-
-  #if   R256_Unroll_R(1)
-        R256_8_rounds(1);
-  #endif
-  #if   R256_Unroll_R(2)
-        R256_8_rounds(2);
-  #endif
-  #if   R256_Unroll_R(3)
-        R256_8_rounds(3);
-  #endif
-  #if   R256_Unroll_R(4)
-        R256_8_rounds(4);
-  #endif
-  #if   R256_Unroll_R(5)
-        R256_8_rounds(5);
-  #endif
-  #if   R256_Unroll_R(6)
-        R256_8_rounds(6);
-  #endif
-  #if   R256_Unroll_R(7)
-        R256_8_rounds(7);
-  #endif
-  #if   R256_Unroll_R(8)
-        R256_8_rounds(8);
-  #endif
-  #if   R256_Unroll_R(9)
-        R256_8_rounds(9);
-  #endif
-  #if   R256_Unroll_R(10)
-        R256_8_rounds(10);
-  #endif
-  #if   R256_Unroll_R(11)
-        R256_8_rounds(11);
-  #endif
-  #if   R256_Unroll_R(12)
-        R256_8_rounds(12);
-  #endif
-  #if   R256_Unroll_R(13)
-        R256_8_rounds(13);
-  #endif
-  #if   R256_Unroll_R(14)
-        R256_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_256 > 14)
-#error  "need more unrolling in Skein_256_Process_Block"
-  #endif
-        }
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = X0 ^ w[0];
-        ctx->X[1] = X1 ^ w[1];
-        ctx->X[2] = X2 ^ w[2];
-        ctx->X[3] = X3 ^ w[3];
-
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
-
-#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
-size_t Skein_256_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein_256_Process_Block_CodeSize) -
-           ((u8 *) Skein_256_Process_Block);
-    }
-unsigned int Skein_256_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_256;
-    }
-#endif
-#endif
-
-/*****************************  Skein_512 ******************************/
-#if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C */
-    enum {
-        WCNT = SKEIN_512_STATE_WORDS
-        };
-#undef  RCNT
-#define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
-
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
-#define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
-#else
-#define SKEIN_UNROLL_512 (0)
-#endif
-
-#if SKEIN_UNROLL_512
-#if (RCNT % SKEIN_UNROLL_512)
-#error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
-#endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
-#else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
-#endif
-    u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
-    u64  w[WCNT];                           /* local copy of input block */
-#ifdef SKEIN_DEBUG
-    const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
-    Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
-#endif
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0] = ctx->X[0];
-        ks[1] = ctx->X[1];
-        ks[2] = ctx->X[2];
-        ks[3] = ctx->X[3];
-        ks[4] = ctx->X[4];
-        ks[5] = ctx->X[5];
-        ks[6] = ctx->X[6];
-        ks[7] = ctx->X[7];
-        ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
-                ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
-
-        ts[2] = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X0   = w[0] + ks[0];                    /* do the first full key injection */
-        X1   = w[1] + ks[1];
-        X2   = w[2] + ks[2];
-        X3   = w[3] + ks[3];
-        X4   = w[4] + ks[4];
-        X5   = w[5] + ks[5] + ts[0];
-        X6   = w[6] + ks[6] + ts[1];
-        X7   = w[7] + ks[7];
-
-        blkPtr += SKEIN_512_BLOCK_BYTES;
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
-        /* run the rounds */
-#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
-
-#if SKEIN_UNROLL_512 == 0                       
-#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
-    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
-
-#define I512(R)                                                     \
-    X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
-    X1   += ks[((R) + 2) % 9];                                        \
-    X2   += ks[((R) + 3) % 9];                                        \
-    X3   += ks[((R) + 4) % 9];                                        \
-    X4   += ks[((R) + 5) % 9];                                        \
-    X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
-    X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
-    X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
-
-#define I512(R)                                                     \
-    X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
-    X1   += ks[r + (R) + 1];                                            \
-    X2   += ks[r + (R) + 2];                                            \
-    X3   += ks[r + (R) + 3];                                            \
-    X4   += ks[r + (R) + 4];                                            \
-    X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
-    X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
-    X7   += ks[r + (R) + 7] +         r + (R);                              \
-    ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
-    ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
-#endif                         /* end of looped code definitions */
-        {
-#define R512_8_rounds(R)  /* do 8 full rounds */  \
-        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
-        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
-        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
-        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
-        I512(2 * (R));                              \
-        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
-        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
-        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
-        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
-        I512(2 * (R) + 1);        /* and key injection */
-
-        R512_8_rounds(0);
-
-#define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
-
-  #if   R512_Unroll_R(1)
-        R512_8_rounds(1);
-  #endif
-  #if   R512_Unroll_R(2)
-        R512_8_rounds(2);
-  #endif
-  #if   R512_Unroll_R(3)
-        R512_8_rounds(3);
-  #endif
-  #if   R512_Unroll_R(4)
-        R512_8_rounds(4);
-  #endif
-  #if   R512_Unroll_R(5)
-        R512_8_rounds(5);
-  #endif
-  #if   R512_Unroll_R(6)
-        R512_8_rounds(6);
-  #endif
-  #if   R512_Unroll_R(7)
-        R512_8_rounds(7);
-  #endif
-  #if   R512_Unroll_R(8)
-        R512_8_rounds(8);
-  #endif
-  #if   R512_Unroll_R(9)
-        R512_8_rounds(9);
-  #endif
-  #if   R512_Unroll_R(10)
-        R512_8_rounds(10);
-  #endif
-  #if   R512_Unroll_R(11)
-        R512_8_rounds(11);
-  #endif
-  #if   R512_Unroll_R(12)
-        R512_8_rounds(12);
-  #endif
-  #if   R512_Unroll_R(13)
-        R512_8_rounds(13);
-  #endif
-  #if   R512_Unroll_R(14)
-        R512_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_512 > 14)
-#error  "need more unrolling in Skein_512_Process_Block"
-  #endif
-        }
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = X0 ^ w[0];
-        ctx->X[1] = X1 ^ w[1];
-        ctx->X[2] = X2 ^ w[2];
-        ctx->X[3] = X3 ^ w[3];
-        ctx->X[4] = X4 ^ w[4];
-        ctx->X[5] = X5 ^ w[5];
-        ctx->X[6] = X6 ^ w[6];
-        ctx->X[7] = X7 ^ w[7];
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
-
-#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
-size_t Skein_512_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein_512_Process_Block_CodeSize) -
-           ((u8 *) Skein_512_Process_Block);
-    }
-unsigned int Skein_512_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_512;
-    }
-#endif
-#endif
-
-/*****************************  Skein1024 ******************************/
-#if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C, always looping (unrolled is bigger AND slower!) */
-    enum {
-        WCNT = SKEIN1024_STATE_WORDS
-        };
-#undef  RCNT
-#define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
-
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
-#define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
-#else
-#define SKEIN_UNROLL_1024 (0)
-#endif
-
-#if (SKEIN_UNROLL_1024 != 0)
-#if (RCNT % SKEIN_UNROLL_1024)
-#error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
-#endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
-#else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
-#endif
-
-    u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
-            X08, X09, X10, X11, X12, X13, X14, X15;
-    u64  w[WCNT];                            /* local copy of input block */
-#ifdef SKEIN_DEBUG
-    const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
-    Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
-    Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
-    Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
-#endif
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0]  = ctx->X[0];
-        ks[1]  = ctx->X[1];
-        ks[2]  = ctx->X[2];
-        ks[3]  = ctx->X[3];
-        ks[4]  = ctx->X[4];
-        ks[5]  = ctx->X[5];
-        ks[6]  = ctx->X[6];
-        ks[7]  = ctx->X[7];
-        ks[8]  = ctx->X[8];
-        ks[9]  = ctx->X[9];
-        ks[10] = ctx->X[10];
-        ks[11] = ctx->X[11];
-        ks[12] = ctx->X[12];
-        ks[13] = ctx->X[13];
-        ks[14] = ctx->X[14];
-        ks[15] = ctx->X[15];
-        ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
-                  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
-                  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
-                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
-
-        ts[2]  = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X00    =  w[0] +  ks[0];                 /* do the first full key injection */
-        X01    =  w[1] +  ks[1];
-        X02    =  w[2] +  ks[2];
-        X03    =  w[3] +  ks[3];
-        X04    =  w[4] +  ks[4];
-        X05    =  w[5] +  ks[5];
-        X06    =  w[6] +  ks[6];
-        X07    =  w[7] +  ks[7];
-        X08    =  w[8] +  ks[8];
-        X09    =  w[9] +  ks[9];
-        X10    = w[10] + ks[10];
-        X11    = w[11] + ks[11];
-        X12    = w[12] + ks[12];
-        X13    = w[13] + ks[13] + ts[0];
-        X14    = w[14] + ks[14] + ts[1];
-        X15    = w[15] + ks[15];
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
-
-#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
-    X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
-    X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
-    X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
-    X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
-
-#if SKEIN_UNROLL_1024 == 0                      
-#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
-
-#define I1024(R)                                                        \
-    X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
-    X01   += ks[((R) +  2) % 17];                                       \
-    X02   += ks[((R) +  3) % 17];                                       \
-    X03   += ks[((R) +  4) % 17];                                       \
-    X04   += ks[((R) +  5) % 17];                                       \
-    X05   += ks[((R) +  6) % 17];                                       \
-    X06   += ks[((R) +  7) % 17];                                       \
-    X07   += ks[((R) +  8) % 17];                                       \
-    X08   += ks[((R) +  9) % 17];                                       \
-    X09   += ks[((R) + 10) % 17];                                       \
-    X10   += ks[((R) + 11) % 17];                                       \
-    X11   += ks[((R) + 12) % 17];                                       \
-    X12   += ks[((R) + 13) % 17];                                       \
-    X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
-    X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
-    X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
-#else                                       /* looping version */
-#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
-
-#define I1024(R)                                                      \
-    X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
-    X01   += ks[r + (R) +  1];                                            \
-    X02   += ks[r + (R) +  2];                                            \
-    X03   += ks[r + (R) +  3];                                            \
-    X04   += ks[r + (R) +  4];                                            \
-    X05   += ks[r + (R) +  5];                                            \
-    X06   += ks[r + (R) +  6];                                            \
-    X07   += ks[r + (R) +  7];                                            \
-    X08   += ks[r + (R) +  8];                                            \
-    X09   += ks[r + (R) +  9];                                            \
-    X10   += ks[r + (R) + 10];                                            \
-    X11   += ks[r + (R) + 11];                                            \
-    X12   += ks[r + (R) + 12];                                            \
-    X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
-    X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
-    X15   += ks[r + (R) + 15] +         r + (R);                          \
-    ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
-    ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
-    Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
-#endif  
-        {
-#define R1024_8_rounds(R)    /* do 8 full rounds */                               \
-        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
-        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
-        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
-        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
-        I1024(2*(R));                                                             \
-        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
-        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
-        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
-        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
-        I1024(2*(R)+1);
-
-        R1024_8_rounds(0);
-
-#define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
-
-  #if   R1024_Unroll_R(1)
-        R1024_8_rounds(1);
-  #endif
-  #if   R1024_Unroll_R(2)
-        R1024_8_rounds(2);
-  #endif
-  #if   R1024_Unroll_R(3)
-        R1024_8_rounds(3);
-  #endif
-  #if   R1024_Unroll_R(4)
-        R1024_8_rounds(4);
-  #endif
-  #if   R1024_Unroll_R(5)
-        R1024_8_rounds(5);
-  #endif
-  #if   R1024_Unroll_R(6)
-        R1024_8_rounds(6);
-  #endif
-  #if   R1024_Unroll_R(7)
-        R1024_8_rounds(7);
-  #endif
-  #if   R1024_Unroll_R(8)
-        R1024_8_rounds(8);
-  #endif
-  #if   R1024_Unroll_R(9)
-        R1024_8_rounds(9);
-  #endif
-  #if   R1024_Unroll_R(10)
-        R1024_8_rounds(10);
-  #endif
-  #if   R1024_Unroll_R(11)
-        R1024_8_rounds(11);
-  #endif
-  #if   R1024_Unroll_R(12)
-        R1024_8_rounds(12);
-  #endif
-  #if   R1024_Unroll_R(13)
-        R1024_8_rounds(13);
-  #endif
-  #if   R1024_Unroll_R(14)
-        R1024_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_1024 > 14)
-#error  "need more unrolling in Skein_1024_Process_Block"
-  #endif
-        }
-        /* do the final "feedforward" xor, update context chaining vars */
-
-        ctx->X[0] = X00 ^ w[0];
-        ctx->X[1] = X01 ^ w[1];
-        ctx->X[2] = X02 ^ w[2];
-        ctx->X[3] = X03 ^ w[3];
-        ctx->X[4] = X04 ^ w[4];
-        ctx->X[5] = X05 ^ w[5];
-        ctx->X[6] = X06 ^ w[6];
-        ctx->X[7] = X07 ^ w[7];
-        ctx->X[8] = X08 ^ w[8];
-        ctx->X[9] = X09 ^ w[9];
-        ctx->X[10] = X10 ^ w[10];
-        ctx->X[11] = X11 ^ w[11];
-        ctx->X[12] = X12 ^ w[12];
-        ctx->X[13] = X13 ^ w[13];
-        ctx->X[14] = X14 ^ w[14];
-        ctx->X[15] = X15 ^ w[15];
-
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-        
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        blkPtr += SKEIN1024_BLOCK_BYTES;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
-
-#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
-size_t Skein1024_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein1024_Process_Block_CodeSize) -
-           ((u8 *) Skein1024_Process_Block);
-    }
-unsigned int Skein1024_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_1024;
-    }
-#endif
-#endif
+/***********************************************************************
+**
+** Implementation of the Skein block functions.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+** Compile-time switches:
+**
+**  SKEIN_USE_ASM  -- set bits (256/512/1024) to select which
+**                    versions use ASM code for block processing
+**                    [default: use C for all block sizes]
+**
+************************************************************************/
+
+#include <linux/string.h>
+#include <skein.h>
+
+#ifndef SKEIN_USE_ASM
+#define SKEIN_USE_ASM   (0)                     /* default is all C code (no ASM) */
+#endif
+
+#ifndef SKEIN_LOOP
+#define SKEIN_LOOP 001                          /* default: unroll 256 and 512, but not 1024 */
+#endif
+
+#define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
+#define KW_TWK_BASE     (0)
+#define KW_KEY_BASE     (3)
+#define ks              (kw + KW_KEY_BASE)                
+#define ts              (kw + KW_TWK_BASE)
+
+#ifdef SKEIN_DEBUG
+#define DebugSaveTweak(ctx) { ctx->h.T[0] = ts[0]; ctx->h.T[1] = ts[1]; }
+#else
+#define DebugSaveTweak(ctx)
+#endif
+
+/*****************************  Skein_256 ******************************/
+#if !(SKEIN_USE_ASM & 256)
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+    { /* do it in C */
+    enum {
+        WCNT = SKEIN_256_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
+
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
+#else
+#define SKEIN_UNROLL_256 (0)
+#endif
+
+#if SKEIN_UNROLL_256
+#if (RCNT % SKEIN_UNROLL_256)
+#error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+    u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
+    u64  w[WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+#endif
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0] = ctx->X[0];     
+        ks[1] = ctx->X[1];
+        ks[2] = ctx->X[2];
+        ks[3] = ctx->X[3];
+        ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
+
+        ts[2] = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+        X0 = w[0] + ks[0];                      /* do the first full key injection */
+        X1 = w[1] + ks[1] + ts[0];
+        X2 = w[2] + ks[2] + ts[1];
+        X3 = w[3] + ks[3];
+
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
+
+        blkPtr += SKEIN_256_BLOCK_BYTES;
+
+        /* run the rounds */
+
+#define Round256(p0, p1, p2, p3, ROT, rNum)                              \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+
+#if SKEIN_UNROLL_256 == 0                       
+#define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
+    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+
+#define I256(R)                                                     \
+    X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
+    X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
+    X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
+    X3   += ks[((R)+4) % 5] +     (R)+1;                            \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+#else                                       /* looping version */
+#define R256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+
+#define I256(R)                                                     \
+    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
+    X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
+    X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
+    X3   += ks[r+(R)+3] +    r+(R);                              \
+    ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
+    ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
+#endif  
+        {    
+#define R256_8_rounds(R)                  \
+        R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
+        R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
+        R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
+        R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
+        I256(2 * (R));                      \
+        R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
+        R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
+        R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
+        R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
+        I256(2 * (R) + 1);
+
+        R256_8_rounds(0);
+
+#define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
+
+  #if   R256_Unroll_R(1)
+        R256_8_rounds(1);
+  #endif
+  #if   R256_Unroll_R(2)
+        R256_8_rounds(2);
+  #endif
+  #if   R256_Unroll_R(3)
+        R256_8_rounds(3);
+  #endif
+  #if   R256_Unroll_R(4)
+        R256_8_rounds(4);
+  #endif
+  #if   R256_Unroll_R(5)
+        R256_8_rounds(5);
+  #endif
+  #if   R256_Unroll_R(6)
+        R256_8_rounds(6);
+  #endif
+  #if   R256_Unroll_R(7)
+        R256_8_rounds(7);
+  #endif
+  #if   R256_Unroll_R(8)
+        R256_8_rounds(8);
+  #endif
+  #if   R256_Unroll_R(9)
+        R256_8_rounds(9);
+  #endif
+  #if   R256_Unroll_R(10)
+        R256_8_rounds(10);
+  #endif
+  #if   R256_Unroll_R(11)
+        R256_8_rounds(11);
+  #endif
+  #if   R256_Unroll_R(12)
+        R256_8_rounds(12);
+  #endif
+  #if   R256_Unroll_R(13)
+        R256_8_rounds(13);
+  #endif
+  #if   R256_Unroll_R(14)
+        R256_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_256 > 14)
+#error  "need more unrolling in Skein_256_Process_Block"
+  #endif
+        }
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = X0 ^ w[0];
+        ctx->X[1] = X1 ^ w[1];
+        ctx->X[2] = X2 ^ w[2];
+        ctx->X[3] = X3 ^ w[3];
+
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein_256_Process_Block_CodeSize(void)
+    {
+    return ((u8 *) Skein_256_Process_Block_CodeSize) -
+           ((u8 *) Skein_256_Process_Block);
+    }
+unsigned int Skein_256_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_256;
+    }
+#endif
+#endif
+
+/*****************************  Skein_512 ******************************/
+#if !(SKEIN_USE_ASM & 512)
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+    { /* do it in C */
+    enum {
+        WCNT = SKEIN_512_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
+
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
+#else
+#define SKEIN_UNROLL_512 (0)
+#endif
+
+#if SKEIN_UNROLL_512
+#if (RCNT % SKEIN_UNROLL_512)
+#error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+    u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
+    u64  w[WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+    Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
+#endif
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0] = ctx->X[0];
+        ks[1] = ctx->X[1];
+        ks[2] = ctx->X[2];
+        ks[3] = ctx->X[3];
+        ks[4] = ctx->X[4];
+        ks[5] = ctx->X[5];
+        ks[6] = ctx->X[6];
+        ks[7] = ctx->X[7];
+        ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
+                ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
+
+        ts[2] = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+        X0   = w[0] + ks[0];                    /* do the first full key injection */
+        X1   = w[1] + ks[1];
+        X2   = w[2] + ks[2];
+        X3   = w[3] + ks[3];
+        X4   = w[4] + ks[4];
+        X5   = w[5] + ks[5] + ts[0];
+        X6   = w[6] + ks[6] + ts[1];
+        X7   = w[7] + ks[7];
+
+        blkPtr += SKEIN_512_BLOCK_BYTES;
+
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+        /* run the rounds */
+#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
+
+#if SKEIN_UNROLL_512 == 0                       
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
+    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+
+#define I512(R)                                                     \
+    X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
+    X1   += ks[((R) + 2) % 9];                                        \
+    X2   += ks[((R) + 3) % 9];                                        \
+    X3   += ks[((R) + 4) % 9];                                        \
+    X4   += ks[((R) + 5) % 9];                                        \
+    X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
+    X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
+    X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+#else                                       /* looping version */
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+
+#define I512(R)                                                     \
+    X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
+    X1   += ks[r + (R) + 1];                                            \
+    X2   += ks[r + (R) + 2];                                            \
+    X3   += ks[r + (R) + 3];                                            \
+    X4   += ks[r + (R) + 4];                                            \
+    X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
+    X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
+    X7   += ks[r + (R) + 7] +         r + (R);                              \
+    ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
+    ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
+#endif                         /* end of looped code definitions */
+        {
+#define R512_8_rounds(R)  /* do 8 full rounds */  \
+        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
+        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
+        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
+        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
+        I512(2 * (R));                              \
+        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
+        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
+        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
+        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
+        I512(2 * (R) + 1);        /* and key injection */
+
+        R512_8_rounds(0);
+
+#define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
+
+  #if   R512_Unroll_R(1)
+        R512_8_rounds(1);
+  #endif
+  #if   R512_Unroll_R(2)
+        R512_8_rounds(2);
+  #endif
+  #if   R512_Unroll_R(3)
+        R512_8_rounds(3);
+  #endif
+  #if   R512_Unroll_R(4)
+        R512_8_rounds(4);
+  #endif
+  #if   R512_Unroll_R(5)
+        R512_8_rounds(5);
+  #endif
+  #if   R512_Unroll_R(6)
+        R512_8_rounds(6);
+  #endif
+  #if   R512_Unroll_R(7)
+        R512_8_rounds(7);
+  #endif
+  #if   R512_Unroll_R(8)
+        R512_8_rounds(8);
+  #endif
+  #if   R512_Unroll_R(9)
+        R512_8_rounds(9);
+  #endif
+  #if   R512_Unroll_R(10)
+        R512_8_rounds(10);
+  #endif
+  #if   R512_Unroll_R(11)
+        R512_8_rounds(11);
+  #endif
+  #if   R512_Unroll_R(12)
+        R512_8_rounds(12);
+  #endif
+  #if   R512_Unroll_R(13)
+        R512_8_rounds(13);
+  #endif
+  #if   R512_Unroll_R(14)
+        R512_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_512 > 14)
+#error  "need more unrolling in Skein_512_Process_Block"
+  #endif
+        }
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = X0 ^ w[0];
+        ctx->X[1] = X1 ^ w[1];
+        ctx->X[2] = X2 ^ w[2];
+        ctx->X[3] = X3 ^ w[3];
+        ctx->X[4] = X4 ^ w[4];
+        ctx->X[5] = X5 ^ w[5];
+        ctx->X[6] = X6 ^ w[6];
+        ctx->X[7] = X7 ^ w[7];
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein_512_Process_Block_CodeSize(void)
+    {
+    return ((u8 *) Skein_512_Process_Block_CodeSize) -
+           ((u8 *) Skein_512_Process_Block);
+    }
+unsigned int Skein_512_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_512;
+    }
+#endif
+#endif
+
+/*****************************  Skein1024 ******************************/
+#if !(SKEIN_USE_ASM & 1024)
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+    { /* do it in C, always looping (unrolled is bigger AND slower!) */
+    enum {
+        WCNT = SKEIN1024_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
+
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
+#else
+#define SKEIN_UNROLL_1024 (0)
+#endif
+
+#if (SKEIN_UNROLL_1024 != 0)
+#if (RCNT % SKEIN_UNROLL_1024)
+#error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+
+    u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
+            X08, X09, X10, X11, X12, X13, X14, X15;
+    u64  w[WCNT];                            /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
+    Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
+    Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
+    Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
+#endif
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0]  = ctx->X[0];
+        ks[1]  = ctx->X[1];
+        ks[2]  = ctx->X[2];
+        ks[3]  = ctx->X[3];
+        ks[4]  = ctx->X[4];
+        ks[5]  = ctx->X[5];
+        ks[6]  = ctx->X[6];
+        ks[7]  = ctx->X[7];
+        ks[8]  = ctx->X[8];
+        ks[9]  = ctx->X[9];
+        ks[10] = ctx->X[10];
+        ks[11] = ctx->X[11];
+        ks[12] = ctx->X[12];
+        ks[13] = ctx->X[13];
+        ks[14] = ctx->X[14];
+        ks[15] = ctx->X[15];
+        ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
+                  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
+                  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
+                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
+
+        ts[2]  = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+        X00    =  w[0] +  ks[0];                 /* do the first full key injection */
+        X01    =  w[1] +  ks[1];
+        X02    =  w[2] +  ks[2];
+        X03    =  w[3] +  ks[3];
+        X04    =  w[4] +  ks[4];
+        X05    =  w[5] +  ks[5];
+        X06    =  w[6] +  ks[6];
+        X07    =  w[7] +  ks[7];
+        X08    =  w[8] +  ks[8];
+        X09    =  w[9] +  ks[9];
+        X10    = w[10] + ks[10];
+        X11    = w[11] + ks[11];
+        X12    = w[12] + ks[12];
+        X13    = w[13] + ks[13] + ts[0];
+        X14    = w[14] + ks[14] + ts[1];
+        X15    = w[15] + ks[15];
+
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+
+#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
+    X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
+    X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
+    X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
+    X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
+
+#if SKEIN_UNROLL_1024 == 0                      
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+
+#define I1024(R)                                                        \
+    X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
+    X01   += ks[((R) +  2) % 17];                                       \
+    X02   += ks[((R) +  3) % 17];                                       \
+    X03   += ks[((R) +  4) % 17];                                       \
+    X04   += ks[((R) +  5) % 17];                                       \
+    X05   += ks[((R) +  6) % 17];                                       \
+    X06   += ks[((R) +  7) % 17];                                       \
+    X07   += ks[((R) +  8) % 17];                                       \
+    X08   += ks[((R) +  9) % 17];                                       \
+    X09   += ks[((R) + 10) % 17];                                       \
+    X10   += ks[((R) + 11) % 17];                                       \
+    X11   += ks[((R) + 12) % 17];                                       \
+    X12   += ks[((R) + 13) % 17];                                       \
+    X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
+    X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
+    X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
+#else                                       /* looping version */
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
+
+#define I1024(R)                                                      \
+    X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
+    X01   += ks[r + (R) +  1];                                            \
+    X02   += ks[r + (R) +  2];                                            \
+    X03   += ks[r + (R) +  3];                                            \
+    X04   += ks[r + (R) +  4];                                            \
+    X05   += ks[r + (R) +  5];                                            \
+    X06   += ks[r + (R) +  6];                                            \
+    X07   += ks[r + (R) +  7];                                            \
+    X08   += ks[r + (R) +  8];                                            \
+    X09   += ks[r + (R) +  9];                                            \
+    X10   += ks[r + (R) + 10];                                            \
+    X11   += ks[r + (R) + 11];                                            \
+    X12   += ks[r + (R) + 12];                                            \
+    X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
+    X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
+    X15   += ks[r + (R) + 15] +         r + (R);                          \
+    ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
+    ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
+#endif  
+        {
+#define R1024_8_rounds(R)    /* do 8 full rounds */                               \
+        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
+        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
+        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
+        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
+        I1024(2*(R));                                                             \
+        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
+        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
+        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
+        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
+        I1024(2*(R)+1);
+
+        R1024_8_rounds(0);
+
+#define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
+
+  #if   R1024_Unroll_R(1)
+        R1024_8_rounds(1);
+  #endif
+  #if   R1024_Unroll_R(2)
+        R1024_8_rounds(2);
+  #endif
+  #if   R1024_Unroll_R(3)
+        R1024_8_rounds(3);
+  #endif
+  #if   R1024_Unroll_R(4)
+        R1024_8_rounds(4);
+  #endif
+  #if   R1024_Unroll_R(5)
+        R1024_8_rounds(5);
+  #endif
+  #if   R1024_Unroll_R(6)
+        R1024_8_rounds(6);
+  #endif
+  #if   R1024_Unroll_R(7)
+        R1024_8_rounds(7);
+  #endif
+  #if   R1024_Unroll_R(8)
+        R1024_8_rounds(8);
+  #endif
+  #if   R1024_Unroll_R(9)
+        R1024_8_rounds(9);
+  #endif
+  #if   R1024_Unroll_R(10)
+        R1024_8_rounds(10);
+  #endif
+  #if   R1024_Unroll_R(11)
+        R1024_8_rounds(11);
+  #endif
+  #if   R1024_Unroll_R(12)
+        R1024_8_rounds(12);
+  #endif
+  #if   R1024_Unroll_R(13)
+        R1024_8_rounds(13);
+  #endif
+  #if   R1024_Unroll_R(14)
+        R1024_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_1024 > 14)
+#error  "need more unrolling in Skein_1024_Process_Block"
+  #endif
+        }
+        /* do the final "feedforward" xor, update context chaining vars */
+
+        ctx->X[0] = X00 ^ w[0];
+        ctx->X[1] = X01 ^ w[1];
+        ctx->X[2] = X02 ^ w[2];
+        ctx->X[3] = X03 ^ w[3];
+        ctx->X[4] = X04 ^ w[4];
+        ctx->X[5] = X05 ^ w[5];
+        ctx->X[6] = X06 ^ w[6];
+        ctx->X[7] = X07 ^ w[7];
+        ctx->X[8] = X08 ^ w[8];
+        ctx->X[9] = X09 ^ w[9];
+        ctx->X[10] = X10 ^ w[10];
+        ctx->X[11] = X11 ^ w[11];
+        ctx->X[12] = X12 ^ w[12];
+        ctx->X[13] = X13 ^ w[13];
+        ctx->X[14] = X14 ^ w[14];
+        ctx->X[15] = X15 ^ w[15];
+
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+        
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        blkPtr += SKEIN1024_BLOCK_BYTES;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein1024_Process_Block_CodeSize(void)
+    {
+    return ((u8 *) Skein1024_Process_Block_CodeSize) -
+           ((u8 *) Skein1024_Process_Block);
+    }
+unsigned int Skein1024_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_1024;
+    }
+#endif
+#endif
-- 
1.9.0
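
A note on the two I256()/I512()/I1024() variants in skein_block.c: the fully
unrolled form picks key-schedule words with modular indices, while the looping
form indexes linearly and appends a copy of the oldest word after each
injection ("rotate key schedule"). The sketch below suggests why both should
yield the same subkeys for the 256-bit case. It is an illustration only, not
code from the patch: the function names are hypothetical, the 72-round count
and the SKEIN_KS_PARITY value are assumed rather than taken from the headers
in this series, and it uses two separate arrays instead of the overlapping
kw[] layout.

```c
#include <assert.h>
#include <stdint.h>

#define WCNT      4    /* state words for the 256-bit variant */
#define NSUBKEYS  18   /* assumed: 72 rounds, one key injection per 4 rounds */
#define KS_PARITY 0x1BD11BDAA9FC1A22ULL  /* assumed value of SKEIN_KS_PARITY */

/* Load key words plus parity word, and tweak words plus their xor. */
void init_schedule(const uint64_t key[WCNT], const uint64_t tweak[2],
                   uint64_t *ks, uint64_t *ts)
{
    unsigned i;

    ks[WCNT] = KS_PARITY;
    for (i = 0; i < WCNT; i++) {
        ks[i] = key[i];
        ks[WCNT] ^= key[i];
    }
    ts[0] = tweak[0];
    ts[1] = tweak[1];
    ts[2] = tweak[0] ^ tweak[1];
}

/* Modular indexing, as in the fully unrolled I256(): nothing is moved. */
void subkeys_modular(const uint64_t key[WCNT], const uint64_t tweak[2],
                     uint64_t out[NSUBKEYS][WCNT])
{
    uint64_t ks[WCNT + 1], ts[3];
    unsigned s, i;

    init_schedule(key, tweak, ks, ts);
    for (s = 1; s <= NSUBKEYS; s++) {
        for (i = 0; i < WCNT; i++)
            out[s - 1][i] = ks[(s + i) % (WCNT + 1)];
        out[s - 1][WCNT - 3] += ts[s % 3];
        out[s - 1][WCNT - 2] += ts[(s + 1) % 3];
        out[s - 1][WCNT - 1] += s;
    }
}

/* Linear indexing, as in the looping I256(): append the word that just
 * dropped out of the window so later subkeys can read it without '%'. */
void subkeys_linear(const uint64_t key[WCNT], const uint64_t tweak[2],
                    uint64_t out[NSUBKEYS][WCNT])
{
    uint64_t ks[WCNT + 1 + NSUBKEYS], ts[3 + NSUBKEYS];
    unsigned s, i;

    init_schedule(key, tweak, ks, ts);
    for (s = 1; s <= NSUBKEYS; s++) {
        assert(s + WCNT < sizeof(ks) / sizeof(ks[0]));
        for (i = 0; i < WCNT; i++)
            out[s - 1][i] = ks[s + i];
        out[s - 1][WCNT - 3] += ts[s];
        out[s - 1][WCNT - 2] += ts[s + 1];
        out[s - 1][WCNT - 1] += s;
        ks[s + WCNT] = ks[s - 1];   /* rotate key schedule */
        ts[s + 2] = ts[s - 1];      /* rotate tweak schedule */
    }
}

/* Returns 1 when both indexing schemes produce identical subkeys. */
int schedules_agree(const uint64_t key[WCNT], const uint64_t tweak[2])
{
    uint64_t a[NSUBKEYS][WCNT], b[NSUBKEYS][WCNT];
    unsigned s, i;

    subkeys_modular(key, tweak, a);
    subkeys_linear(key, tweak, b);
    for (s = 0; s < NSUBKEYS; s++)
        for (i = 0; i < WCNT; i++)
            if (a[s][i] != b[s][i])
                return 0;
    return 1;
}
```

Calling schedules_agree() with any key and tweak should return 1. The looping
variant trades a larger schedule array for index arithmetic without a modulo,
which is presumably why kw[] grows by RCNT*2 words when SKEIN_UNROLL_* is
nonzero.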

* [RFC PATCH 13/22] staging: crypto: skein: fix leading whitespace
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (11 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 12/22] staging: crypto: skein: dos2unix, remove executable perms Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 14/22] staging: crypto: skein: remove trailing whitespace Jason Cooper
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
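The hunks below replace four-space leading indentation with tabs and empty out
lines that held only blanks. How the conversion was actually performed is not
stated in this mail; the following is a minimal sketch of an equivalent
per-line transformation, assuming four spaces per indentation level. The
function name retab_line and the INDENT_WIDTH constant are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INDENT_WIDTH 4  /* assumed: four leading spaces per indentation level */

/*
 * Rewrite the leading whitespace of one source line (no trailing newline).
 * Every full group of INDENT_WIDTH leading spaces becomes one tab, leftover
 * spaces are kept, and a line made only of spaces or tabs becomes empty.
 * Returns the length written to dst, excluding the terminating NUL.
 */
size_t retab_line(const char *src, char *dst, size_t dst_size)
{
    size_t spaces = 0, n = 0, i;

    assert(dst_size > strlen(src));        /* output never outgrows the input */

    if (src[strspn(src, " \t")] == '\0') { /* only blanks: emit an empty line */
        dst[0] = '\0';
        return 0;
    }
    while (src[spaces] == ' ')
        spaces++;
    for (i = 0; i < spaces / INDENT_WIDTH; i++)
        dst[n++] = '\t';
    for (i = 0; i < spaces % INDENT_WIDTH; i++)
        dst[n++] = ' ';
    strcpy(dst + n, src + spaces);         /* rest of the line is unchanged */
    return n + strlen(src + spaces);
}
```

Applied to a line such as "    {" this yields a single tab followed by "{".
Lines that already start with a tab pass through untouched, so running the
transformation twice should be harmless.
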
 drivers/staging/skein/include/skein.h        |  136 +-
 drivers/staging/skein/include/skeinApi.h     |  284 +--
 drivers/staging/skein/include/skein_iv.h     |  276 +--
 drivers/staging/skein/include/threefishApi.h |  230 +--
 drivers/staging/skein/skein.c                | 1126 +++++------
 drivers/staging/skein/skeinApi.c             |  320 +--
 drivers/staging/skein/skeinBlockNo3F.c       |  286 +--
 drivers/staging/skein/skein_block.c          | 1012 +++++-----
 drivers/staging/skein/threefish1024Block.c   | 2740 +++++++++++++-------------
 drivers/staging/skein/threefish256Block.c    |  639 +++---
 drivers/staging/skein/threefish512Block.c    | 1254 ++++++------
 drivers/staging/skein/threefishApi.c         |  102 +-
 12 files changed, 4200 insertions(+), 4205 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 18bb15824e41..906bcee41c39 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -38,11 +38,11 @@
 #define Skein_Swap64(w64)  (w64)
 
 enum
-    {
-    SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
-    SKEIN_FAIL            =      1,
-    SKEIN_BAD_HASHLEN     =      2
-    };
+	{
+	SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
+	SKEIN_FAIL            =      1,
+	SKEIN_BAD_HASHLEN     =      2
+	};
 
 #define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
 
@@ -64,32 +64,32 @@ enum
 #define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
 
 struct skein_ctx_hdr
-    {
-    size_t  hashBitLen;                      /* size of hash result, in bits */
-    size_t  bCnt;                            /* current byte count in buffer b[] */
-    u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
-    };
+	{
+	size_t  hashBitLen;                      /* size of hash result, in bits */
+	size_t  bCnt;                            /* current byte count in buffer b[] */
+	u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+	};
 
 struct skein_256_ctx                               /*  256-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
+	{
+	struct skein_ctx_hdr h;                      /* common header context variables */
+	u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
+	u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	};
 
 struct skein_512_ctx                             /*  512-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
+	{
+	struct skein_ctx_hdr h;                      /* common header context variables */
+	u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
+	u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	};
 
 struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
+	{
+	struct skein_ctx_hdr h;                      /* common header context variables */
+	u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
+	u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	};
 
 /*   Skein APIs for (incremental) "straight hashing" */
 int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
@@ -150,18 +150,18 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 
 /* tweak word T[1]: bit field starting positions */
 #define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
-                                
+
 #define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
 #define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
 #define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
 #define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
 #define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
-                                
+
 /* tweak word T[1]: flag bit definition(s) */
 #define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
 #define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
 #define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
-                                
+
 /* tweak word T[1]: tree level bit field mask */
 #define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
 #define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
@@ -213,9 +213,9 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
 
 #define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
-    ((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
-     (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
-     (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
+	((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+	 (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+	 (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
 
 #define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
 
@@ -233,17 +233,17 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 
 /* set both tweak words at once */
 #define Skein_Set_T0_T1(ctxPtr, T0, T1)           \
-    {                                           \
-    Skein_Set_T0(ctxPtr, (T0));                  \
-    Skein_Set_T1(ctxPtr, (T1));                  \
-    }
+	{                                           \
+	Skein_Set_T0(ctxPtr, (T0));                  \
+	Skein_Set_T1(ctxPtr, (T1));                  \
+	}
 
 #define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
-    Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
+	Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
 
 /* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
 #define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
-    { Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
+	{ Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
 
 #define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
 #define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
@@ -270,37 +270,37 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 ** Skein block function constants (shared across Ref and Opt code)
 ******************************************************************/
 enum    
-    {   
-        /* Skein_256 round rotation constants */
-    R_256_0_0 = 14, R_256_0_1 = 16,
-    R_256_1_0 = 52, R_256_1_1 = 57,
-    R_256_2_0 = 23, R_256_2_1 = 40,
-    R_256_3_0 =  5, R_256_3_1 = 37,
-    R_256_4_0 = 25, R_256_4_1 = 33,
-    R_256_5_0 = 46, R_256_5_1 = 12,
-    R_256_6_0 = 58, R_256_6_1 = 22,
-    R_256_7_0 = 32, R_256_7_1 = 32,
-
-        /* Skein_512 round rotation constants */
-    R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
-    R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
-    R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
-    R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
-    R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
-    R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
-    R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
-    R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
-
-        /* Skein1024 round rotation constants */
-    R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
-    R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
-    R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
-    R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
-    R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
-    R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
-    R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
-    R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
-    };
+	{   
+	    /* Skein_256 round rotation constants */
+	R_256_0_0 = 14, R_256_0_1 = 16,
+	R_256_1_0 = 52, R_256_1_1 = 57,
+	R_256_2_0 = 23, R_256_2_1 = 40,
+	R_256_3_0 =  5, R_256_3_1 = 37,
+	R_256_4_0 = 25, R_256_4_1 = 33,
+	R_256_5_0 = 46, R_256_5_1 = 12,
+	R_256_6_0 = 58, R_256_6_1 = 22,
+	R_256_7_0 = 32, R_256_7_1 = 32,
+
+	    /* Skein_512 round rotation constants */
+	R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
+	R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
+	R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
+	R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
+	R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
+	R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
+	R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
+	R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
+
+	    /* Skein1024 round rotation constants */
+	R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
+	R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
+	R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
+	R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
+	R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
+	R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
+	R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
+	R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
+	};
 
 #ifndef SKEIN_ROUNDS
 #define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
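For anyone following the tweak-word macros touched in the skein.h hunks above, here is a small userspace sketch of how SKEIN_T1_BIT() maps bit positions of the 128-bit tweak onto the second 64-bit word T[1]. The macro bodies are copied from the context lines of this patch. The u64 typedef via <stdint.h> and the helper t1_tree_level() are assumptions added only so the snippet builds outside the kernel; they are not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;	/* userspace stand-in for the kernel's u64 (assumption) */

/* copied from skein.h as shown in the hunks above */
#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)
#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)
#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)
#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)
#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)

#define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
#define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)

#define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
#define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)

/* hypothetical helper: pull the 7-bit tree level back out of T[1] */
static unsigned int t1_tree_level(u64 t1)
{
	return (unsigned int)((t1 & SKEIN_T1_TREE_LVL_MASK) >>
			      SKEIN_T1_POS_TREE_LVL);
}

/* hypothetical self-check: the flags occupy distinct bits of T[1] */
static void t1_selfcheck(void)
{
	assert((SKEIN_T1_FLAG_FIRST & SKEIN_T1_FLAG_FINAL) == 0);
	assert((SKEIN_T1_FLAG_BIT_PAD & SKEIN_T1_TREE_LVL_MASK) == 0);
}
```

With these definitions, tweak bit 126 lands on bit 62 of T[1] and tweak bit 127 on bit 63, so a value such as SKEIN_T1_FLAG_FIRST | SKEIN_T1_TREE_LEVEL(3) carries the first-block flag and a tree level of 3 in the same word.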
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 2c52797918cf..0d7d59eff460 100644
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -81,148 +81,148 @@ OTHER DEALINGS IN THE SOFTWARE.
 #include <linux/types.h>
 #include <skein.h>
 
-    /**
-     * Which Skein size to use
-     */
-    enum skein_size {
-        Skein256 = 256,     /*!< Skein with 256 bit state */
-        Skein512 = 512,     /*!< Skein with 512 bit state */
-        Skein1024 = 1024    /*!< Skein with 1024 bit state */
-    };
-
-    /**
-     * Context for Skein.
-     *
-     * This structure was setup with some know-how of the internal
-     * Skein structures, in particular ordering of header and size dependent
-     * variables. If Skein implementation changes this, then adapt these
-     * structures as well.
-     */
-    struct skein_ctx {
-        u64 skeinSize;
-        u64  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
-        union {
-            struct skein_ctx_hdr h;
-            struct skein_256_ctx s256;
-            struct skein_512_ctx s512;
-            struct skein1024_ctx s1024;
-        } m;
-    };
-
-    /**
-     * Prepare a Skein context.
-     * 
-     * An application must call this function before it can use the Skein
-     * context. The functions clears memory and initializes size dependent
-     * variables.
-     *
-     * @param ctx
-     *     Pointer to a Skein context.
-     * @param size
-     *     Which Skein size to use.
-     * @return
-     *     SKEIN_SUCESS of SKEIN_FAIL
-     */
-    int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size);
-
-    /**
-     * Initialize a Skein context.
-     *
-     * Initializes the context with this data and saves the resulting Skein 
-     * state variables for further use.
-     *
-     * @param ctx
-     *     Pointer to a Skein context.
-     * @param hashBitLen
-     *     Number of MAC hash bits to compute
-     * @return
-     *     SKEIN_SUCESS of SKEIN_FAIL
-     * @see skeinReset
-     */
-    int skeinInit(struct skein_ctx *ctx, size_t hashBitLen);
-
-    /**
-     * Resets a Skein context for further use.
-     * 
-     * Restores the saved chaining variables to reset the Skein context. 
-     * Thus applications can reuse the same setup to  process several 
-     * messages. This saves a complete Skein initialization cycle.
-     * 
-     * @param ctx
-     *     Pointer to a pre-initialized Skein MAC context
-     */
-    void skeinReset(struct skein_ctx *ctx);
-    
-    /**
-     * Initializes a Skein context for MAC usage.
-     * 
-     * Initializes the context with this data and saves the resulting Skein 
-     * state variables for further use.
-     *
-     * Applications call the normal Skein functions to update the MAC and
-     * get the final result.
-     *
-     * @param ctx
-     *     Pointer to an empty or preinitialized Skein MAC context
-     * @param key
-     *     Pointer to key bytes or NULL
-     * @param keyLen
-     *     Length of the key in bytes or zero
-     * @param hashBitLen
-     *     Number of MAC hash bits to compute
-     * @return
-     *     SKEIN_SUCESS of SKEIN_FAIL
-     */
-    int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
-                     size_t hashBitLen);
-
-    /**
-     * Update Skein with the next part of the message.
-     *
-     * @param ctx
-     *     Pointer to initialized Skein context
-     * @param msg
-     *     Pointer to the message.
-     * @param msgByteCnt
-     *     Length of the message in @b bytes
-     * @return
-     *     Success or error code.
-     */
-    int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
-                    size_t msgByteCnt);
-
-    /**
-     * Update the hash with a message bit string.
-     *
-     * Skein can handle data not only as bytes but also as bit strings of
-     * arbitrary length (up to its maximum design size).
-     *
-     * @param ctx
-     *     Pointer to initialized Skein context
-     * @param msg
-     *     Pointer to the message.
-     * @param msgBitCnt
-     *     Length of the message in @b bits.
-     */
-    int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
-                        size_t msgBitCnt);
-
-    /**
-     * Finalize Skein and return the hash.
-     * 
-     * Before an application can reuse a Skein setup the application must
-     * reset the Skein context.
-     *
-     * @param ctx
-     *     Pointer to initialized Skein context
-     * @param hash
-     *     Pointer to buffer that receives the hash. The buffer must be large
-     *     enough to store @c hashBitLen bits.
-     * @return
-     *     Success or error code.
-     * @see skeinReset
-     */
-    int skeinFinal(struct skein_ctx *ctx, u8 *hash);
+/**
+ * Which Skein size to use
+ */
+enum skein_size {
+	Skein256 = 256,     /*!< Skein with 256 bit state */
+	Skein512 = 512,     /*!< Skein with 512 bit state */
+	Skein1024 = 1024    /*!< Skein with 1024 bit state */
+};
+
+/**
+ * Context for Skein.
+ *
+ * This structure was setup with some know-how of the internal
+ * Skein structures, in particular ordering of header and size dependent
+ * variables. If Skein implementation changes this, then adapt these
+ * structures as well.
+ */
+struct skein_ctx {
+	u64 skeinSize;
+	u64  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
+	union {
+		struct skein_ctx_hdr h;
+		struct skein_256_ctx s256;
+		struct skein_512_ctx s512;
+		struct skein1024_ctx s1024;
+	} m;
+};
+
+/**
+ * Prepare a Skein context.
+ * 
+ * An application must call this function before it can use the Skein
+ * context. The function clears memory and initializes size dependent
+ * variables.
+ *
+ * @param ctx
+ *     Pointer to a Skein context.
+ * @param size
+ *     Which Skein size to use.
+ * @return
+ *     SKEIN_SUCCESS or SKEIN_FAIL
+ */
+int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size);
+
+/**
+ * Initialize a Skein context.
+ *
+ * Initializes the context with this data and saves the resulting Skein 
+ * state variables for further use.
+ *
+ * @param ctx
+ *     Pointer to a Skein context.
+ * @param hashBitLen
+ *     Number of MAC hash bits to compute
+ * @return
+ *     SKEIN_SUCCESS or SKEIN_FAIL
+ * @see skeinReset
+ */
+int skeinInit(struct skein_ctx *ctx, size_t hashBitLen);
+
+/**
+ * Resets a Skein context for further use.
+ * 
+ * Restores the saved chaining variables to reset the Skein context. 
+ * Thus applications can reuse the same setup to  process several 
+ * messages. This saves a complete Skein initialization cycle.
+ * 
+ * @param ctx
+ *     Pointer to a pre-initialized Skein MAC context
+ */
+void skeinReset(struct skein_ctx *ctx);
+
+/**
+ * Initializes a Skein context for MAC usage.
+ * 
+ * Initializes the context with this data and saves the resulting Skein 
+ * state variables for further use.
+ *
+ * Applications call the normal Skein functions to update the MAC and
+ * get the final result.
+ *
+ * @param ctx
+ *     Pointer to an empty or preinitialized Skein MAC context
+ * @param key
+ *     Pointer to key bytes or NULL
+ * @param keyLen
+ *     Length of the key in bytes or zero
+ * @param hashBitLen
+ *     Number of MAC hash bits to compute
+ * @return
+ *     SKEIN_SUCCESS or SKEIN_FAIL
+ */
+int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
+		 size_t hashBitLen);
+
+/**
+ * Update Skein with the next part of the message.
+ *
+ * @param ctx
+ *     Pointer to initialized Skein context
+ * @param msg
+ *     Pointer to the message.
+ * @param msgByteCnt
+ *     Length of the message in @b bytes
+ * @return
+ *     Success or error code.
+ */
+int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
+		size_t msgByteCnt);
+
+/**
+ * Update the hash with a message bit string.
+ *
+ * Skein can handle data not only as bytes but also as bit strings of
+ * arbitrary length (up to its maximum design size).
+ *
+ * @param ctx
+ *     Pointer to initialized Skein context
+ * @param msg
+ *     Pointer to the message.
+ * @param msgBitCnt
+ *     Length of the message in @b bits.
+ */
+int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
+		    size_t msgBitCnt);
+
+/**
+ * Finalize Skein and return the hash.
+ * 
+ * Before an application can reuse a Skein setup the application must
+ * reset the Skein context.
+ *
+ * @param ctx
+ *     Pointer to initialized Skein context
+ * @param hash
+ *     Pointer to buffer that receives the hash. The buffer must be large
+ *     enough to store @c hashBitLen bits.
+ * @return
+ *     Success or error code.
+ * @see skeinReset
+ */
+int skeinFinal(struct skein_ctx *ctx, u8 *hash);
 
 /**
  * @}
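The "know-how of the internal Skein structures" mentioned in the struct skein_ctx comment above comes down to every size-specific context starting with the same struct skein_ctx_hdr, so that m.h aliases the header of whichever variant is in use. Below is a rough userspace sketch of that layout assumption. The struct definitions mirror the ones in this patch; the state-word counts (4, 8, 16) and SKEIN_MAX_STATE_WORDS = 16 are assumed values not shown in these hunks, and skein_ctx_layout_check() and skein_ctx_hash_bits() are hypothetical helpers, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;	/* userspace stand-ins for kernel types (assumption) */
typedef uint8_t u8;

#define SKEIN_MODIFIER_WORDS	2
#define SKEIN_256_STATE_WORDS	4	/* assumed: 256 bits / 64 */
#define SKEIN_512_STATE_WORDS	8	/* assumed: 512 bits / 64 */
#define SKEIN1024_STATE_WORDS	16	/* assumed: 1024 bits / 64 */
#define SKEIN_MAX_STATE_WORDS	16	/* assumed */

struct skein_ctx_hdr {
	size_t hashBitLen;
	size_t bCnt;
	u64 T[SKEIN_MODIFIER_WORDS];
};

struct skein_256_ctx {
	struct skein_ctx_hdr h;
	u64 X[SKEIN_256_STATE_WORDS];
	u8 b[8 * SKEIN_256_STATE_WORDS];
};

struct skein_512_ctx {
	struct skein_ctx_hdr h;
	u64 X[SKEIN_512_STATE_WORDS];
	u8 b[8 * SKEIN_512_STATE_WORDS];
};

struct skein1024_ctx {
	struct skein_ctx_hdr h;
	u64 X[SKEIN1024_STATE_WORDS];
	u8 b[8 * SKEIN1024_STATE_WORDS];
};

struct skein_ctx {
	u64 skeinSize;
	u64 XSave[SKEIN_MAX_STATE_WORDS];
	union {
		struct skein_ctx_hdr h;
		struct skein_256_ctx s256;
		struct skein_512_ctx s512;
		struct skein1024_ctx s1024;
	} m;
};

/* hypothetical check: the header must be the first member of each variant */
static void skein_ctx_layout_check(void)
{
	assert(offsetof(struct skein_256_ctx, h) == 0);
	assert(offsetof(struct skein_512_ctx, h) == 0);
	assert(offsetof(struct skein1024_ctx, h) == 0);
}

/* hypothetical accessor: read the header without knowing the variant */
static size_t skein_ctx_hash_bits(const struct skein_ctx *ctx)
{
	return ctx->m.h.hashBitLen;
}
```

If a later cleanup reorders the members of the size-specific contexts, a check along these lines would trip, which is the situation the original comment warns about.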
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index 813bad528e3c..bbbba77c44d3 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -21,179 +21,179 @@
 
 /* blkSize =  256 bits. hashSize =  128 bits */
 const u64 SKEIN_256_IV_128[] =
-    {
-    MK_64(0xE1111906, 0x964D7260),
-    MK_64(0x883DAAA7, 0x7C8D811C),
-    MK_64(0x10080DF4, 0x91960F7A),
-    MK_64(0xCCF7DDE5, 0xB45BC1C2)
-    };
+	{
+	MK_64(0xE1111906, 0x964D7260),
+	MK_64(0x883DAAA7, 0x7C8D811C),
+	MK_64(0x10080DF4, 0x91960F7A),
+	MK_64(0xCCF7DDE5, 0xB45BC1C2)
+	};
 
 /* blkSize =  256 bits. hashSize =  160 bits */
 const u64 SKEIN_256_IV_160[] =
-    {
-    MK_64(0x14202314, 0x72825E98),
-    MK_64(0x2AC4E9A2, 0x5A77E590),
-    MK_64(0xD47A5856, 0x8838D63E),
-    MK_64(0x2DD2E496, 0x8586AB7D)
-    };
+	{
+	MK_64(0x14202314, 0x72825E98),
+	MK_64(0x2AC4E9A2, 0x5A77E590),
+	MK_64(0xD47A5856, 0x8838D63E),
+	MK_64(0x2DD2E496, 0x8586AB7D)
+	};
 
 /* blkSize =  256 bits. hashSize =  224 bits */
 const u64 SKEIN_256_IV_224[] =
-    {
-    MK_64(0xC6098A8C, 0x9AE5EA0B),
-    MK_64(0x876D5686, 0x08C5191C),
-    MK_64(0x99CB88D7, 0xD7F53884),
-    MK_64(0x384BDDB1, 0xAEDDB5DE)
-    };
+	{
+	MK_64(0xC6098A8C, 0x9AE5EA0B),
+	MK_64(0x876D5686, 0x08C5191C),
+	MK_64(0x99CB88D7, 0xD7F53884),
+	MK_64(0x384BDDB1, 0xAEDDB5DE)
+	};
 
 /* blkSize =  256 bits. hashSize =  256 bits */
 const u64 SKEIN_256_IV_256[] =
-    {
-    MK_64(0xFC9DA860, 0xD048B449),
-    MK_64(0x2FCA6647, 0x9FA7D833),
-    MK_64(0xB33BC389, 0x6656840F),
-    MK_64(0x6A54E920, 0xFDE8DA69)
-    };
+	{
+	MK_64(0xFC9DA860, 0xD048B449),
+	MK_64(0x2FCA6647, 0x9FA7D833),
+	MK_64(0xB33BC389, 0x6656840F),
+	MK_64(0x6A54E920, 0xFDE8DA69)
+	};
 
 /* blkSize =  512 bits. hashSize =  128 bits */
 const u64 SKEIN_512_IV_128[] =
-    {
-    MK_64(0xA8BC7BF3, 0x6FBF9F52),
-    MK_64(0x1E9872CE, 0xBD1AF0AA),
-    MK_64(0x309B1790, 0xB32190D3),
-    MK_64(0xBCFBB854, 0x3F94805C),
-    MK_64(0x0DA61BCD, 0x6E31B11B),
-    MK_64(0x1A18EBEA, 0xD46A32E3),
-    MK_64(0xA2CC5B18, 0xCE84AA82),
-    MK_64(0x6982AB28, 0x9D46982D)
-    };
+	{
+	MK_64(0xA8BC7BF3, 0x6FBF9F52),
+	MK_64(0x1E9872CE, 0xBD1AF0AA),
+	MK_64(0x309B1790, 0xB32190D3),
+	MK_64(0xBCFBB854, 0x3F94805C),
+	MK_64(0x0DA61BCD, 0x6E31B11B),
+	MK_64(0x1A18EBEA, 0xD46A32E3),
+	MK_64(0xA2CC5B18, 0xCE84AA82),
+	MK_64(0x6982AB28, 0x9D46982D)
+	};
 
 /* blkSize =  512 bits. hashSize =  160 bits */
 const u64 SKEIN_512_IV_160[] =
-    {
-    MK_64(0x28B81A2A, 0xE013BD91),
-    MK_64(0xC2F11668, 0xB5BDF78F),
-    MK_64(0x1760D8F3, 0xF6A56F12),
-    MK_64(0x4FB74758, 0x8239904F),
-    MK_64(0x21EDE07F, 0x7EAF5056),
-    MK_64(0xD908922E, 0x63ED70B8),
-    MK_64(0xB8EC76FF, 0xECCB52FA),
-    MK_64(0x01A47BB8, 0xA3F27A6E)
-    };
+	{
+	MK_64(0x28B81A2A, 0xE013BD91),
+	MK_64(0xC2F11668, 0xB5BDF78F),
+	MK_64(0x1760D8F3, 0xF6A56F12),
+	MK_64(0x4FB74758, 0x8239904F),
+	MK_64(0x21EDE07F, 0x7EAF5056),
+	MK_64(0xD908922E, 0x63ED70B8),
+	MK_64(0xB8EC76FF, 0xECCB52FA),
+	MK_64(0x01A47BB8, 0xA3F27A6E)
+	};
 
 /* blkSize =  512 bits. hashSize =  224 bits */
 const u64 SKEIN_512_IV_224[] =
-    {
-    MK_64(0xCCD06162, 0x48677224),
-    MK_64(0xCBA65CF3, 0xA92339EF),
-    MK_64(0x8CCD69D6, 0x52FF4B64),
-    MK_64(0x398AED7B, 0x3AB890B4),
-    MK_64(0x0F59D1B1, 0x457D2BD0),
-    MK_64(0x6776FE65, 0x75D4EB3D),
-    MK_64(0x99FBC70E, 0x997413E9),
-    MK_64(0x9E2CFCCF, 0xE1C41EF7)
-    };
+	{
+	MK_64(0xCCD06162, 0x48677224),
+	MK_64(0xCBA65CF3, 0xA92339EF),
+	MK_64(0x8CCD69D6, 0x52FF4B64),
+	MK_64(0x398AED7B, 0x3AB890B4),
+	MK_64(0x0F59D1B1, 0x457D2BD0),
+	MK_64(0x6776FE65, 0x75D4EB3D),
+	MK_64(0x99FBC70E, 0x997413E9),
+	MK_64(0x9E2CFCCF, 0xE1C41EF7)
+	};
 
 /* blkSize =  512 bits. hashSize =  256 bits */
 const u64 SKEIN_512_IV_256[] =
-    {
-    MK_64(0xCCD044A1, 0x2FDB3E13),
-    MK_64(0xE8359030, 0x1A79A9EB),
-    MK_64(0x55AEA061, 0x4F816E6F),
-    MK_64(0x2A2767A4, 0xAE9B94DB),
-    MK_64(0xEC06025E, 0x74DD7683),
-    MK_64(0xE7A436CD, 0xC4746251),
-    MK_64(0xC36FBAF9, 0x393AD185),
-    MK_64(0x3EEDBA18, 0x33EDFC13)
-    };
+	{
+	MK_64(0xCCD044A1, 0x2FDB3E13),
+	MK_64(0xE8359030, 0x1A79A9EB),
+	MK_64(0x55AEA061, 0x4F816E6F),
+	MK_64(0x2A2767A4, 0xAE9B94DB),
+	MK_64(0xEC06025E, 0x74DD7683),
+	MK_64(0xE7A436CD, 0xC4746251),
+	MK_64(0xC36FBAF9, 0x393AD185),
+	MK_64(0x3EEDBA18, 0x33EDFC13)
+	};
 
 /* blkSize =  512 bits. hashSize =  384 bits */
 const u64 SKEIN_512_IV_384[] =
-    {
-    MK_64(0xA3F6C6BF, 0x3A75EF5F),
-    MK_64(0xB0FEF9CC, 0xFD84FAA4),
-    MK_64(0x9D77DD66, 0x3D770CFE),
-    MK_64(0xD798CBF3, 0xB468FDDA),
-    MK_64(0x1BC4A666, 0x8A0E4465),
-    MK_64(0x7ED7D434, 0xE5807407),
-    MK_64(0x548FC1AC, 0xD4EC44D6),
-    MK_64(0x266E1754, 0x6AA18FF8)
-    };
+	{
+	MK_64(0xA3F6C6BF, 0x3A75EF5F),
+	MK_64(0xB0FEF9CC, 0xFD84FAA4),
+	MK_64(0x9D77DD66, 0x3D770CFE),
+	MK_64(0xD798CBF3, 0xB468FDDA),
+	MK_64(0x1BC4A666, 0x8A0E4465),
+	MK_64(0x7ED7D434, 0xE5807407),
+	MK_64(0x548FC1AC, 0xD4EC44D6),
+	MK_64(0x266E1754, 0x6AA18FF8)
+	};
 
 /* blkSize =  512 bits. hashSize =  512 bits */
 const u64 SKEIN_512_IV_512[] =
-    {
-    MK_64(0x4903ADFF, 0x749C51CE),
-    MK_64(0x0D95DE39, 0x9746DF03),
-    MK_64(0x8FD19341, 0x27C79BCE),
-    MK_64(0x9A255629, 0xFF352CB1),
-    MK_64(0x5DB62599, 0xDF6CA7B0),
-    MK_64(0xEABE394C, 0xA9D5C3F4),
-    MK_64(0x991112C7, 0x1A75B523),
-    MK_64(0xAE18A40B, 0x660FCC33)
-    };
+	{
+	MK_64(0x4903ADFF, 0x749C51CE),
+	MK_64(0x0D95DE39, 0x9746DF03),
+	MK_64(0x8FD19341, 0x27C79BCE),
+	MK_64(0x9A255629, 0xFF352CB1),
+	MK_64(0x5DB62599, 0xDF6CA7B0),
+	MK_64(0xEABE394C, 0xA9D5C3F4),
+	MK_64(0x991112C7, 0x1A75B523),
+	MK_64(0xAE18A40B, 0x660FCC33)
+	};
 
 /* blkSize = 1024 bits. hashSize =  384 bits */
 const u64 SKEIN1024_IV_384[] =
-    {
-    MK_64(0x5102B6B8, 0xC1894A35),
-    MK_64(0xFEEBC9E3, 0xFE8AF11A),
-    MK_64(0x0C807F06, 0xE32BED71),
-    MK_64(0x60C13A52, 0xB41A91F6),
-    MK_64(0x9716D35D, 0xD4917C38),
-    MK_64(0xE780DF12, 0x6FD31D3A),
-    MK_64(0x797846B6, 0xC898303A),
-    MK_64(0xB172C2A8, 0xB3572A3B),
-    MK_64(0xC9BC8203, 0xA6104A6C),
-    MK_64(0x65909338, 0xD75624F4),
-    MK_64(0x94BCC568, 0x4B3F81A0),
-    MK_64(0x3EBBF51E, 0x10ECFD46),
-    MK_64(0x2DF50F0B, 0xEEB08542),
-    MK_64(0x3B5A6530, 0x0DBC6516),
-    MK_64(0x484B9CD2, 0x167BBCE1),
-    MK_64(0x2D136947, 0xD4CBAFEA)
-    };
+	{
+	MK_64(0x5102B6B8, 0xC1894A35),
+	MK_64(0xFEEBC9E3, 0xFE8AF11A),
+	MK_64(0x0C807F06, 0xE32BED71),
+	MK_64(0x60C13A52, 0xB41A91F6),
+	MK_64(0x9716D35D, 0xD4917C38),
+	MK_64(0xE780DF12, 0x6FD31D3A),
+	MK_64(0x797846B6, 0xC898303A),
+	MK_64(0xB172C2A8, 0xB3572A3B),
+	MK_64(0xC9BC8203, 0xA6104A6C),
+	MK_64(0x65909338, 0xD75624F4),
+	MK_64(0x94BCC568, 0x4B3F81A0),
+	MK_64(0x3EBBF51E, 0x10ECFD46),
+	MK_64(0x2DF50F0B, 0xEEB08542),
+	MK_64(0x3B5A6530, 0x0DBC6516),
+	MK_64(0x484B9CD2, 0x167BBCE1),
+	MK_64(0x2D136947, 0xD4CBAFEA)
+	};
 
 /* blkSize = 1024 bits. hashSize =  512 bits */
 const u64 SKEIN1024_IV_512[] =
-    {
-    MK_64(0xCAEC0E5D, 0x7C1B1B18),
-    MK_64(0xA01B0E04, 0x5F03E802),
-    MK_64(0x33840451, 0xED912885),
-    MK_64(0x374AFB04, 0xEAEC2E1C),
-    MK_64(0xDF25A0E2, 0x813581F7),
-    MK_64(0xE4004093, 0x8B12F9D2),
-    MK_64(0xA662D539, 0xC2ED39B6),
-    MK_64(0xFA8B85CF, 0x45D8C75A),
-    MK_64(0x8316ED8E, 0x29EDE796),
-    MK_64(0x053289C0, 0x2E9F91B8),
-    MK_64(0xC3F8EF1D, 0x6D518B73),
-    MK_64(0xBDCEC3C4, 0xD5EF332E),
-    MK_64(0x549A7E52, 0x22974487),
-    MK_64(0x67070872, 0x5B749816),
-    MK_64(0xB9CD28FB, 0xF0581BD1),
-    MK_64(0x0E2940B8, 0x15804974)
-    };
+	{
+	MK_64(0xCAEC0E5D, 0x7C1B1B18),
+	MK_64(0xA01B0E04, 0x5F03E802),
+	MK_64(0x33840451, 0xED912885),
+	MK_64(0x374AFB04, 0xEAEC2E1C),
+	MK_64(0xDF25A0E2, 0x813581F7),
+	MK_64(0xE4004093, 0x8B12F9D2),
+	MK_64(0xA662D539, 0xC2ED39B6),
+	MK_64(0xFA8B85CF, 0x45D8C75A),
+	MK_64(0x8316ED8E, 0x29EDE796),
+	MK_64(0x053289C0, 0x2E9F91B8),
+	MK_64(0xC3F8EF1D, 0x6D518B73),
+	MK_64(0xBDCEC3C4, 0xD5EF332E),
+	MK_64(0x549A7E52, 0x22974487),
+	MK_64(0x67070872, 0x5B749816),
+	MK_64(0xB9CD28FB, 0xF0581BD1),
+	MK_64(0x0E2940B8, 0x15804974)
+	};
 
 /* blkSize = 1024 bits. hashSize = 1024 bits */
 const u64 SKEIN1024_IV_1024[] =
-    {
-    MK_64(0xD593DA07, 0x41E72355),
-    MK_64(0x15B5E511, 0xAC73E00C),
-    MK_64(0x5180E5AE, 0xBAF2C4F0),
-    MK_64(0x03BD41D3, 0xFCBCAFAF),
-    MK_64(0x1CAEC6FD, 0x1983A898),
-    MK_64(0x6E510B8B, 0xCDD0589F),
-    MK_64(0x77E2BDFD, 0xC6394ADA),
-    MK_64(0xC11E1DB5, 0x24DCB0A3),
-    MK_64(0xD6D14AF9, 0xC6329AB5),
-    MK_64(0x6A9B0BFC, 0x6EB67E0D),
-    MK_64(0x9243C60D, 0xCCFF1332),
-    MK_64(0x1A1F1DDE, 0x743F02D4),
-    MK_64(0x0996753C, 0x10ED0BB8),
-    MK_64(0x6572DD22, 0xF2B4969A),
-    MK_64(0x61FD3062, 0xD00A579A),
-    MK_64(0x1DE0536E, 0x8682E539)
-    };
+	{
+	MK_64(0xD593DA07, 0x41E72355),
+	MK_64(0x15B5E511, 0xAC73E00C),
+	MK_64(0x5180E5AE, 0xBAF2C4F0),
+	MK_64(0x03BD41D3, 0xFCBCAFAF),
+	MK_64(0x1CAEC6FD, 0x1983A898),
+	MK_64(0x6E510B8B, 0xCDD0589F),
+	MK_64(0x77E2BDFD, 0xC6394ADA),
+	MK_64(0xC11E1DB5, 0x24DCB0A3),
+	MK_64(0xD6D14AF9, 0xC6329AB5),
+	MK_64(0x6A9B0BFC, 0x6EB67E0D),
+	MK_64(0x9243C60D, 0xCCFF1332),
+	MK_64(0x1A1F1DDE, 0x743F02D4),
+	MK_64(0x0996753C, 0x10ED0BB8),
+	MK_64(0x6572DD22, 0xF2B4969A),
+	MK_64(0x61FD3062, 0xD00A579A),
+	MK_64(0x1DE0536E, 0x8682E539)
+	};
 
 #endif /* _SKEIN_IV_H_ */
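The IV tables above are written as pairs of 32-bit halves wrapped in MK_64(). The macro itself is not part of these hunks, so the definition below is an assumption based on how the halves are evidently meant to combine (high word shifted left by 32, low word added). The u64 typedef and the helper iv_words() are likewise stand-ins for a userspace build. The table entries are copied from SKEIN_256_IV_256 as shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;	/* userspace stand-in for the kernel's u64 (assumption) */

/* assumed definition; the real one lives elsewhere in the skein headers */
#define MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))

/* blkSize =  256 bits. hashSize =  256 bits (values copied from the patch) */
static const u64 SKEIN_256_IV_256[] = {
	MK_64(0xFC9DA860, 0xD048B449),
	MK_64(0x2FCA6647, 0x9FA7D833),
	MK_64(0xB33BC389, 0x6656840F),
	MK_64(0x6A54E920, 0xFDE8DA69)
};

/* hypothetical helper: number of 64-bit words in the table above */
static size_t iv_words(void)
{
	return sizeof(SKEIN_256_IV_256) / sizeof(SKEIN_256_IV_256[0]);
}
```

Under that assumed definition the first entry expands to the single 64-bit constant 0xFC9DA860D048B449, and the table holds four words, one per 64-bit word of the 256-bit state.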
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 1f9e6e14f50b..199257e37813 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -33,125 +33,125 @@
 
 #define KeyScheduleConst 0x1BD11BDAA9FC1A22L
 
-    /**
-     * Which Threefish size to use
-     */
-    enum threefish_size {
-        Threefish256 = 256,     /*!< Skein with 256 bit state */
-        Threefish512 = 512,     /*!< Skein with 512 bit state */
-        Threefish1024 = 1024    /*!< Skein with 1024 bit state */
-    };
-    
-    /**
-     * Context for Threefish key and tweak words.
-     * 
-     * This structure was setup with some know-how of the internal
-     * Skein structures, in particular ordering of header and size dependent
-     * variables. If Skein implementation changes this, the adapt these
-     * structures as well.
-     */
-    struct threefish_key {
-        u64 stateSize;
-        u64 key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
-        u64 tweak[3];
-    };
+/**
+ * Which Threefish size to use
+ */
+enum threefish_size {
+	Threefish256 = 256,     /*!< Threefish with 256 bit state */
+	Threefish512 = 512,     /*!< Threefish with 512 bit state */
+	Threefish1024 = 1024    /*!< Threefish with 1024 bit state */
+};
+
+/**
+ * Context for Threefish key and tweak words.
+ * 
+ * This structure was setup with some know-how of the internal
+ * Skein structures, in particular ordering of header and size dependent
+ * variables. If Skein implementation changes this, then adapt these
+ * structures as well.
+ */
+struct threefish_key {
+	u64 stateSize;
+	u64 key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
+	u64 tweak[3];
+};
+
+/**
+ * Set Threefish key and tweak data.
+ * 
+ * This function sets the key and tweak data for the Threefish cipher of
+ * the given size. The key data must have the same length (number of bits)
+ * as the state size 
+ *
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param size
+ *     Which Skein size to use.
+ * @param keyData
+ *     Pointer to the key words (word has 64 bits).
+ * @param tweak
+ *     Pointer to the two tweak words (word has 64 bits).
+ */
+void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize, u64 *keyData, u64 *tweak);
+
+/**
+ * Encrypt Threefish block (bytes).
+ *
+ * The buffer must have at least the same length (number of bits) as the
+ * state size for this key. The function uses the first @c stateSize bits
+ * of the input buffer, encrypts them and stores the result in the output
+ * buffer.
+ *
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param in
+ *     Pointer to plaintext data buffer.
+ * @param out
+ *     Pointer to cipher buffer.
+ */
+void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
-    /**
-     * Set Threefish key and tweak data.
-     * 
-     * This function sets the key and tweak data for the Threefish cipher of
-     * the given size. The key data must have the same length (number of bits)
-     * as the state size 
-     *
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param size
-     *     Which Skein size to use.
-     * @param keyData
-     *     Pointer to the key words (word has 64 bits).
-     * @param tweak
-     *     Pointer to the two tweak words (word has 64 bits).
-     */
-    void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize, u64 *keyData, u64 *tweak);
-    
-    /**
-     * Encrypt Threefisch block (bytes).
-     * 
-     * The buffer must have at least the same length (number of bits) aas the 
-     * state size for this key. The function uses the first @c stateSize bits
-     * of the input buffer, encrypts them and stores the result in the output
-     * buffer.
-     * 
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param in
-     *     Poionter to plaintext data buffer.
-     * @param out
-     *     Pointer to cipher buffer.
-     */
-    void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
-    
-    /**
-     * Encrypt Threefisch block (words).
-     * 
-     * The buffer must have at least the same length (number of bits) aas the 
-     * state size for this key. The function uses the first @c stateSize bits
-     * of the input buffer, encrypts them and stores the result in the output
-     * buffer.
-     * 
-     * The wordsize ist set to 64 bits.
-     * 
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param in
-     *     Poionter to plaintext data buffer.
-     * @param out
-     *     Pointer to cipher buffer.
-     */
-    void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
+/**
+ * Encrypt Threefish block (words).
+ *
+ * The buffer must have at least the same length (number of bits) as the
+ * state size for this key. The function uses the first @c stateSize bits
+ * of the input buffer, encrypts them and stores the result in the output
+ * buffer.
+ *
+ * The word size is set to 64 bits.
+ *
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param in
+ *     Pointer to plaintext data buffer.
+ * @param out
+ *     Pointer to cipher buffer.
+ */
+void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
 
-    /**
-     * Decrypt Threefisch block (bytes).
-     * 
-     * The buffer must have at least the same length (number of bits) aas the 
-     * state size for this key. The function uses the first @c stateSize bits
-     * of the input buffer, decrypts them and stores the result in the output
-     * buffer
-     * 
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param in
-     *     Poionter to cipher data buffer.
-     * @param out
-     *     Pointer to plaintext buffer.
-     */
-    void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
+/**
+ * Decrypt Threefish block (bytes).
+ * 
+ * The buffer must have at least the same length (number of bits) as the
+ * state size for this key. The function uses the first @c stateSize bits
+ * of the input buffer, decrypts them and stores the result in the output
+ * buffer.
+ * 
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param in
+ *     Pointer to cipher data buffer.
+ * @param out
+ *     Pointer to plaintext buffer.
+ */
+void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
-    /**
-     * Decrypt Threefisch block (words).
-     * 
-     * The buffer must have at least the same length (number of bits) aas the 
-     * state size for this key. The function uses the first @c stateSize bits
-     * of the input buffer, encrypts them and stores the result in the output
-     * buffer.
-     * 
-     * The wordsize ist set to 64 bits.
-     * 
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param in
-     *     Poionter to cipher data buffer.
-     * @param out
-     *     Pointer to plaintext buffer.
-     */
-    void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
+/**
+ * Decrypt Threefish block (words).
+ * 
+ * The buffer must have at least the same length (number of bits) as the
+ * state size for this key. The function uses the first @c stateSize bits
+ * of the input buffer, decrypts them and stores the result in the output
+ * buffer.
+ * 
+ * The word size is set to 64 bits.
+ * 
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param in
+ *     Pointer to cipher data buffer.
+ * @param out
+ *     Pointer to plaintext buffer.
+ */
+void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
 
-    void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
 /**
  * @}
  */
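
Note on the buffer-size wording in the comments above: each block function consumes exactly one state's worth of data, and the word variants treat that state as 64-bit words. The sketch below is only an illustration of that arithmetic, not code from this series. The helper names state_words() and bytes_to_words_lsb() are hypothetical, and the least-significant-byte-first packing is an assumption based on the LSB-first helpers used elsewhere in the Skein code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 64-bit words in a Threefish state. */
size_t state_words(unsigned int state_bits)
{
	/* Threefish is defined for 256-, 512- and 1024-bit states only. */
	assert(state_bits == 256 || state_bits == 512 || state_bits == 1024);
	return state_bits / 64;
}

/*
 * Hypothetical helper: pack nwords * 8 input bytes into 64-bit words,
 * least significant byte first (assumed byte order, see note above).
 */
void bytes_to_words_lsb(uint64_t *dst, const uint8_t *src, size_t nwords)
{
	size_t i, j;

	for (i = 0; i < nwords; i++) {
		uint64_t w = 0;

		for (j = 0; j < 8; j++)
			w |= (uint64_t)src[8 * i + j] << (8 * j);
		dst[i] = w;
	}
}
```

Under these assumptions, a caller of a words variant would need a buffer of at least state_words(bits) u64 entries (4, 8 or 16), and a caller of a bytes variant would need eight times that many bytes.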
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index e2e5685157a0..3f0f32806181 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -28,49 +28,49 @@ void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, siz
 /* init the context for a straight hashing operation  */
 int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 {
-    union
-    {
-        u8  b[SKEIN_256_STATE_BYTES];
-        u64  w[SKEIN_256_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {             /* use pre-computed values, where available */
-    case  256:
-        memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
-        break;
-    case  224:
-        memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
-        break;
-    case  160:
-        memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
-        break;
-    case  128:
-        memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN_256_STATE_BYTES];
+		u64  w[SKEIN_256_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+	switch (hashBitLen)
+	{             /* use pre-computed values, where available */
+	case  256:
+		memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
+		break;
+	case  224:
+		memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
+		break;
+	case  160:
+		memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
+		break;
+	case  128:
+		memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
+		break;
+	default:
+		/* here if there is no precomputed IV value available */
+		/* build/process the config block, type == CONFIG (could be precomputed) */
+		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+		/* compute the initial chaining values from config block */
+		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+		break;
+	}
+	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
@@ -78,133 +78,133 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 /* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
 int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-    union
-    {
-        u8  b[SKEIN_256_STATE_BYTES];
-        u64  w[SKEIN_256_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(256, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN_256_STATE_BYTES];
+		u64  w[SKEIN_256_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+	/* compute the initial chaining values ctx->X[], based on key */
+	if (keyBytes == 0)                          /* is there a key? */
+	{
+		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+	}
+	else                                        /* here to pre-process a key */
+	{
+		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		/* do a mini-Init right here */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+		Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
+		Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+	}
+	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
+	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	Skein_Start_New_Type(ctx, CFG_FINAL);
+
+	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+	Skein_Show_Key(256, &ctx->h, key, keyBytes);
+
+	/* compute the initial chaining values from config block */
+	Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+	/* The chaining vars ctx->X are now initialized */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
 int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
-            Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
-            msg        += n * SKEIN_256_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
+	size_t n;
+
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* process full blocks, if any */
+	if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
+	{
+		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		{
+			n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			if (n)
+			{
+				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+				msgByteCnt  -= n;
+				msg         += n;
+				ctx->h.bCnt += n;
+			}
+			Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
+			Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
+			ctx->h.bCnt = 0;
+		}
+		/* now process any remaining full blocks, directly from input message data */
+		if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
+		{
+			n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
+			Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
+			msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
+			msg        += n * SKEIN_256_BLOCK_BYTES;
+		}
+		Skein_assert(ctx->h.bCnt == 0);
+	}
+
+	/* copy any remaining source message data bytes into b[] */
+	if (msgByteCnt)
+	{
+		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
+		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+		ctx->h.bCnt += msgByteCnt;
+	}
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
 int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_256_BLOCK_BYTES)
-            n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN_256_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+
+	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN_256_BLOCK_BYTES)
+			n  = SKEIN_256_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /*****************************************************************/
@@ -215,50 +215,50 @@ int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 /* init the context for a straight hashing operation  */
 int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 {
-    union
-    {
-        u8  b[SKEIN_512_STATE_BYTES];
-        u64  w[SKEIN_512_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {             /* use pre-computed values, where available */
-    case  512:
-        memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
-        break;
-    case  384:
-        memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
-        break;
-    case  256:
-        memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
-        break;
-    case  224:
-        memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN_512_STATE_BYTES];
+		u64  w[SKEIN_512_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+	switch (hashBitLen)
+	{             /* use pre-computed values, where available */
+	case  512:
+		memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
+		break;
+	case  384:
+		memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
+		break;
+	case  256:
+		memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
+		break;
+	case  224:
+		memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
+		break;
+	default:
+		/* here if there is no precomputed IV value available */
+		/* build/process the config block, type == CONFIG (could be precomputed) */
+		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+		/* compute the initial chaining values from config block */
+		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+		break;
+	}
+
+	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
@@ -266,133 +266,133 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 /* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
 int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-    union
-    {
-        u8  b[SKEIN_512_STATE_BYTES];
-        u64  w[SKEIN_512_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(512, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN_512_STATE_BYTES];
+		u64  w[SKEIN_512_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+	/* compute the initial chaining values ctx->X[], based on key */
+	if (keyBytes == 0)                          /* is there a key? */
+	{
+		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+	}
+	else                                        /* here to pre-process a key */
+	{
+		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		/* do a mini-Init right here */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+		Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
+		Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+	}
+	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
+	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	Skein_Start_New_Type(ctx, CFG_FINAL);
+
+	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+	Skein_Show_Key(512, &ctx->h, key, keyBytes);
+
+	/* compute the initial chaining values from config block */
+	Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+	/* The chaining vars ctx->X are now initialized */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
 int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
-            Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
-            msg        += n * SKEIN_512_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
+	size_t n;
+
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* process full blocks, if any */
+	if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
+	{
+		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		{
+			n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			if (n)
+			{
+				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+				msgByteCnt  -= n;
+				msg         += n;
+				ctx->h.bCnt += n;
+			}
+			Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
+			Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
+			ctx->h.bCnt = 0;
+		}
+		/* now process any remaining full blocks, directly from input message data */
+		if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
+		{
+			n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
+			Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
+			msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
+			msg        += n * SKEIN_512_BLOCK_BYTES;
+		}
+		Skein_assert(ctx->h.bCnt == 0);
+	}
+
+	/* copy any remaining source message data bytes into b[] */
+	if (msgByteCnt)
+	{
+		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
+		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+		ctx->h.bCnt += msgByteCnt;
+	}
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
 int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_512_BLOCK_BYTES)
-            n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN_512_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+
+	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN_512_BLOCK_BYTES)
+			n  = SKEIN_512_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /*****************************************************************/
@@ -403,47 +403,47 @@ int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 /* init the context for a straight hashing operation  */
 int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 {
-    union
-    {
-        u8  b[SKEIN1024_STATE_BYTES];
-        u64  w[SKEIN1024_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {              /* use pre-computed values, where available */
-    case  512:
-        memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
-        break;
-    case  384:
-        memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
-        break;
-    case 1024:
-        memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN1024_STATE_BYTES];
+		u64  w[SKEIN1024_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+	switch (hashBitLen)
+	{              /* use pre-computed values, where available */
+	case  512:
+		memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
+		break;
+	case  384:
+		memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
+		break;
+	case 1024:
+		memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
+		break;
+	default:
+		/* here if there is no precomputed IV value available */
+		/* build/process the config block, type == CONFIG (could be precomputed) */
+		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+		/* compute the initial chaining values from config block */
+		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+		break;
+	}
+
+	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
@@ -451,133 +451,133 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 /* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
 int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-    union
-    {
-        u8  b[SKEIN1024_STATE_BYTES];
-        u64  w[SKEIN1024_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(1024, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN1024_STATE_BYTES];
+		u64  w[SKEIN1024_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+	/* compute the initial chaining values ctx->X[], based on key */
+	if (keyBytes == 0)                          /* is there a key? */
+	{
+		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+	}
+	else                                        /* here to pre-process a key */
+	{
+		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		/* do a mini-Init right here */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+		Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
+		Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+	}
+	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
+	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	Skein_Start_New_Type(ctx, CFG_FINAL);
+
+	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+	Skein_Show_Key(1024, &ctx->h, key, keyBytes);
+
+	/* compute the initial chaining values from config block */
+	Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+	/* The chaining vars ctx->X are now initialized */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
 int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
-            Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
-            msg        += n * SKEIN1024_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
+	size_t n;
+
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* process full blocks, if any */
+	if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
+	{
+		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		{
+			n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			if (n)
+			{
+				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+				msgByteCnt  -= n;
+				msg         += n;
+				ctx->h.bCnt += n;
+			}
+			Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
+			Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
+			ctx->h.bCnt = 0;
+		}
+		/* now process any remaining full blocks, directly from input message data */
+		if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
+		{
+			n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
+			Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
+			msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
+			msg        += n * SKEIN1024_BLOCK_BYTES;
+		}
+		Skein_assert(ctx->h.bCnt == 0);
+	}
+
+	/* copy any remaining source message data bytes into b[] */
+	if (msgByteCnt)
+	{
+		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
+		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+		ctx->h.bCnt += msgByteCnt;
+	}
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
 int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN1024_BLOCK_BYTES)
-            n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN1024_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+
+	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN1024_BLOCK_BYTES)
+			n  = SKEIN1024_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /**************** Functions to support MAC/tree hashing ***************/
@@ -587,48 +587,48 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
 
-    return SKEIN_SUCCESS;
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
 
-    return SKEIN_SUCCESS;
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
 
-    return SKEIN_SUCCESS;
+	return SKEIN_SUCCESS;
 }
 
 #if SKEIN_TREE_HASH
@@ -636,86 +636,86 @@ int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 /* just do the OUTPUT stage                                       */
 int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_256_BLOCK_BYTES)
-            n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN_256_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN_256_BLOCK_BYTES)
+			n  = SKEIN_256_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
 int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_512_BLOCK_BYTES)
-            n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN_512_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN_512_BLOCK_BYTES)
+			n  = SKEIN_512_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
 int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN1024_BLOCK_BYTES)
-            n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN1024_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN1024_BLOCK_BYTES)
+			n  = SKEIN1024_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 #endif
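
A note on the buffering logic that the Update and Output functions above share across the 256, 512 and 1024 variants. Update computes the number of full blocks as (msgByteCnt-1) / BLOCK_BYTES rather than msgByteCnt / BLOCK_BYTES, so at least one byte always stays in b[]. Final and Final_Pad then always have a block to process with SKEIN_T1_FLAG_FINAL set. The following is a minimal standalone sketch of that behaviour, not the driver code: toy_ctx, toy_process_block, toy_update and toy_output_chunk are hypothetical names, BLOCK_BYTES stands in for SKEIN_512_BLOCK_BYTES, and the compression step is stubbed out to count blocks only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 64	/* assumption: stands in for SKEIN_512_BLOCK_BYTES */

/* hypothetical, cut-down context: only the fields the buffering needs */
struct toy_ctx {
	size_t bCnt;			/* bytes currently buffered in b[] */
	size_t blocks_processed;	/* blocks handed to the stub below */
	unsigned char b[BLOCK_BYTES];
};

/* stub for Skein_512_Process_Block(): counts blocks, hashes nothing */
static void toy_process_block(struct toy_ctx *ctx, const unsigned char *blk,
			      size_t blkCnt)
{
	(void)blk;
	ctx->blocks_processed += blkCnt;
}

/* same control flow as the Update functions, minus the asserts */
static void toy_update(struct toy_ctx *ctx, const unsigned char *msg,
		       size_t msgByteCnt)
{
	size_t n;

	if (msgByteCnt + ctx->bCnt > BLOCK_BYTES) {
		if (ctx->bCnt) {
			/* top up the partial buffer, then flush it */
			n = BLOCK_BYTES - ctx->bCnt;
			if (n) {
				memcpy(&ctx->b[ctx->bCnt], msg, n);
				msgByteCnt -= n;
				msg += n;
				ctx->bCnt += n;
			}
			toy_process_block(ctx, ctx->b, 1);
			ctx->bCnt = 0;
		}
		if (msgByteCnt > BLOCK_BYTES) {
			/* the -1 holds back the last block, even if it is full */
			n = (msgByteCnt - 1) / BLOCK_BYTES;
			toy_process_block(ctx, msg, n);
			msgByteCnt -= n * BLOCK_BYTES;
			msg += n * BLOCK_BYTES;
		}
	}
	if (msgByteCnt) {
		/* whatever is left waits in b[] for the next call or Final */
		memcpy(&ctx->b[ctx->bCnt], msg, msgByteCnt);
		ctx->bCnt += msgByteCnt;
	}
}

/* bytes written on iteration i of the "counter mode" output loop */
static size_t toy_output_chunk(size_t hashBitLen, size_t i)
{
	size_t byteCnt = (hashBitLen + 7) >> 3;	/* round bits up to bytes */
	size_t n;

	if (i * BLOCK_BYTES >= byteCnt)
		return 0;			/* loop condition is false */
	n = byteCnt - i * BLOCK_BYTES;
	if (n >= BLOCK_BYTES)
		n = BLOCK_BYTES;
	return n;
}
```

Feeding 128 bytes into a zeroed toy_ctx processes one block and leaves bCnt at 64, so a full block stays buffered. One further byte flushes that block and leaves bCnt at 1. For hashBitLen = 1000, toy_output_chunk() gives 64 bytes for i = 0, 61 for i = 1, and 0 for i = 2.
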
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index a3f471be8db3..3ebb1d60ef93 100644
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -29,191 +29,191 @@ OTHER DEALINGS IN THE SOFTWARE.
 
 int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size)
 {
-    Skein_Assert(ctx && size, SKEIN_FAIL);
+	Skein_Assert(ctx && size, SKEIN_FAIL);
 
-    memset(ctx , 0, sizeof(struct skein_ctx));
-    ctx->skeinSize = size;
+	memset(ctx , 0, sizeof(struct skein_ctx));
+	ctx->skeinSize = size;
 
-    return SKEIN_SUCCESS;
+	return SKEIN_SUCCESS;
 }
 
 int skeinInit(struct skein_ctx *ctx, size_t hashBitLen)
 {
-    int ret = SKEIN_FAIL;
-    size_t Xlen = 0;
-    u64 *X = NULL;
-    u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
-
-    Skein_Assert(ctx, SKEIN_FAIL);
-    /*
-     * The following two lines rely of the fact that the real Skein contexts are
-     * a union in out context and thus have tha maximum memory available.
-     * The beauty of C :-) .
-     */
-    X = ctx->m.s256.X;
-    Xlen = ctx->skeinSize/8;
-    /*
-     * If size is the same and hash bit length is zero then reuse
-     * the save chaining variables.
-     */
-    switch (ctx->skeinSize) {
-    case Skein256:
-        ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
-                                treeInfo, NULL, 0);
-        break;
-    case Skein512:
-        ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
-                                treeInfo, NULL, 0);
-        break;
-    case Skein1024:
-        ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
-                                treeInfo, NULL, 0);
-        break;
-    }
-
-    if (ret == SKEIN_SUCCESS) {
-        /* Save chaining variables for this combination of size and hashBitLen */
-        memcpy(ctx->XSave, X, Xlen);
-    }
-    return ret;
+	int ret = SKEIN_FAIL;
+	size_t Xlen = 0;
+	u64 *X = NULL;
+	u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+
+	Skein_Assert(ctx, SKEIN_FAIL);
+	/*
+	 * The following two lines rely of the fact that the real Skein contexts are
+	 * a union in out context and thus have tha maximum memory available.
+	 * The beauty of C :-) .
+	 */
+	X = ctx->m.s256.X;
+	Xlen = ctx->skeinSize/8;
+	/*
+	 * If size is the same and hash bit length is zero then reuse
+	 * the save chaining variables.
+	 */
+	switch (ctx->skeinSize) {
+	case Skein256:
+		ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
+					treeInfo, NULL, 0);
+		break;
+	case Skein512:
+		ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
+					treeInfo, NULL, 0);
+		break;
+	case Skein1024:
+		ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
+					treeInfo, NULL, 0);
+		break;
+	}
+
+	if (ret == SKEIN_SUCCESS) {
+		/* Save chaining variables for this combination of size and hashBitLen */
+		memcpy(ctx->XSave, X, Xlen);
+	}
+	return ret;
 }
 
 int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
-                 size_t hashBitLen)
+		size_t hashBitLen)
 {
-    int ret = SKEIN_FAIL;
-    u64 *X = NULL;
-    size_t Xlen = 0;
-    u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
-
-    Skein_Assert(ctx, SKEIN_FAIL);
-
-    X = ctx->m.s256.X;
-    Xlen = ctx->skeinSize/8;
-
-    Skein_Assert(hashBitLen, SKEIN_BAD_HASHLEN);
-
-    switch (ctx->skeinSize) {
-    case Skein256:
-        ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
-                                treeInfo,
-                                (const u8 *)key, keyLen);
-
-        break;
-    case Skein512:
-        ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
-                                treeInfo,
-                                (const u8 *)key, keyLen);
-        break;
-    case Skein1024:
-        ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
-                                treeInfo,
-                                (const u8 *)key, keyLen);
-
-        break;
-    }
-    if (ret == SKEIN_SUCCESS) {
-        /* Save chaining variables for this combination of key, keyLen, hashBitLen */
-        memcpy(ctx->XSave, X, Xlen);
-    }
-    return ret;
+	int ret = SKEIN_FAIL;
+	u64 *X = NULL;
+	size_t Xlen = 0;
+	u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+
+	Skein_Assert(ctx, SKEIN_FAIL);
+
+	X = ctx->m.s256.X;
+	Xlen = ctx->skeinSize/8;
+
+	Skein_Assert(hashBitLen, SKEIN_BAD_HASHLEN);
+
+	switch (ctx->skeinSize) {
+	case Skein256:
+		ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
+					treeInfo,
+					(const u8 *)key, keyLen);
+
+		break;
+	case Skein512:
+		ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
+					treeInfo,
+					(const u8 *)key, keyLen);
+		break;
+	case Skein1024:
+		ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
+					treeInfo,
+					(const u8 *)key, keyLen);
+
+		break;
+	}
+	if (ret == SKEIN_SUCCESS) {
+		/* Save chaining variables for this combination of key, keyLen, hashBitLen */
+		memcpy(ctx->XSave, X, Xlen);
+	}
+	return ret;
 }
 
 void skeinReset(struct skein_ctx *ctx)
 {
-    size_t Xlen = 0;
-    u64 *X = NULL;
-
-    /*
-     * The following two lines rely of the fact that the real Skein contexts are
-     * a union in out context and thus have tha maximum memory available.
-     * The beautiy of C :-) .
-     */
-    X = ctx->m.s256.X;
-    Xlen = ctx->skeinSize/8;
-    /* Restore the chaing variable, reset byte counter */
-    memcpy(X, ctx->XSave, Xlen);
-
-    /* Setup context to process the message */
-    Skein_Start_New_Type(&ctx->m, MSG);
+	size_t Xlen = 0;
+	u64 *X = NULL;
+
+	/*
+	 * The following two lines rely on the fact that the real Skein contexts are
+	 * a union in our context and thus have the maximum memory available.
+	 * The beauty of C :-) .
+	 */
+	X = ctx->m.s256.X;
+	Xlen = ctx->skeinSize/8;
+	/* Restore the chaining variables, reset byte counter */
+	memcpy(X, ctx->XSave, Xlen);
+
+	/* Setup context to process the message */
+	Skein_Start_New_Type(&ctx->m, MSG);
 }
 
 int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
-                size_t msgByteCnt)
+		size_t msgByteCnt)
 {
-    int ret = SKEIN_FAIL;
-    Skein_Assert(ctx, SKEIN_FAIL);
-
-    switch (ctx->skeinSize) {
-    case Skein256:
-        ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg, msgByteCnt);
-        break;
-    case Skein512:
-        ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg, msgByteCnt);
-        break;
-    case Skein1024:
-        ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg, msgByteCnt);
-        break;
-    }
-    return ret;
+	int ret = SKEIN_FAIL;
+	Skein_Assert(ctx, SKEIN_FAIL);
+
+	switch (ctx->skeinSize) {
+	case Skein256:
+		ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg, msgByteCnt);
+		break;
+	case Skein512:
+		ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg, msgByteCnt);
+		break;
+	case Skein1024:
+		ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg, msgByteCnt);
+		break;
+	}
+	return ret;
 
 }
 
 int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
-                    size_t msgBitCnt)
+			size_t msgBitCnt)
 {
-    /*
-     * I've used the bit pad implementation from skein_test.c (see NIST CD)
-     * and modified it to use the convenience functions and added some pointer
-     * arithmetic.
-     */
-    size_t length;
-    u8 mask;
-    u8 *up;
-
-    /* only the final Update() call is allowed do partial bytes, else assert an error */
-    Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
-
-    /* if number of bits is a multiple of bytes - that's easy */
-    if ((msgBitCnt & 0x7) == 0) {
-        return skeinUpdate(ctx, msg, msgBitCnt >> 3);
-    }
-    skeinUpdate(ctx, msg, (msgBitCnt >> 3) + 1);
-
-    /*
-     * The next line rely on the fact that the real Skein contexts
-     * are a union in our context. After the addition the pointer points to
-     * Skein's real partial block buffer.
-     * If this layout ever changes we have to adapt this as well.
-     */
-    up = (u8 *)ctx->m.s256.X + ctx->skeinSize / 8;
-
-    Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
-
-    /* now "pad" the final partial byte the way NIST likes */
-    length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
-    Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
-    mask = (u8) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
-    up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
-
-    return SKEIN_SUCCESS;
+	/*
+	 * I've used the bit pad implementation from skein_test.c (see NIST CD)
+	 * and modified it to use the convenience functions and added some pointer
+	 * arithmetic.
+	 */
+	size_t length;
+	u8 mask;
+	u8 *up;
+
+	/* only the final Update() call is allowed to do partial bytes, else assert an error */
+	Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
+
+	/* if number of bits is a multiple of bytes - that's easy */
+	if ((msgBitCnt & 0x7) == 0) {
+		return skeinUpdate(ctx, msg, msgBitCnt >> 3);
+	}
+	skeinUpdate(ctx, msg, (msgBitCnt >> 3) + 1);
+
+	/*
+	 * The next line relies on the fact that the real Skein contexts
+	 * are a union in our context. After the addition the pointer points to
+	 * Skein's real partial block buffer.
+	 * If this layout ever changes we have to adapt this as well.
+	 */
+	up = (u8 *)ctx->m.s256.X + ctx->skeinSize / 8;
+
+	Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
+
+	/* now "pad" the final partial byte the way NIST likes */
+	length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
+	Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
+	mask = (u8) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
+	up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
+
+	return SKEIN_SUCCESS;
 }
 
 int skeinFinal(struct skein_ctx *ctx, u8 *hash)
 {
-    int ret = SKEIN_FAIL;
-    Skein_Assert(ctx, SKEIN_FAIL);
-
-    switch (ctx->skeinSize) {
-    case Skein256:
-        ret = Skein_256_Final(&ctx->m.s256, (u8 *)hash);
-        break;
-    case Skein512:
-        ret = Skein_512_Final(&ctx->m.s512, (u8 *)hash);
-        break;
-    case Skein1024:
-        ret = Skein1024_Final(&ctx->m.s1024, (u8 *)hash);
-        break;
-    }
-    return ret;
+	int ret = SKEIN_FAIL;
+	Skein_Assert(ctx, SKEIN_FAIL);
+
+	switch (ctx->skeinSize) {
+	case Skein256:
+		ret = Skein_256_Final(&ctx->m.s256, (u8 *)hash);
+		break;
+	case Skein512:
+		ret = Skein_512_Final(&ctx->m.s512, (u8 *)hash);
+		break;
+	case Skein1024:
+		ret = Skein1024_Final(&ctx->m.s1024, (u8 *)hash);
+		break;
+	}
+	return ret;
 }
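
For anyone following the bit-padding step in skeinUpdateBits() above, here is a minimal standalone sketch of what happens to the final partial byte. The helper name pad_partial_byte() is hypothetical and not part of this patch. It assumes, as the code above does, that the message ends in a non-zero number of trailing bits (msgBitCnt & 7 != 0) and that the valid bits sit at the most significant end of the byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): pad the last, partially used
 * byte of a message that is bit_cnt bits long.  The valid bits are the
 * (bit_cnt & 7) most significant ones; a single 1 bit is placed right
 * after them and every lower bit is cleared.
 */
uint8_t pad_partial_byte(uint8_t last, size_t bit_cnt)
{
	uint8_t mask;

	/* whole-byte messages take the early-return path and are not padded */
	assert((bit_cnt & 7) != 0);

	/* position of the padding bit, just below the last valid bit */
	mask = (uint8_t)(1u << (7 - (bit_cnt & 7)));

	/* (0 - mask) keeps the mask bit and everything above it */
	return (uint8_t)((last & (0 - mask)) | mask);
}
```

With three valid bits 101 stored in 0xA0, the padded byte comes out as 0xB0 (1011 0000): the three message bits, the padding 1, then zeros.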
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index d98933eeb0bf..3c2878c966e1 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -6,167 +6,167 @@
 
 /*****************************  Skein_256 ******************************/
 void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
-                             size_t blkCnt, size_t byteCntAdd)
+				size_t blkCnt, size_t byteCntAdd)
 {
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
-    u64 words[3];
+	struct threefish_key key;
+	u64 tweak[2];
+	int i;
+	u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+	u64 words[3];
 
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	tweak[0] = ctx->h.T[0];
+	tweak[1] = ctx->h.T[1];
 
-    do  {
-        u64 carry = byteCntAdd;
+	do  {
+		u64 carry = byteCntAdd;
 
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
+		words[0] = tweak[0] & 0xffffffffL;
+		words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+		words[2] = (tweak[1] & 0xffffffffL);
 
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
+		for (i = 0; i < 3; i++) {
+			carry += words[i];
+			words[i] = carry;
+			carry >>= 32;
+		}
+		tweak[0] = words[0] & 0xffffffffL;
+		tweak[0] |= (words[1] & 0xffffffffL) << 32;
+		tweak[1] |= words[2] & 0xffffffffL;
 
-        threefishSetKey(&key, Threefish256, ctx->X, tweak);
+		threefishSetKey(&key, Threefish256, ctx->X, tweak);
 
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
 
-        threefishEncryptBlockWords(&key, w, ctx->X);
+		threefishEncryptBlockWords(&key, w, ctx->X);
 
-        blkPtr += SKEIN_256_BLOCK_BYTES;
+		blkPtr += SKEIN_256_BLOCK_BYTES;
 
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = ctx->X[0] ^ w[0];
-        ctx->X[1] = ctx->X[1] ^ w[1];
-        ctx->X[2] = ctx->X[2] ^ w[2];
-        ctx->X[3] = ctx->X[3] ^ w[3];
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0] = ctx->X[0] ^ w[0];
+		ctx->X[1] = ctx->X[1] ^ w[1];
+		ctx->X[2] = ctx->X[2] ^ w[2];
+		ctx->X[3] = ctx->X[3] ^ w[3];
 
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
+		tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+	} while (--blkCnt);
 
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
+	ctx->h.T[0] = tweak[0];
+	ctx->h.T[1] = tweak[1];
 }
 
 void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
-                             size_t blkCnt, size_t byteCntAdd)
+				size_t blkCnt, size_t byteCntAdd)
 {
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64 words[3];
-    u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish512, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN_512_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = ctx->X[0] ^ w[0];
-        ctx->X[1] = ctx->X[1] ^ w[1];
-        ctx->X[2] = ctx->X[2] ^ w[2];
-        ctx->X[3] = ctx->X[3] ^ w[3];
-        ctx->X[4] = ctx->X[4] ^ w[4];
-        ctx->X[5] = ctx->X[5] ^ w[5];
-        ctx->X[6] = ctx->X[6] ^ w[6];
-        ctx->X[7] = ctx->X[7] ^ w[7];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
+	struct threefish_key key;
+	u64 tweak[2];
+	int i;
+	u64 words[3];
+	u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	tweak[0] = ctx->h.T[0];
+	tweak[1] = ctx->h.T[1];
+
+	do  {
+		u64 carry = byteCntAdd;
+
+		words[0] = tweak[0] & 0xffffffffL;
+		words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+		words[2] = (tweak[1] & 0xffffffffL);
+
+		for (i = 0; i < 3; i++) {
+			carry += words[i];
+			words[i] = carry;
+			carry >>= 32;
+		}
+		tweak[0] = words[0] & 0xffffffffL;
+		tweak[0] |= (words[1] & 0xffffffffL) << 32;
+		tweak[1] |= words[2] & 0xffffffffL;
+
+		threefishSetKey(&key, Threefish512, ctx->X, tweak);
+
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
+
+		threefishEncryptBlockWords(&key, w, ctx->X);
+
+		blkPtr += SKEIN_512_BLOCK_BYTES;
+
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0] = ctx->X[0] ^ w[0];
+		ctx->X[1] = ctx->X[1] ^ w[1];
+		ctx->X[2] = ctx->X[2] ^ w[2];
+		ctx->X[3] = ctx->X[3] ^ w[3];
+		ctx->X[4] = ctx->X[4] ^ w[4];
+		ctx->X[5] = ctx->X[5] ^ w[5];
+		ctx->X[6] = ctx->X[6] ^ w[6];
+		ctx->X[7] = ctx->X[7] ^ w[7];
+
+		tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+	} while (--blkCnt);
+
+	ctx->h.T[0] = tweak[0];
+	ctx->h.T[1] = tweak[1];
 }
 
 void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
-                              size_t blkCnt, size_t byteCntAdd)
+				size_t blkCnt, size_t byteCntAdd)
 {
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64 words[3];
-    u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish1024, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN1024_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0]  = ctx->X[0]  ^ w[0];
-        ctx->X[1]  = ctx->X[1]  ^ w[1];
-        ctx->X[2]  = ctx->X[2]  ^ w[2];
-        ctx->X[3]  = ctx->X[3]  ^ w[3];
-        ctx->X[4]  = ctx->X[4]  ^ w[4];
-        ctx->X[5]  = ctx->X[5]  ^ w[5];
-        ctx->X[6]  = ctx->X[6]  ^ w[6];
-        ctx->X[7]  = ctx->X[7]  ^ w[7];
-        ctx->X[8]  = ctx->X[8]  ^ w[8];
-        ctx->X[9]  = ctx->X[9]  ^ w[9];
-        ctx->X[10] = ctx->X[10] ^ w[10];
-        ctx->X[11] = ctx->X[11] ^ w[11];
-        ctx->X[12] = ctx->X[12] ^ w[12];
-        ctx->X[13] = ctx->X[13] ^ w[13];
-        ctx->X[14] = ctx->X[14] ^ w[14];
-        ctx->X[15] = ctx->X[15] ^ w[15];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
+	struct threefish_key key;
+	u64 tweak[2];
+	int i;
+	u64 words[3];
+	u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	tweak[0] = ctx->h.T[0];
+	tweak[1] = ctx->h.T[1];
+
+	do  {
+		u64 carry = byteCntAdd;
+
+		words[0] = tweak[0] & 0xffffffffL;
+		words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+		words[2] = (tweak[1] & 0xffffffffL);
+
+		for (i = 0; i < 3; i++) {
+			carry += words[i];
+			words[i] = carry;
+			carry >>= 32;
+		}
+		tweak[0] = words[0] & 0xffffffffL;
+		tweak[0] |= (words[1] & 0xffffffffL) << 32;
+		tweak[1] |= words[2] & 0xffffffffL;
+
+		threefishSetKey(&key, Threefish1024, ctx->X, tweak);
+
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
+
+		threefishEncryptBlockWords(&key, w, ctx->X);
+
+		blkPtr += SKEIN1024_BLOCK_BYTES;
+
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0]  = ctx->X[0]  ^ w[0];
+		ctx->X[1]  = ctx->X[1]  ^ w[1];
+		ctx->X[2]  = ctx->X[2]  ^ w[2];
+		ctx->X[3]  = ctx->X[3]  ^ w[3];
+		ctx->X[4]  = ctx->X[4]  ^ w[4];
+		ctx->X[5]  = ctx->X[5]  ^ w[5];
+		ctx->X[6]  = ctx->X[6]  ^ w[6];
+		ctx->X[7]  = ctx->X[7]  ^ w[7];
+		ctx->X[8]  = ctx->X[8]  ^ w[8];
+		ctx->X[9]  = ctx->X[9]  ^ w[9];
+		ctx->X[10] = ctx->X[10] ^ w[10];
+		ctx->X[11] = ctx->X[11] ^ w[11];
+		ctx->X[12] = ctx->X[12] ^ w[12];
+		ctx->X[13] = ctx->X[13] ^ w[13];
+		ctx->X[14] = ctx->X[14] ^ w[14];
+		ctx->X[15] = ctx->X[15] ^ w[15];
+
+		tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+	} while (--blkCnt);
+
+	ctx->h.T[0] = tweak[0];
+	ctx->h.T[1] = tweak[1];
 }
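
The three Process_Block variants above all advance the tweak's position counter the same way: the 96-bit field (all of T[0] plus the low 32 bits of T[1]) is split into 32-bit words and byteCntAdd is rippled through with a carry. A minimal sketch of that step follows, pulled out into a hypothetical helper tweak_add_position() that does not exist in this patch. One deliberate difference, which is an assumption about the intent rather than a statement about the original: the sketch replaces the low 32 bits of tweak[1] instead of OR-ing into them, so a carry can also clear bits there. It also assumes byte_cnt_add fits in 32 bits (it is a block size in the callers above), so the 64-bit carry cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): add byte_cnt_add to the 96-bit
 * position field stored in tweak[0] (low 64 bits) and in the low 32 bits
 * of tweak[1].  The upper 32 bits of tweak[1] (flags, type) are untouched.
 */
void tweak_add_position(uint64_t tweak[2], uint64_t byte_cnt_add)
{
	uint64_t words[3];
	uint64_t carry = byte_cnt_add;
	int i;

	/* assumption: small increments only, so carry never overflows */
	assert(byte_cnt_add <= 0xffffffffULL);

	/* split the position field into three 32-bit words */
	words[0] = tweak[0] & 0xffffffffULL;
	words[1] = (tweak[0] >> 32) & 0xffffffffULL;
	words[2] = tweak[1] & 0xffffffffULL;

	/* ripple the addition through the words, 32 bits at a time */
	for (i = 0; i < 3; i++) {
		carry += words[i];
		words[i] = carry & 0xffffffffULL;
		carry >>= 32;
	}

	/* reassemble; replace (not OR) the low half of tweak[1] */
	tweak[0] = words[0] | (words[1] << 32);
	tweak[1] = (tweak[1] & ~0xffffffffULL) | words[2];
}
```

Keeping this in one helper would also let the three block functions share it instead of repeating the loop, though that is a suggestion for a later cleanup and outside the scope of a whitespace-only patch.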
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index e62b6442783e..bb36860fafdf 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -40,10 +40,10 @@
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
 void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C */
-    enum {
-        WCNT = SKEIN_256_STATE_WORDS
-        };
+{ /* do it in C */
+	enum {
+		WCNT = SKEIN_256_STATE_WORDS
+	};
 #undef  RCNT
 #define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
 
@@ -57,177 +57,177 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 #if (RCNT % SKEIN_UNROLL_256)
 #error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
 #endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	size_t  r;
+	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
-    u64  w[WCNT];                           /* local copy of input block */
+	u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
+	u64  w[WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+	const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+	Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
 #endif
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	ts[0] = ctx->h.T[0];
+	ts[1] = ctx->h.T[1];
+	do  {
+		/* this implementation only supports 2**64 input bytes (no carry out here) */
+		ts[0] += byteCntAdd;                    /* update processed length */
 
-        /* precompute the key schedule for this block */
-        ks[0] = ctx->X[0];     
-        ks[1] = ctx->X[1];
-        ks[2] = ctx->X[2];
-        ks[3] = ctx->X[3];
-        ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
+		/* precompute the key schedule for this block */
+		ks[0] = ctx->X[0];
+		ks[1] = ctx->X[1];
+		ks[2] = ctx->X[2];
+		ks[3] = ctx->X[3];
+		ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
 
-        ts[2] = ts[0] ^ ts[1];
+		ts[2] = ts[0] ^ ts[1];
 
-        Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+		Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
+		DebugSaveTweak(ctx);
+		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
-        X0 = w[0] + ks[0];                      /* do the first full key injection */
-        X1 = w[1] + ks[1] + ts[0];
-        X2 = w[2] + ks[2] + ts[1];
-        X3 = w[3] + ks[3];
+		X0 = w[0] + ks[0];                      /* do the first full key injection */
+		X1 = w[1] + ks[1] + ts[0];
+		X2 = w[2] + ks[2] + ts[1];
+		X3 = w[3] + ks[3];
 
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
 
-        blkPtr += SKEIN_256_BLOCK_BYTES;
+		blkPtr += SKEIN_256_BLOCK_BYTES;
 
-        /* run the rounds */
+		/* run the rounds */
 
 #define Round256(p0, p1, p2, p3, ROT, rNum)                              \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 
 #if SKEIN_UNROLL_256 == 0                       
 #define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
-    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
 #define I256(R)                                                     \
-    X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
-    X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
-    X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
-    X3   += ks[((R)+4) % 5] +     (R)+1;                            \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
+	X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
+	X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
+	X3   += ks[((R)+4) % 5] +     (R)+1;                            \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
 #define R256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
 #define I256(R)                                                     \
-    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
-    X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
-    X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
-    X3   += ks[r+(R)+3] +    r+(R);                              \
-    ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
-    ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
+	X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
+	X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
+	X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
+	X3   += ks[r+(R)+3] +    r+(R);                              \
+	ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
+	ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
 #endif  
-        {    
+		{
 #define R256_8_rounds(R)                  \
-        R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
-        R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
-        R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
-        R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
-        I256(2 * (R));                      \
-        R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
-        R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
-        R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
-        R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
-        I256(2 * (R) + 1);
-
-        R256_8_rounds(0);
+		R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
+		R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
+		R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
+		R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
+		I256(2 * (R));                      \
+		R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
+		R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
+		R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
+		R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
+		I256(2 * (R) + 1);
+
+		R256_8_rounds(0);
 
 #define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
 
-  #if   R256_Unroll_R(1)
-        R256_8_rounds(1);
-  #endif
-  #if   R256_Unroll_R(2)
-        R256_8_rounds(2);
-  #endif
-  #if   R256_Unroll_R(3)
-        R256_8_rounds(3);
-  #endif
-  #if   R256_Unroll_R(4)
-        R256_8_rounds(4);
-  #endif
-  #if   R256_Unroll_R(5)
-        R256_8_rounds(5);
-  #endif
-  #if   R256_Unroll_R(6)
-        R256_8_rounds(6);
-  #endif
-  #if   R256_Unroll_R(7)
-        R256_8_rounds(7);
-  #endif
-  #if   R256_Unroll_R(8)
-        R256_8_rounds(8);
-  #endif
-  #if   R256_Unroll_R(9)
-        R256_8_rounds(9);
-  #endif
-  #if   R256_Unroll_R(10)
-        R256_8_rounds(10);
-  #endif
-  #if   R256_Unroll_R(11)
-        R256_8_rounds(11);
-  #endif
-  #if   R256_Unroll_R(12)
-        R256_8_rounds(12);
-  #endif
-  #if   R256_Unroll_R(13)
-        R256_8_rounds(13);
-  #endif
-  #if   R256_Unroll_R(14)
-        R256_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_256 > 14)
+	#if   R256_Unroll_R(1)
+		R256_8_rounds(1);
+	#endif
+	#if   R256_Unroll_R(2)
+		R256_8_rounds(2);
+	#endif
+	#if   R256_Unroll_R(3)
+		R256_8_rounds(3);
+	#endif
+	#if   R256_Unroll_R(4)
+		R256_8_rounds(4);
+	#endif
+	#if   R256_Unroll_R(5)
+		R256_8_rounds(5);
+	#endif
+	#if   R256_Unroll_R(6)
+		R256_8_rounds(6);
+	#endif
+	#if   R256_Unroll_R(7)
+		R256_8_rounds(7);
+	#endif
+	#if   R256_Unroll_R(8)
+		R256_8_rounds(8);
+	#endif
+	#if   R256_Unroll_R(9)
+		R256_8_rounds(9);
+	#endif
+	#if   R256_Unroll_R(10)
+		R256_8_rounds(10);
+	#endif
+	#if   R256_Unroll_R(11)
+		R256_8_rounds(11);
+	#endif
+	#if   R256_Unroll_R(12)
+		R256_8_rounds(12);
+	#endif
+	#if   R256_Unroll_R(13)
+		R256_8_rounds(13);
+	#endif
+	#if   R256_Unroll_R(14)
+		R256_8_rounds(14);
+	#endif
+	#if  (SKEIN_UNROLL_256 > 14)
 #error  "need more unrolling in Skein_256_Process_Block"
-  #endif
-        }
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = X0 ^ w[0];
-        ctx->X[1] = X1 ^ w[1];
-        ctx->X[2] = X2 ^ w[2];
-        ctx->X[3] = X3 ^ w[3];
-
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
+	#endif
+		}
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0] = X0 ^ w[0];
+		ctx->X[1] = X1 ^ w[1];
+		ctx->X[2] = X2 ^ w[2];
+		ctx->X[3] = X3 ^ w[3];
+
+		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+	}
+	while (--blkCnt);
+	ctx->h.T[0] = ts[0];
+	ctx->h.T[1] = ts[1];
+}
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein_256_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein_256_Process_Block_CodeSize) -
-           ((u8 *) Skein_256_Process_Block);
-    }
+{
+	return ((u8 *) Skein_256_Process_Block_CodeSize) -
+		((u8 *) Skein_256_Process_Block);
+}
 unsigned int Skein_256_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_256;
-    }
+{
+	return SKEIN_UNROLL_256;
+}
 #endif
 #endif
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
 void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C */
-    enum {
-        WCNT = SKEIN_512_STATE_WORDS
-        };
+{ /* do it in C */
+	enum {
+		WCNT = SKEIN_512_STATE_WORDS
+	};
 #undef  RCNT
 #define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
 
@@ -241,200 +241,200 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 #if (RCNT % SKEIN_UNROLL_512)
 #error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
 #endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	size_t  r;
+	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
-    u64  w[WCNT];                           /* local copy of input block */
+	u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
+	u64  w[WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
-    Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
+	const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+	Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+	Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
 #endif
 
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0] = ctx->X[0];
-        ks[1] = ctx->X[1];
-        ks[2] = ctx->X[2];
-        ks[3] = ctx->X[3];
-        ks[4] = ctx->X[4];
-        ks[5] = ctx->X[5];
-        ks[6] = ctx->X[6];
-        ks[7] = ctx->X[7];
-        ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
-                ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
-
-        ts[2] = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X0   = w[0] + ks[0];                    /* do the first full key injection */
-        X1   = w[1] + ks[1];
-        X2   = w[2] + ks[2];
-        X3   = w[3] + ks[3];
-        X4   = w[4] + ks[4];
-        X5   = w[5] + ks[5] + ts[0];
-        X6   = w[6] + ks[6] + ts[1];
-        X7   = w[7] + ks[7];
-
-        blkPtr += SKEIN_512_BLOCK_BYTES;
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
-        /* run the rounds */
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	ts[0] = ctx->h.T[0];
+	ts[1] = ctx->h.T[1];
+	do  {
+		/* this implementation only supports 2**64 input bytes (no carry out here) */
+		ts[0] += byteCntAdd;                    /* update processed length */
+
+		/* precompute the key schedule for this block */
+		ks[0] = ctx->X[0];
+		ks[1] = ctx->X[1];
+		ks[2] = ctx->X[2];
+		ks[3] = ctx->X[3];
+		ks[4] = ctx->X[4];
+		ks[5] = ctx->X[5];
+		ks[6] = ctx->X[6];
+		ks[7] = ctx->X[7];
+		ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
+			ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
+
+		ts[2] = ts[0] ^ ts[1];
+
+		Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+		DebugSaveTweak(ctx);
+		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+		X0   = w[0] + ks[0];                    /* do the first full key injection */
+		X1   = w[1] + ks[1];
+		X2   = w[2] + ks[2];
+		X3   = w[3] + ks[3];
+		X4   = w[4] + ks[4];
+		X5   = w[5] + ks[5] + ts[0];
+		X6   = w[6] + ks[6] + ts[1];
+		X7   = w[7] + ks[7];
+
+		blkPtr += SKEIN_512_BLOCK_BYTES;
+
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+		/* run the rounds */
 #define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
+		X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+		X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
+		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
 
 #if SKEIN_UNROLL_512 == 0                       
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
-    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
 #define I512(R)                                                     \
-    X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
-    X1   += ks[((R) + 2) % 9];                                        \
-    X2   += ks[((R) + 3) % 9];                                        \
-    X3   += ks[((R) + 4) % 9];                                        \
-    X4   += ks[((R) + 5) % 9];                                        \
-    X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
-    X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
-    X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+		X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
+		X1   += ks[((R) + 2) % 9];                                        \
+		X2   += ks[((R) + 3) % 9];                                        \
+		X3   += ks[((R) + 4) % 9];                                        \
+		X4   += ks[((R) + 5) % 9];                                        \
+		X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
+		X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
+		X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
 #define I512(R)                                                     \
-    X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
-    X1   += ks[r + (R) + 1];                                            \
-    X2   += ks[r + (R) + 2];                                            \
-    X3   += ks[r + (R) + 3];                                            \
-    X4   += ks[r + (R) + 4];                                            \
-    X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
-    X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
-    X7   += ks[r + (R) + 7] +         r + (R);                              \
-    ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
-    ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
+		X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
+		X1   += ks[r + (R) + 1];                                            \
+		X2   += ks[r + (R) + 2];                                            \
+		X3   += ks[r + (R) + 3];                                            \
+		X4   += ks[r + (R) + 4];                                            \
+		X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
+		X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
+		X7   += ks[r + (R) + 7] +         r + (R);                              \
+		ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
+		ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+		for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
 #endif                         /* end of looped code definitions */
-        {
+		{
 #define R512_8_rounds(R)  /* do 8 full rounds */  \
-        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
-        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
-        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
-        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
-        I512(2 * (R));                              \
-        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
-        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
-        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
-        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
-        I512(2 * (R) + 1);        /* and key injection */
-
-        R512_8_rounds(0);
+			R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
+			R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
+			R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
+			R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
+			I512(2 * (R));                              \
+			R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
+			R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
+			R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
+			R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
+			I512(2 * (R) + 1);        /* and key injection */
+
+			R512_8_rounds(0);
 
 #define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
 
-  #if   R512_Unroll_R(1)
-        R512_8_rounds(1);
-  #endif
-  #if   R512_Unroll_R(2)
-        R512_8_rounds(2);
-  #endif
-  #if   R512_Unroll_R(3)
-        R512_8_rounds(3);
-  #endif
-  #if   R512_Unroll_R(4)
-        R512_8_rounds(4);
-  #endif
-  #if   R512_Unroll_R(5)
-        R512_8_rounds(5);
-  #endif
-  #if   R512_Unroll_R(6)
-        R512_8_rounds(6);
-  #endif
-  #if   R512_Unroll_R(7)
-        R512_8_rounds(7);
-  #endif
-  #if   R512_Unroll_R(8)
-        R512_8_rounds(8);
-  #endif
-  #if   R512_Unroll_R(9)
-        R512_8_rounds(9);
-  #endif
-  #if   R512_Unroll_R(10)
-        R512_8_rounds(10);
-  #endif
-  #if   R512_Unroll_R(11)
-        R512_8_rounds(11);
-  #endif
-  #if   R512_Unroll_R(12)
-        R512_8_rounds(12);
-  #endif
-  #if   R512_Unroll_R(13)
-        R512_8_rounds(13);
-  #endif
-  #if   R512_Unroll_R(14)
-        R512_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_512 > 14)
+	#if   R512_Unroll_R(1)
+			R512_8_rounds(1);
+	#endif
+	#if   R512_Unroll_R(2)
+			R512_8_rounds(2);
+	#endif
+	#if   R512_Unroll_R(3)
+			R512_8_rounds(3);
+	#endif
+	#if   R512_Unroll_R(4)
+			R512_8_rounds(4);
+	#endif
+	#if   R512_Unroll_R(5)
+			R512_8_rounds(5);
+	#endif
+	#if   R512_Unroll_R(6)
+			R512_8_rounds(6);
+	#endif
+	#if   R512_Unroll_R(7)
+			R512_8_rounds(7);
+	#endif
+	#if   R512_Unroll_R(8)
+			R512_8_rounds(8);
+	#endif
+	#if   R512_Unroll_R(9)
+			R512_8_rounds(9);
+	#endif
+	#if   R512_Unroll_R(10)
+			R512_8_rounds(10);
+	#endif
+	#if   R512_Unroll_R(11)
+			R512_8_rounds(11);
+	#endif
+	#if   R512_Unroll_R(12)
+			R512_8_rounds(12);
+	#endif
+	#if   R512_Unroll_R(13)
+			R512_8_rounds(13);
+	#endif
+	#if   R512_Unroll_R(14)
+			R512_8_rounds(14);
+	#endif
+	#if  (SKEIN_UNROLL_512 > 14)
 #error  "need more unrolling in Skein_512_Process_Block"
-  #endif
-        }
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = X0 ^ w[0];
-        ctx->X[1] = X1 ^ w[1];
-        ctx->X[2] = X2 ^ w[2];
-        ctx->X[3] = X3 ^ w[3];
-        ctx->X[4] = X4 ^ w[4];
-        ctx->X[5] = X5 ^ w[5];
-        ctx->X[6] = X6 ^ w[6];
-        ctx->X[7] = X7 ^ w[7];
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
+	#endif
+		}
+
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0] = X0 ^ w[0];
+		ctx->X[1] = X1 ^ w[1];
+		ctx->X[2] = X2 ^ w[2];
+		ctx->X[3] = X3 ^ w[3];
+		ctx->X[4] = X4 ^ w[4];
+		ctx->X[5] = X5 ^ w[5];
+		ctx->X[6] = X6 ^ w[6];
+		ctx->X[7] = X7 ^ w[7];
+		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+	}
+	while (--blkCnt);
+	ctx->h.T[0] = ts[0];
+	ctx->h.T[1] = ts[1];
+}
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein_512_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein_512_Process_Block_CodeSize) -
-           ((u8 *) Skein_512_Process_Block);
-    }
+{
+	return ((u8 *) Skein_512_Process_Block_CodeSize) -
+		((u8 *) Skein_512_Process_Block);
+}
 unsigned int Skein_512_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_512;
-    }
+{
+	return SKEIN_UNROLL_512;
+}
 #endif
 #endif
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
 void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C, always looping (unrolled is bigger AND slower!) */
-    enum {
-        WCNT = SKEIN1024_STATE_WORDS
-        };
+{ /* do it in C, always looping (unrolled is bigger AND slower!) */
+	enum {
+		WCNT = SKEIN1024_STATE_WORDS
+	};
 #undef  RCNT
 #define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
 
@@ -448,239 +448,239 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 #if (RCNT % SKEIN_UNROLL_1024)
 #error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
 #endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	size_t  r;
+	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
 
-    u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
-            X08, X09, X10, X11, X12, X13, X14, X15;
-    u64  w[WCNT];                            /* local copy of input block */
+	u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
+		X08, X09, X10, X11, X12, X13, X14, X15;
+	u64  w[WCNT];                            /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
-    Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
-    Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
-    Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
+	const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+	Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
+	Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
+	Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
+	Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
 #endif
 
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0]  = ctx->X[0];
-        ks[1]  = ctx->X[1];
-        ks[2]  = ctx->X[2];
-        ks[3]  = ctx->X[3];
-        ks[4]  = ctx->X[4];
-        ks[5]  = ctx->X[5];
-        ks[6]  = ctx->X[6];
-        ks[7]  = ctx->X[7];
-        ks[8]  = ctx->X[8];
-        ks[9]  = ctx->X[9];
-        ks[10] = ctx->X[10];
-        ks[11] = ctx->X[11];
-        ks[12] = ctx->X[12];
-        ks[13] = ctx->X[13];
-        ks[14] = ctx->X[14];
-        ks[15] = ctx->X[15];
-        ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
-                  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
-                  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
-                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
-
-        ts[2]  = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X00    =  w[0] +  ks[0];                 /* do the first full key injection */
-        X01    =  w[1] +  ks[1];
-        X02    =  w[2] +  ks[2];
-        X03    =  w[3] +  ks[3];
-        X04    =  w[4] +  ks[4];
-        X05    =  w[5] +  ks[5];
-        X06    =  w[6] +  ks[6];
-        X07    =  w[7] +  ks[7];
-        X08    =  w[8] +  ks[8];
-        X09    =  w[9] +  ks[9];
-        X10    = w[10] + ks[10];
-        X11    = w[11] + ks[11];
-        X12    = w[12] + ks[12];
-        X13    = w[13] + ks[13] + ts[0];
-        X14    = w[14] + ks[14] + ts[1];
-        X15    = w[15] + ks[15];
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	ts[0] = ctx->h.T[0];
+	ts[1] = ctx->h.T[1];
+	do  {
+		/* this implementation only supports 2**64 input bytes (no carry out here) */
+		ts[0] += byteCntAdd;                    /* update processed length */
+
+		/* precompute the key schedule for this block */
+		ks[0]  = ctx->X[0];
+		ks[1]  = ctx->X[1];
+		ks[2]  = ctx->X[2];
+		ks[3]  = ctx->X[3];
+		ks[4]  = ctx->X[4];
+		ks[5]  = ctx->X[5];
+		ks[6]  = ctx->X[6];
+		ks[7]  = ctx->X[7];
+		ks[8]  = ctx->X[8];
+		ks[9]  = ctx->X[9];
+		ks[10] = ctx->X[10];
+		ks[11] = ctx->X[11];
+		ks[12] = ctx->X[12];
+		ks[13] = ctx->X[13];
+		ks[14] = ctx->X[14];
+		ks[15] = ctx->X[15];
+		ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
+			  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
+			  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
+			  ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
+
+		ts[2]  = ts[0] ^ ts[1];
+
+		Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+		DebugSaveTweak(ctx);
+		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+		X00    =  w[0] +  ks[0];                 /* do the first full key injection */
+		X01    =  w[1] +  ks[1];
+		X02    =  w[2] +  ks[2];
+		X03    =  w[3] +  ks[3];
+		X04    =  w[4] +  ks[4];
+		X05    =  w[5] +  ks[5];
+		X06    =  w[6] +  ks[6];
+		X07    =  w[7] +  ks[7];
+		X08    =  w[8] +  ks[8];
+		X09    =  w[9] +  ks[9];
+		X10    = w[10] + ks[10];
+		X11    = w[11] + ks[11];
+		X12    = w[12] + ks[12];
+		X13    = w[13] + ks[13] + ts[0];
+		X14    = w[14] + ks[14] + ts[1];
+		X15    = w[15] + ks[15];
+
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
 
 #define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
-    X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
-    X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
-    X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
-    X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
+		X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
+		X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
+		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
+		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
+		X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
+		X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
+		X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
+		X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
 
 #if SKEIN_UNROLL_1024 == 0                      
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
 
 #define I1024(R)                                                        \
-    X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
-    X01   += ks[((R) +  2) % 17];                                       \
-    X02   += ks[((R) +  3) % 17];                                       \
-    X03   += ks[((R) +  4) % 17];                                       \
-    X04   += ks[((R) +  5) % 17];                                       \
-    X05   += ks[((R) +  6) % 17];                                       \
-    X06   += ks[((R) +  7) % 17];                                       \
-    X07   += ks[((R) +  8) % 17];                                       \
-    X08   += ks[((R) +  9) % 17];                                       \
-    X09   += ks[((R) + 10) % 17];                                       \
-    X10   += ks[((R) + 11) % 17];                                       \
-    X11   += ks[((R) + 12) % 17];                                       \
-    X12   += ks[((R) + 13) % 17];                                       \
-    X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
-    X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
-    X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
+		X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
+		X01   += ks[((R) +  2) % 17];                                       \
+		X02   += ks[((R) +  3) % 17];                                       \
+		X03   += ks[((R) +  4) % 17];                                       \
+		X04   += ks[((R) +  5) % 17];                                       \
+		X05   += ks[((R) +  6) % 17];                                       \
+		X06   += ks[((R) +  7) % 17];                                       \
+		X07   += ks[((R) +  8) % 17];                                       \
+		X08   += ks[((R) +  9) % 17];                                       \
+		X09   += ks[((R) + 10) % 17];                                       \
+		X10   += ks[((R) + 11) % 17];                                       \
+		X11   += ks[((R) + 12) % 17];                                       \
+		X12   += ks[((R) + 13) % 17];                                       \
+		X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
+		X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
+		X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
+		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
 
 #define I1024(R)                                                      \
-    X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
-    X01   += ks[r + (R) +  1];                                            \
-    X02   += ks[r + (R) +  2];                                            \
-    X03   += ks[r + (R) +  3];                                            \
-    X04   += ks[r + (R) +  4];                                            \
-    X05   += ks[r + (R) +  5];                                            \
-    X06   += ks[r + (R) +  6];                                            \
-    X07   += ks[r + (R) +  7];                                            \
-    X08   += ks[r + (R) +  8];                                            \
-    X09   += ks[r + (R) +  9];                                            \
-    X10   += ks[r + (R) + 10];                                            \
-    X11   += ks[r + (R) + 11];                                            \
-    X12   += ks[r + (R) + 12];                                            \
-    X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
-    X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
-    X15   += ks[r + (R) + 15] +         r + (R);                          \
-    ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
-    ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
-    Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
+		X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
+		X01   += ks[r + (R) +  1];                                            \
+		X02   += ks[r + (R) +  2];                                            \
+		X03   += ks[r + (R) +  3];                                            \
+		X04   += ks[r + (R) +  4];                                            \
+		X05   += ks[r + (R) +  5];                                            \
+		X06   += ks[r + (R) +  6];                                            \
+		X07   += ks[r + (R) +  7];                                            \
+		X08   += ks[r + (R) +  8];                                            \
+		X09   += ks[r + (R) +  9];                                            \
+		X10   += ks[r + (R) + 10];                                            \
+		X11   += ks[r + (R) + 11];                                            \
+		X12   += ks[r + (R) + 12];                                            \
+		X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
+		X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
+		X15   += ks[r + (R) + 15] +         r + (R);                          \
+		ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
+		ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
 #endif  
-        {
+		{
 #define R1024_8_rounds(R)    /* do 8 full rounds */                               \
-        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
-        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
-        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
-        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
-        I1024(2*(R));                                                             \
-        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
-        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
-        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
-        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
-        I1024(2*(R)+1);
-
-        R1024_8_rounds(0);
+			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
+			R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
+			R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
+			R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
+			I1024(2*(R));                                                             \
+			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
+			R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
+			R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
+			R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
+			I1024(2*(R)+1);
+
+			R1024_8_rounds(0);
 
 #define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
 
-  #if   R1024_Unroll_R(1)
-        R1024_8_rounds(1);
-  #endif
-  #if   R1024_Unroll_R(2)
-        R1024_8_rounds(2);
-  #endif
-  #if   R1024_Unroll_R(3)
-        R1024_8_rounds(3);
-  #endif
-  #if   R1024_Unroll_R(4)
-        R1024_8_rounds(4);
-  #endif
-  #if   R1024_Unroll_R(5)
-        R1024_8_rounds(5);
-  #endif
-  #if   R1024_Unroll_R(6)
-        R1024_8_rounds(6);
-  #endif
-  #if   R1024_Unroll_R(7)
-        R1024_8_rounds(7);
-  #endif
-  #if   R1024_Unroll_R(8)
-        R1024_8_rounds(8);
-  #endif
-  #if   R1024_Unroll_R(9)
-        R1024_8_rounds(9);
-  #endif
-  #if   R1024_Unroll_R(10)
-        R1024_8_rounds(10);
-  #endif
-  #if   R1024_Unroll_R(11)
-        R1024_8_rounds(11);
-  #endif
-  #if   R1024_Unroll_R(12)
-        R1024_8_rounds(12);
-  #endif
-  #if   R1024_Unroll_R(13)
-        R1024_8_rounds(13);
-  #endif
-  #if   R1024_Unroll_R(14)
-        R1024_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_1024 > 14)
+	#if   R1024_Unroll_R(1)
+			R1024_8_rounds(1);
+	#endif
+	#if   R1024_Unroll_R(2)
+			R1024_8_rounds(2);
+	#endif
+	#if   R1024_Unroll_R(3)
+			R1024_8_rounds(3);
+	#endif
+	#if   R1024_Unroll_R(4)
+			R1024_8_rounds(4);
+	#endif
+	#if   R1024_Unroll_R(5)
+			R1024_8_rounds(5);
+	#endif
+	#if   R1024_Unroll_R(6)
+			R1024_8_rounds(6);
+	#endif
+	#if   R1024_Unroll_R(7)
+			R1024_8_rounds(7);
+	#endif
+	#if   R1024_Unroll_R(8)
+			R1024_8_rounds(8);
+	#endif
+	#if   R1024_Unroll_R(9)
+			R1024_8_rounds(9);
+	#endif
+	#if   R1024_Unroll_R(10)
+			R1024_8_rounds(10);
+	#endif
+	#if   R1024_Unroll_R(11)
+			R1024_8_rounds(11);
+	#endif
+	#if   R1024_Unroll_R(12)
+			R1024_8_rounds(12);
+	#endif
+	#if   R1024_Unroll_R(13)
+			R1024_8_rounds(13);
+	#endif
+	#if   R1024_Unroll_R(14)
+			R1024_8_rounds(14);
+	#endif
+#if  (SKEIN_UNROLL_1024 > 14)
 #error  "need more unrolling in Skein_1024_Process_Block"
   #endif
-        }
-        /* do the final "feedforward" xor, update context chaining vars */
-
-        ctx->X[0] = X00 ^ w[0];
-        ctx->X[1] = X01 ^ w[1];
-        ctx->X[2] = X02 ^ w[2];
-        ctx->X[3] = X03 ^ w[3];
-        ctx->X[4] = X04 ^ w[4];
-        ctx->X[5] = X05 ^ w[5];
-        ctx->X[6] = X06 ^ w[6];
-        ctx->X[7] = X07 ^ w[7];
-        ctx->X[8] = X08 ^ w[8];
-        ctx->X[9] = X09 ^ w[9];
-        ctx->X[10] = X10 ^ w[10];
-        ctx->X[11] = X11 ^ w[11];
-        ctx->X[12] = X12 ^ w[12];
-        ctx->X[13] = X13 ^ w[13];
-        ctx->X[14] = X14 ^ w[14];
-        ctx->X[15] = X15 ^ w[15];
-
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-        
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        blkPtr += SKEIN1024_BLOCK_BYTES;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
+		}
+		/* do the final "feedforward" xor, update context chaining vars */
+
+		ctx->X[0] = X00 ^ w[0];
+		ctx->X[1] = X01 ^ w[1];
+		ctx->X[2] = X02 ^ w[2];
+		ctx->X[3] = X03 ^ w[3];
+		ctx->X[4] = X04 ^ w[4];
+		ctx->X[5] = X05 ^ w[5];
+		ctx->X[6] = X06 ^ w[6];
+		ctx->X[7] = X07 ^ w[7];
+		ctx->X[8] = X08 ^ w[8];
+		ctx->X[9] = X09 ^ w[9];
+		ctx->X[10] = X10 ^ w[10];
+		ctx->X[11] = X11 ^ w[11];
+		ctx->X[12] = X12 ^ w[12];
+		ctx->X[13] = X13 ^ w[13];
+		ctx->X[14] = X14 ^ w[14];
+		ctx->X[15] = X15 ^ w[15];
+
+		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+		blkPtr += SKEIN1024_BLOCK_BYTES;
+	}
+	while (--blkCnt);
+	ctx->h.T[0] = ts[0];
+	ctx->h.T[1] = ts[1];
+}
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein1024_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein1024_Process_Block_CodeSize) -
-           ((u8 *) Skein1024_Process_Block);
-    }
+{
+	return ((u8 *) Skein1024_Process_Block_CodeSize) -
+		((u8 *) Skein1024_Process_Block);
+}
 unsigned int Skein1024_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_1024;
-    }
+{
+	return SKEIN_UNROLL_1024;
+}
 #endif
 #endif
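
Reviewer note, not part of the patch: for anyone checking the whitespace-only
hunks above against the algorithm, here is a minimal standalone sketch of what
one column of the Round512/Round1024 macros does (add, rotate left, xor) and of
how the extra key schedule word ks[8] / ks[16] is formed. The names rotl64, mix
and ks_parity_word are hypothetical and do not appear in the patch. The parity
constant is assumed to match SKEIN_KS_PARITY from skein.h and should be checked
against that header.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value of SKEIN_KS_PARITY; verify against skein.h before relying on it. */
#define KS_PARITY_ASSUMED UINT64_C(0x1BD11BDAA9FC1A22)

/* Rotate a 64-bit word left by n bits; n is reduced mod 64 so n == 0 is safe. */
static inline uint64_t rotl64(uint64_t x, unsigned int n)
{
	n &= 63;
	return (x << n) | (x >> ((64 - n) & 63));
}

/*
 * One column of a round, mirroring the macro body:
 *   X0 += X1; X1 = RotL_64(X1, rot); X1 ^= X0;
 */
static inline void mix(uint64_t *x0, uint64_t *x1, unsigned int rot)
{
	*x0 += *x1;
	*x1 = rotl64(*x1, rot) ^ *x0;
}

/*
 * Extra key schedule word: xor of all chaining words and the parity
 * constant, as in ks[8] (nwords == 8) or ks[16] (nwords == 16).
 */
static inline uint64_t ks_parity_word(const uint64_t *ks, size_t nwords)
{
	uint64_t acc = KS_PARITY_ASSUMED;
	size_t i;

	for (i = 0; i < nwords; i++)
		acc ^= ks[i];
	return acc;
}
```

The threefish1024Block.c hunk that follows uses the same step with the key
addition folded in: b1 += k1; b0 += b1 + k0; then the rotate and xor.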
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index e3be37ea8024..1730a3120a0f 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -3,1382 +3,1380 @@
 
 
 void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
-        {
-
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3],
-      b4 = input[4], b5 = input[5],
-      b6 = input[6], b7 = input[7],
-      b8 = input[8], b9 = input[9],
-      b10 = input[10], b11 = input[11],
-      b12 = input[12], b13 = input[13],
-      b14 = input[14], b15 = input[15];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-      k8 = keyCtx->key[8], k9 = keyCtx->key[9],
-      k10 = keyCtx->key[10], k11 = keyCtx->key[11],
-      k12 = keyCtx->key[12], k13 = keyCtx->key[13],
-      k14 = keyCtx->key[14], k15 = keyCtx->key[15],
-      k16 = keyCtx->key[16];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7],
+	  b8 = input[8], b9 = input[9],
+	  b10 = input[10], b11 = input[11],
+	  b12 = input[12], b13 = input[13],
+	  b14 = input[14], b15 = input[15];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+	  k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+	  k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+	  k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+	  k16 = keyCtx->key[16];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-            b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k3; b2 += b3 + k2; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k5; b4 += b5 + k4; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k7; b6 += b7 + k6; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k9; b8 += b9 + k8; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k11; b10 += b11 + k10; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k13 + t0; b12 += b13 + k12; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k15; b14 += b15 + k14 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k2; b0 += b1 + k1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k4; b2 += b3 + k3; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k6; b4 += b5 + k5; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k8; b6 += b7 + k7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k10; b8 += b9 + k9; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k12; b10 += b11 + k11; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k14 + t1; b12 += b13 + k13; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k16 + 1; b14 += b15 + k15 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k3; b0 += b1 + k2; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k5; b2 += b3 + k4; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k7; b4 += b5 + k6; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k9; b6 += b7 + k8; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k11; b8 += b9 + k10; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k13; b10 += b11 + k12; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k15 + t2; b12 += b13 + k14; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k0 + 2; b14 += b15 + k16 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k4; b0 += b1 + k3; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k6; b2 += b3 + k5; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k8; b4 += b5 + k7; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k10; b6 += b7 + k9; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k12; b8 += b9 + k11; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k14; b10 += b11 + k13; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k16 + t0; b12 += b13 + k15; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k1 + 3; b14 += b15 + k0 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k5; b0 += b1 + k4; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k7; b2 += b3 + k6; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k9; b4 += b5 + k8; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k11; b6 += b7 + k10; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k13; b8 += b9 + k12; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k15; b10 += b11 + k14; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k0 + t1; b12 += b13 + k16; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k2 + 4; b14 += b15 + k1 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k6; b0 += b1 + k5; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k8; b2 += b3 + k7; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k10; b4 += b5 + k9; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k12; b6 += b7 + k11; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k14; b8 += b9 + k13; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k16; b10 += b11 + k15; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k1 + t2; b12 += b13 + k0; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k3 + 5; b14 += b15 + k2 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k7; b0 += b1 + k6; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k9; b2 += b3 + k8; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k11; b4 += b5 + k10; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k13; b6 += b7 + k12; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k15; b8 += b9 + k14; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k0; b10 += b11 + k16; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k2 + t0; b12 += b13 + k1; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k4 + 6; b14 += b15 + k3 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k8; b0 += b1 + k7; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k10; b2 += b3 + k9; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k12; b4 += b5 + k11; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k14; b6 += b7 + k13; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k16; b8 += b9 + k15; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k1; b10 += b11 + k0; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k3 + t1; b12 += b13 + k2; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k5 + 7; b14 += b15 + k4 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k9; b0 += b1 + k8; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k11; b2 += b3 + k10; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k13; b4 += b5 + k12; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k15; b6 += b7 + k14; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k0; b8 += b9 + k16; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k2; b10 += b11 + k1; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k4 + t2; b12 += b13 + k3; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k6 + 8; b14 += b15 + k5 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k10; b0 += b1 + k9; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k12; b2 += b3 + k11; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k14; b4 += b5 + k13; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k16; b6 += b7 + k15; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k1; b8 += b9 + k0; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k3; b10 += b11 + k2; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k5 + t0; b12 += b13 + k4; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k7 + 9; b14 += b15 + k6 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k11; b0 += b1 + k10; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k13; b2 += b3 + k12; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k15; b4 += b5 + k14; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k0; b6 += b7 + k16; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k2; b8 += b9 + k1; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k4; b10 += b11 + k3; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k6 + t1; b12 += b13 + k5; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k8 + 10; b14 += b15 + k7 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k12; b0 += b1 + k11; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k14; b2 += b3 + k13; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k16; b4 += b5 + k15; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k1; b6 += b7 + k0; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k3; b8 += b9 + k2; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k5; b10 += b11 + k4; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k7 + t2; b12 += b13 + k6; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k9 + 11; b14 += b15 + k8 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k13; b0 += b1 + k12; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k15; b2 += b3 + k14; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k0; b4 += b5 + k16; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k2; b6 += b7 + k1; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k4; b8 += b9 + k3; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k6; b10 += b11 + k5; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k8 + t0; b12 += b13 + k7; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k10 + 12; b14 += b15 + k9 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k14; b0 += b1 + k13; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k16; b2 += b3 + k15; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k1; b4 += b5 + k0; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k3; b6 += b7 + k2; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k5; b8 += b9 + k4; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k7; b10 += b11 + k6; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k9 + t1; b12 += b13 + k8; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k11 + 13; b14 += b15 + k10 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k15; b0 += b1 + k14; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k0; b2 += b3 + k16; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k2; b4 += b5 + k1; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k4; b6 += b7 + k3; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k6; b8 += b9 + k5; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k8; b10 += b11 + k7; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k10 + t2; b12 += b13 + k9; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k12 + 14; b14 += b15 + k11 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k16; b0 += b1 + k15; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k1; b2 += b3 + k0; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k3; b4 += b5 + k2; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k5; b6 += b7 + k4; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k7; b8 += b9 + k6; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k9; b10 += b11 + k8; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k11 + t0; b12 += b13 + k10; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k13 + 15; b14 += b15 + k12 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k0; b0 += b1 + k16; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k2; b2 += b3 + k1; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k4; b4 += b5 + k3; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k6; b6 += b7 + k5; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k8; b8 += b9 + k7; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k10; b10 += b11 + k9; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k12 + t1; b12 += b13 + k11; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k14 + 16; b14 += b15 + k13 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k1; b0 += b1 + k0; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k3; b2 += b3 + k2; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k5; b4 += b5 + k4; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k7; b6 += b7 + k6; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k9; b8 += b9 + k8; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k11; b10 += b11 + k10; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k13 + t2; b12 += b13 + k12; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k15 + 17; b14 += b15 + k14 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k2; b0 += b1 + k1; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k4; b2 += b3 + k3; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k6; b4 += b5 + k5; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k8; b6 += b7 + k7; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k10; b8 += b9 + k9; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k12; b10 += b11 + k11; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k14 + t0; b12 += b13 + k13; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k16 + 18; b14 += b15 + k15 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k3; b0 += b1 + k2; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k5; b2 += b3 + k4; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k7; b4 += b5 + k6; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k9; b6 += b7 + k8; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k11; b8 += b9 + k10; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k13; b10 += b11 + k12; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k15 + t1; b12 += b13 + k14; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k0 + 19; b14 += b15 + k16 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k5; b4 += b5 + k4; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k7; b6 += b7 + k6; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k9; b8 += b9 + k8; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k11; b10 += b11 + k10; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k13 + t0; b12 += b13 + k12; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k15; b14 += b15 + k14 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k6; b4 += b5 + k5; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k8; b6 += b7 + k7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k10; b8 += b9 + k9; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k12; b10 += b11 + k11; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k14 + t1; b12 += b13 + k13; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k16 + 1; b14 += b15 + k15 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k7; b4 += b5 + k6; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k9; b6 += b7 + k8; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k11; b8 += b9 + k10; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k13; b10 += b11 + k12; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k15 + t2; b12 += b13 + k14; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k0 + 2; b14 += b15 + k16 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k8; b4 += b5 + k7; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k10; b6 += b7 + k9; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k12; b8 += b9 + k11; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k14; b10 += b11 + k13; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k16 + t0; b12 += b13 + k15; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k1 + 3; b14 += b15 + k0 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k9; b4 += b5 + k8; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k11; b6 += b7 + k10; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k13; b8 += b9 + k12; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k15; b10 += b11 + k14; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k0 + t1; b12 += b13 + k16; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k2 + 4; b14 += b15 + k1 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k10; b4 += b5 + k9; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k12; b6 += b7 + k11; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k14; b8 += b9 + k13; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k16; b10 += b11 + k15; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k1 + t2; b12 += b13 + k0; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k3 + 5; b14 += b15 + k2 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k9; b2 += b3 + k8; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k11; b4 += b5 + k10; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k13; b6 += b7 + k12; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k15; b8 += b9 + k14; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k0; b10 += b11 + k16; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k2 + t0; b12 += b13 + k1; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k4 + 6; b14 += b15 + k3 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k10; b2 += b3 + k9; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k12; b4 += b5 + k11; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k14; b6 += b7 + k13; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k16; b8 += b9 + k15; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k1; b10 += b11 + k0; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k3 + t1; b12 += b13 + k2; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k5 + 7; b14 += b15 + k4 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k9; b0 += b1 + k8; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k11; b2 += b3 + k10; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k13; b4 += b5 + k12; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k15; b6 += b7 + k14; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k0; b8 += b9 + k16; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k2; b10 += b11 + k1; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k4 + t2; b12 += b13 + k3; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k6 + 8; b14 += b15 + k5 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k10; b0 += b1 + k9; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k12; b2 += b3 + k11; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k14; b4 += b5 + k13; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k16; b6 += b7 + k15; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k1; b8 += b9 + k0; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k3; b10 += b11 + k2; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k5 + t0; b12 += b13 + k4; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k7 + 9; b14 += b15 + k6 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k11; b0 += b1 + k10; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k13; b2 += b3 + k12; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k15; b4 += b5 + k14; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k0; b6 += b7 + k16; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k2; b8 += b9 + k1; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k4; b10 += b11 + k3; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k6 + t1; b12 += b13 + k5; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k8 + 10; b14 += b15 + k7 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k12; b0 += b1 + k11; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k14; b2 += b3 + k13; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k16; b4 += b5 + k15; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k1; b6 += b7 + k0; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k3; b8 += b9 + k2; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k5; b10 += b11 + k4; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k7 + t2; b12 += b13 + k6; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k9 + 11; b14 += b15 + k8 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k13; b0 += b1 + k12; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k15; b2 += b3 + k14; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k0; b4 += b5 + k16; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k2; b6 += b7 + k1; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k4; b8 += b9 + k3; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k6; b10 += b11 + k5; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k8 + t0; b12 += b13 + k7; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k10 + 12; b14 += b15 + k9 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k14; b0 += b1 + k13; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k16; b2 += b3 + k15; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k1; b4 += b5 + k0; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k3; b6 += b7 + k2; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k5; b8 += b9 + k4; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k7; b10 += b11 + k6; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k9 + t1; b12 += b13 + k8; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k11 + 13; b14 += b15 + k10 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k15; b0 += b1 + k14; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k0; b2 += b3 + k16; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k2; b4 += b5 + k1; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k4; b6 += b7 + k3; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k6; b8 += b9 + k5; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k8; b10 += b11 + k7; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k10 + t2; b12 += b13 + k9; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k12 + 14; b14 += b15 + k11 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k16; b0 += b1 + k15; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k3; b4 += b5 + k2; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k5; b6 += b7 + k4; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k7; b8 += b9 + k6; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k9; b10 += b11 + k8; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k11 + t0; b12 += b13 + k10; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k13 + 15; b14 += b15 + k12 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k0; b0 += b1 + k16; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k4; b4 += b5 + k3; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k6; b6 += b7 + k5; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k8; b8 += b9 + k7; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k10; b10 += b11 + k9; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k12 + t1; b12 += b13 + k11; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k14 + 16; b14 += b15 + k13 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k5; b4 += b5 + k4; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k7; b6 += b7 + k6; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k9; b8 += b9 + k8; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k11; b10 += b11 + k10; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k13 + t2; b12 += b13 + k12; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k15 + 17; b14 += b15 + k14 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k6; b4 += b5 + k5; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k8; b6 += b7 + k7; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k10; b8 += b9 + k9; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k12; b10 += b11 + k11; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k14 + t0; b12 += b13 + k13; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k16 + 18; b14 += b15 + k15 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k7; b4 += b5 + k6; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k9; b6 += b7 + k8; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k11; b8 += b9 + k10; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k13; b10 += b11 + k12; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k15 + t1; b12 += b13 + k14; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k0 + 19; b14 += b15 + k16 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
 
-            output[0] = b0 + k3;
-            output[1] = b1 + k4;
-            output[2] = b2 + k5;
-            output[3] = b3 + k6;
-            output[4] = b4 + k7;
-            output[5] = b5 + k8;
-            output[6] = b6 + k9;
-            output[7] = b7 + k10;
-            output[8] = b8 + k11;
-            output[9] = b9 + k12;
-            output[10] = b10 + k13;
-            output[11] = b11 + k14;
-            output[12] = b12 + k15;
-            output[13] = b13 + k16 + t2;
-            output[14] = b14 + k0 + t0;
-            output[15] = b15 + k1 + 20;
-        }
+	output[0] = b0 + k3;
+	output[1] = b1 + k4;
+	output[2] = b2 + k5;
+	output[3] = b3 + k6;
+	output[4] = b4 + k7;
+	output[5] = b5 + k8;
+	output[6] = b6 + k9;
+	output[7] = b7 + k10;
+	output[8] = b8 + k11;
+	output[9] = b9 + k12;
+	output[10] = b10 + k13;
+	output[11] = b11 + k14;
+	output[12] = b12 + k15;
+	output[13] = b13 + k16 + t2;
+	output[14] = b14 + k0 + t0;
+	output[15] = b15 + k1 + 20;
+}
 
 void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
 {
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7],
+	  b8 = input[8], b9 = input[9],
+	  b10 = input[10], b11 = input[11],
+	  b12 = input[12], b13 = input[13],
+	  b14 = input[14], b15 = input[15];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+	  k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+	  k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+	  k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+	  k16 = keyCtx->key[16];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
+	u64 tmp;
 
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3],
-      b4 = input[4], b5 = input[5],
-      b6 = input[6], b7 = input[7],
-      b8 = input[8], b9 = input[9],
-      b10 = input[10], b11 = input[11],
-      b12 = input[12], b13 = input[13],
-      b14 = input[14], b15 = input[15];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-      k8 = keyCtx->key[8], k9 = keyCtx->key[9],
-      k10 = keyCtx->key[10], k11 = keyCtx->key[11],
-      k12 = keyCtx->key[12], k13 = keyCtx->key[13],
-      k14 = keyCtx->key[14], k15 = keyCtx->key[15],
-      k16 = keyCtx->key[16];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
-    u64 tmp;
-
-            b0 -= k3;
-            b1 -= k4;
-            b2 -= k5;
-            b3 -= k6;
-            b4 -= k7;
-            b5 -= k8;
-            b6 -= k9;
-            b7 -= k10;
-            b8 -= k11;
-            b9 -= k12;
-            b10 -= k13;
-            b11 -= k14;
-            b12 -= k15;
-            b13 -= k16 + t2;
-            b14 -= k0 + t0;
-            b15 -= k1 + 20;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k16 + t2; b15 -= k0 + 19;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k14; b13 -= k15 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k12; b11 -= k13;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k10; b9 -= k11;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k8; b7 -= k9;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k6; b5 -= k7;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k4; b3 -= k5;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k2; b1 -= k3;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k15 + t1; b15 -= k16 + 18;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k13; b13 -= k14 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k11; b11 -= k12;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k9; b9 -= k10;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k7; b7 -= k8;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k5; b5 -= k6;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k3; b3 -= k4;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k1; b1 -= k2;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k14 + t0; b15 -= k15 + 17;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k12; b13 -= k13 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k10; b11 -= k11;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k8; b9 -= k9;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k6; b7 -= k7;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k4; b5 -= k5;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k2; b3 -= k3;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k0; b1 -= k1;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k13 + t2; b15 -= k14 + 16;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k11; b13 -= k12 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k9; b11 -= k10;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k7; b9 -= k8;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k5; b7 -= k6;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k3; b5 -= k4;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k1; b3 -= k2;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k16; b1 -= k0;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k12 + t1; b15 -= k13 + 15;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k10; b13 -= k11 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k8; b11 -= k9;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k6; b9 -= k7;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k4; b7 -= k5;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k2; b5 -= k3;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k0; b3 -= k1;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k15; b1 -= k16;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k11 + t0; b15 -= k12 + 14;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k9; b13 -= k10 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k7; b11 -= k8;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k5; b9 -= k6;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k3; b7 -= k4;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k1; b5 -= k2;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k16; b3 -= k0;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k14; b1 -= k15;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k10 + t2; b15 -= k11 + 13;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k8; b13 -= k9 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k6; b11 -= k7;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k4; b9 -= k5;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k2; b7 -= k3;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k0; b5 -= k1;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k15; b3 -= k16;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k13; b1 -= k14;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k9 + t1; b15 -= k10 + 12;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k7; b13 -= k8 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k5; b11 -= k6;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k3; b9 -= k4;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k1; b7 -= k2;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k16; b5 -= k0;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k14; b3 -= k15;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k12; b1 -= k13;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k8 + t0; b15 -= k9 + 11;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k6; b13 -= k7 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k4; b11 -= k5;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k2; b9 -= k3;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k0; b7 -= k1;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k15; b5 -= k16;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k13; b3 -= k14;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k11; b1 -= k12;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k7 + t2; b15 -= k8 + 10;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k5; b13 -= k6 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k3; b11 -= k4;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k1; b9 -= k2;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k16; b7 -= k0;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k14; b5 -= k15;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k12; b3 -= k13;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k10; b1 -= k11;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k6 + t1; b15 -= k7 + 9;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k4; b13 -= k5 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k2; b11 -= k3;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k0; b9 -= k1;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k15; b7 -= k16;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k13; b5 -= k14;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k11; b3 -= k12;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k9; b1 -= k10;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k5 + t0; b15 -= k6 + 8;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k3; b13 -= k4 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k1; b11 -= k2;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k16; b9 -= k0;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k14; b7 -= k15;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k12; b5 -= k13;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k10; b3 -= k11;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k8; b1 -= k9;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k4 + t2; b15 -= k5 + 7;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k2; b13 -= k3 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k0; b11 -= k1;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k15; b9 -= k16;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k13; b7 -= k14;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k11; b5 -= k12;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k9; b3 -= k10;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k7; b1 -= k8;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k3 + t1; b15 -= k4 + 6;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k1; b13 -= k2 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k16; b11 -= k0;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k14; b9 -= k15;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k12; b7 -= k13;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k10; b5 -= k11;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k8; b3 -= k9;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k6; b1 -= k7;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k2 + t0; b15 -= k3 + 5;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k0; b13 -= k1 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k15; b11 -= k16;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k13; b9 -= k14;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k11; b7 -= k12;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k9; b5 -= k10;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k7; b3 -= k8;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k5; b1 -= k6;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k1 + t2; b15 -= k2 + 4;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k16; b13 -= k0 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k14; b11 -= k15;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k12; b9 -= k13;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k10; b7 -= k11;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k8; b5 -= k9;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k6; b3 -= k7;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k4; b1 -= k5;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k0 + t1; b15 -= k1 + 3;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k15; b13 -= k16 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k13; b11 -= k14;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k11; b9 -= k12;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k9; b7 -= k10;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k7; b5 -= k8;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k5; b3 -= k6;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k3; b1 -= k4;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k16 + t0; b15 -= k0 + 2;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k14; b13 -= k15 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k12; b11 -= k13;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k10; b9 -= k11;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k8; b7 -= k9;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k6; b5 -= k7;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k4; b3 -= k5;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k2; b1 -= k3;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k15 + t2; b15 -= k16 + 1;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k13; b13 -= k14 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k11; b11 -= k12;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k9; b9 -= k10;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k7; b7 -= k8;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k5; b5 -= k6;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k3; b3 -= k4;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k1; b1 -= k2;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k14 + t1; b15 -= k15;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k12; b13 -= k13 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k10; b11 -= k11;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k8; b9 -= k9;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k6; b7 -= k7;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k4; b5 -= k5;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k2; b3 -= k3;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k0; b1 -= k1;
+	b0 -= k3;
+	b1 -= k4;
+	b2 -= k5;
+	b3 -= k6;
+	b4 -= k7;
+	b5 -= k8;
+	b6 -= k9;
+	b7 -= k10;
+	b8 -= k11;
+	b9 -= k12;
+	b10 -= k13;
+	b11 -= k14;
+	b12 -= k15;
+	b13 -= k16 + t2;
+	b14 -= k0 + t0;
+	b15 -= k1 + 20;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k16 + t2; b15 -= k0 + 19;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k14; b13 -= k15 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k12; b11 -= k13;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k10; b9 -= k11;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k8; b7 -= k9;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k6; b5 -= k7;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k4; b3 -= k5;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k2; b1 -= k3;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k15 + t1; b15 -= k16 + 18;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k13; b13 -= k14 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k11; b11 -= k12;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k9; b9 -= k10;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k7; b7 -= k8;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k5; b5 -= k6;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k3; b3 -= k4;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k1; b1 -= k2;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k14 + t0; b15 -= k15 + 17;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k12; b13 -= k13 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k10; b11 -= k11;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k8; b9 -= k9;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k6; b7 -= k7;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k4; b5 -= k5;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k2; b3 -= k3;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k0; b1 -= k1;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k13 + t2; b15 -= k14 + 16;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k11; b13 -= k12 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k9; b11 -= k10;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k7; b9 -= k8;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k5; b7 -= k6;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k3; b5 -= k4;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k1; b3 -= k2;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k16; b1 -= k0;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k12 + t1; b15 -= k13 + 15;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k10; b13 -= k11 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k8; b11 -= k9;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k6; b9 -= k7;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k4; b7 -= k5;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k2; b5 -= k3;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k0; b3 -= k1;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k15; b1 -= k16;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k11 + t0; b15 -= k12 + 14;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k9; b13 -= k10 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k7; b11 -= k8;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k5; b9 -= k6;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k3; b7 -= k4;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k1; b5 -= k2;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k16; b3 -= k0;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k14; b1 -= k15;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k10 + t2; b15 -= k11 + 13;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k8; b13 -= k9 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k6; b11 -= k7;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k4; b9 -= k5;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k2; b7 -= k3;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k0; b5 -= k1;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k15; b3 -= k16;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k13; b1 -= k14;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k9 + t1; b15 -= k10 + 12;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k7; b13 -= k8 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k5; b11 -= k6;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k3; b9 -= k4;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k1; b7 -= k2;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k16; b5 -= k0;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k14; b3 -= k15;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k12; b1 -= k13;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k8 + t0; b15 -= k9 + 11;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k6; b13 -= k7 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k4; b11 -= k5;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k2; b9 -= k3;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k0; b7 -= k1;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k15; b5 -= k16;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k13; b3 -= k14;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k11; b1 -= k12;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k7 + t2; b15 -= k8 + 10;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k5; b13 -= k6 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k3; b11 -= k4;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k1; b9 -= k2;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k16; b7 -= k0;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k14; b5 -= k15;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k12; b3 -= k13;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k10; b1 -= k11;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k6 + t1; b15 -= k7 + 9;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k4; b13 -= k5 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k2; b11 -= k3;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k0; b9 -= k1;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k15; b7 -= k16;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k13; b5 -= k14;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k11; b3 -= k12;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k9; b1 -= k10;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k5 + t0; b15 -= k6 + 8;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k3; b13 -= k4 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k1; b11 -= k2;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k16; b9 -= k0;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k14; b7 -= k15;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k12; b5 -= k13;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k10; b3 -= k11;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k8; b1 -= k9;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k4 + t2; b15 -= k5 + 7;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k2; b13 -= k3 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k0; b11 -= k1;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k15; b9 -= k16;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k13; b7 -= k14;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k11; b5 -= k12;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k9; b3 -= k10;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k7; b1 -= k8;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k3 + t1; b15 -= k4 + 6;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k1; b13 -= k2 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k16; b11 -= k0;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k14; b9 -= k15;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k12; b7 -= k13;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k10; b5 -= k11;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k8; b3 -= k9;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k6; b1 -= k7;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k2 + t0; b15 -= k3 + 5;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k0; b13 -= k1 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k15; b11 -= k16;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k13; b9 -= k14;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k11; b7 -= k12;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k9; b5 -= k10;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k7; b3 -= k8;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k5; b1 -= k6;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k1 + t2; b15 -= k2 + 4;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k16; b13 -= k0 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k14; b11 -= k15;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k12; b9 -= k13;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k10; b7 -= k11;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k8; b5 -= k9;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k6; b3 -= k7;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k4; b1 -= k5;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k0 + t1; b15 -= k1 + 3;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k15; b13 -= k16 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k13; b11 -= k14;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k11; b9 -= k12;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k9; b7 -= k10;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k7; b5 -= k8;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k5; b3 -= k6;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k3; b1 -= k4;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k16 + t0; b15 -= k0 + 2;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k14; b13 -= k15 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k12; b11 -= k13;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k10; b9 -= k11;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k8; b7 -= k9;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k6; b5 -= k7;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k4; b3 -= k5;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k2; b1 -= k3;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k15 + t2; b15 -= k16 + 1;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k13; b13 -= k14 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k11; b11 -= k12;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k9; b9 -= k10;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k7; b7 -= k8;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k5; b5 -= k6;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k3; b3 -= k4;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k1; b1 -= k2;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k14 + t1; b15 -= k15;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k12; b13 -= k13 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k10; b11 -= k11;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k8; b9 -= k9;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k6; b7 -= k7;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k4; b5 -= k5;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k2; b3 -= k3;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k0; b1 -= k1;
 
-            output[15] = b15;
-            output[14] = b14;
-            output[13] = b13;
-            output[12] = b12;
-            output[11] = b11;
-            output[10] = b10;
-            output[9] = b9;
-            output[8] = b8;
-            output[7] = b7;
-            output[6] = b6;
-            output[5] = b5;
-            output[4] = b4;
-            output[3] = b3;
-            output[2] = b2;
-            output[1] = b1;
-            output[0] = b0;
+	output[15] = b15;
+	output[14] = b14;
+	output[13] = b13;
+	output[12] = b12;
+	output[11] = b11;
+	output[10] = b10;
+	output[9] = b9;
+	output[8] = b8;
+	output[7] = b7;
+	output[6] = b6;
+	output[5] = b5;
+	output[4] = b4;
+	output[3] = b3;
+	output[2] = b2;
+	output[1] = b1;
+	output[0] = b0;
 }
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index 09ea5099bc76..da3b8357e47f 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -3,346 +3,345 @@
 
 
 void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
-  {
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+	b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k3; b2 += b3 + k2 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k4 + 1; b2 += b3 + k3 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k3; b2 += b3 + k2 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k4 + 1; b2 += b3 + k3 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k0 + 2; b2 += b3 + k4 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k4 + t0; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k1 + 3; b2 += b3 + k0 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k0 + 2; b2 += b3 + k4 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k4 + t0; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k1 + 3; b2 += b3 + k0 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k0 + t1; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k2 + 4; b2 += b3 + k1 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k1 + t2; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k3 + 5; b2 += b3 + k2 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k0 + t1; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k2 + 4; b2 += b3 + k1 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k1 + t2; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k3 + 5; b2 += b3 + k2 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k2 + t0; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k4 + 6; b2 += b3 + k3 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k3 + t1; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k0 + 7; b2 += b3 + k4 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k2 + t0; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k4 + 6; b2 += b3 + k3 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k3 + t1; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k0 + 7; b2 += b3 + k4 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k4 + t2; b0 += b1 + k3; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k1 + 8; b2 += b3 + k0 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k0 + t0; b0 += b1 + k4; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k2 + 9; b2 += b3 + k1 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k4 + t2; b0 += b1 + k3; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k1 + 8; b2 += b3 + k0 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k0 + t0; b0 += b1 + k4; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k2 + 9; b2 += b3 + k1 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k1 + t1; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k3 + 10; b2 += b3 + k2 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k2 + t2; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k4 + 11; b2 += b3 + k3 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k1 + t1; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k3 + 10; b2 += b3 + k2 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k2 + t2; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k4 + 11; b2 += b3 + k3 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k3 + t0; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k0 + 12; b2 += b3 + k4 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k4 + t1; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k1 + 13; b2 += b3 + k0 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k3 + t0; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k0 + 12; b2 += b3 + k4 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k4 + t1; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k1 + 13; b2 += b3 + k0 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k0 + t2; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k2 + 14; b2 += b3 + k1 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k3 + 15; b2 += b3 + k2 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k0 + t2; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k2 + 14; b2 += b3 + k1 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k3 + 15; b2 += b3 + k2 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k4 + 16; b2 += b3 + k3 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k0 + 17; b2 += b3 + k4 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k4 + 16; b2 += b3 + k3 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k0 + 17; b2 += b3 + k4 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-    output[0] = b0 + k3;
-    output[1] = b1 + k4 + t0;
-    output[2] = b2 + k0 + t1;
-    output[3] = b3 + k1 + 18;
-  }
+	output[0] = b0 + k3;
+	output[1] = b1 + k4 + t0;
+	output[2] = b2 + k0 + t1;
+	output[3] = b3 + k1 + 18;
+}
 
 void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
-  {
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-    u64 tmp;
+	u64 tmp;
 
-    b0 -= k3;
-    b1 -= k4 + t0;
-    b2 -= k0 + t1;
-    b3 -= k1 + 18;
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t0; b3 -= k0 + 17;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t2; b3 -= k4 + 16;
+	b0 -= k3;
+	b1 -= k4 + t0;
+	b2 -= k0 + t1;
+	b3 -= k1 + 18;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t0; b3 -= k0 + 17;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t2; b3 -= k4 + 16;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t1; b3 -= k3 + 15;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t0; b3 -= k2 + 14;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t1; b3 -= k3 + 15;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t0; b3 -= k2 + 14;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t2; b3 -= k1 + 13;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t1; b3 -= k0 + 12;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t2; b3 -= k1 + 13;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t1; b3 -= k0 + 12;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t0; b3 -= k4 + 11;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t2; b3 -= k3 + 10;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t0; b3 -= k4 + 11;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t2; b3 -= k3 + 10;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k4; b1 -= k0 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k1 + t1; b3 -= k2 + 9;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k3; b1 -= k4 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k0 + t0; b3 -= k1 + 8;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k4; b1 -= k0 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k1 + t1; b3 -= k2 + 9;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k3; b1 -= k4 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k0 + t0; b3 -= k1 + 8;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t2; b3 -= k0 + 7;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t1; b3 -= k4 + 6;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t2; b3 -= k0 + 7;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t1; b3 -= k4 + 6;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t0; b3 -= k3 + 5;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t2; b3 -= k2 + 4;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t0; b3 -= k3 + 5;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t2; b3 -= k2 + 4;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t1; b3 -= k1 + 3;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t0; b3 -= k0 + 2;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t1; b3 -= k1 + 3;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t0; b3 -= k0 + 2;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t2; b3 -= k4 + 1;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t1; b3 -= k3;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t2; b3 -= k4 + 1;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t1; b3 -= k3;
 
-    output[0] = b0;
-    output[1] = b1;
-    output[2] = b2;
-    output[3] = b3;
-  }
+	output[0] = b0;
+	output[1] = b1;
+	output[2] = b2;
+	output[3] = b3;
+}
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 5262f5a8f21b..dc96ba279720 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -3,640 +3,638 @@
 
 
 void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
-    {
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3],
-      b4 = input[4], b5 = input[5],
-      b6 = input[6], b7 = input[7];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-      k8 = keyCtx->key[8];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k7; b6 += b7 + k6 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k8 + 1; b6 += b7 + k7 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k0 + 2; b6 += b7 + k8 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k1 + 3; b6 += b7 + k0 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k2 + 4; b6 += b7 + k1 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k3 + 5; b6 += b7 + k2 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k0; b2 += b3 + k8; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k4 + 6; b6 += b7 + k3 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k5 + 7; b6 += b7 + k4 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k0; b0 += b1 + k8; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k6 + 8; b6 += b7 + k5 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k7 + 9; b6 += b7 + k6 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k8 + 10; b6 += b7 + k7 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k0 + 11; b6 += b7 + k8 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k1 + 12; b6 += b7 + k0 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k2 + 13; b6 += b7 + k1 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k3 + 14; b6 += b7 + k2 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k0; b2 += b3 + k8; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k4 + 15; b6 += b7 + k3 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k5 + 16; b6 += b7 + k4 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k0; b0 += b1 + k8; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k6 + 17; b6 += b7 + k5 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
 
-        b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k3; b2 += b3 + k2; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k7; b6 += b7 + k6 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k2; b0 += b1 + k1; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k4; b2 += b3 + k3; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k8 + 1; b6 += b7 + k7 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k3; b0 += b1 + k2; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k5; b2 += b3 + k4; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k0 + 2; b6 += b7 + k8 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k4; b0 += b1 + k3; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k6; b2 += b3 + k5; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k1 + 3; b6 += b7 + k0 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k5; b0 += b1 + k4; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k7; b2 += b3 + k6; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k2 + 4; b6 += b7 + k1 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k6; b0 += b1 + k5; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k8; b2 += b3 + k7; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k3 + 5; b6 += b7 + k2 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k7; b0 += b1 + k6; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k0; b2 += b3 + k8; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k4 + 6; b6 += b7 + k3 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k8; b0 += b1 + k7; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k1; b2 += b3 + k0; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k5 + 7; b6 += b7 + k4 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k0; b0 += b1 + k8; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k2; b2 += b3 + k1; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k6 + 8; b6 += b7 + k5 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k1; b0 += b1 + k0; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k3; b2 += b3 + k2; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k7 + 9; b6 += b7 + k6 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k2; b0 += b1 + k1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k4; b2 += b3 + k3; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k8 + 10; b6 += b7 + k7 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k3; b0 += b1 + k2; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k5; b2 += b3 + k4; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k0 + 11; b6 += b7 + k8 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k4; b0 += b1 + k3; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k6; b2 += b3 + k5; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k1 + 12; b6 += b7 + k0 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k5; b0 += b1 + k4; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k7; b2 += b3 + k6; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k2 + 13; b6 += b7 + k1 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k6; b0 += b1 + k5; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k8; b2 += b3 + k7; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k3 + 14; b6 += b7 + k2 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k7; b0 += b1 + k6; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k0; b2 += b3 + k8; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k4 + 15; b6 += b7 + k3 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k8; b0 += b1 + k7; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k1; b2 += b3 + k0; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k5 + 16; b6 += b7 + k4 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k0; b0 += b1 + k8; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k2; b2 += b3 + k1; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k6 + 17; b6 += b7 + k5 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-        output[0] = b0 + k0;
-        output[1] = b1 + k1;
-        output[2] = b2 + k2;
-        output[3] = b3 + k3;
-        output[4] = b4 + k4;
-        output[5] = b5 + k5 + t0;
-        output[6] = b6 + k6 + t1;
-        output[7] = b7 + k7 + 18;
-    }
+	output[0] = b0 + k0;
+	output[1] = b1 + k1;
+	output[2] = b2 + k2;
+	output[3] = b3 + k3;
+	output[4] = b4 + k4;
+	output[5] = b5 + k5 + t0;
+	output[6] = b6 + k6 + t1;
+	output[7] = b7 + k7 + 18;
+}
 
 void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
-    {
-
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3],
-      b4 = input[4], b5 = input[5],
-      b6 = input[6], b7 = input[7];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-      k8 = keyCtx->key[8];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-      u64 tmp;
+	u64 tmp;
 
-        b0 -= k0;
-        b1 -= k1;
-        b2 -= k2;
-        b3 -= k3;
-        b4 -= k4;
-        b5 -= k5 + t0;
-        b6 -= k6 + t1;
-        b7 -= k7 + 18;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k5 + t0; b7 -= k6 + 17;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k3; b5 -= k4 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k1; b3 -= k2;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k8; b1 -= k0;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k4 + t2; b7 -= k5 + 16;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k2; b5 -= k3 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k0; b3 -= k1;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k7; b1 -= k8;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k3 + t1; b7 -= k4 + 15;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k1; b5 -= k2 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k8; b3 -= k0;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k6; b1 -= k7;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k2 + t0; b7 -= k3 + 14;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k0; b5 -= k1 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k7; b3 -= k8;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k5; b1 -= k6;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k1 + t2; b7 -= k2 + 13;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k8; b5 -= k0 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k6; b3 -= k7;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k4; b1 -= k5;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k0 + t1; b7 -= k1 + 12;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k7; b5 -= k8 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k5; b3 -= k6;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k3; b1 -= k4;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k8 + t0; b7 -= k0 + 11;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k6; b5 -= k7 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k4; b3 -= k5;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k2; b1 -= k3;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k7 + t2; b7 -= k8 + 10;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k5; b5 -= k6 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k3; b3 -= k4;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k1; b1 -= k2;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k6 + t1; b7 -= k7 + 9;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k4; b5 -= k5 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k2; b3 -= k3;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k0; b1 -= k1;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k5 + t0; b7 -= k6 + 8;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k3; b5 -= k4 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k1; b3 -= k2;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k8; b1 -= k0;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k4 + t2; b7 -= k5 + 7;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k2; b5 -= k3 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k0; b3 -= k1;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k7; b1 -= k8;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k3 + t1; b7 -= k4 + 6;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k1; b5 -= k2 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k8; b3 -= k0;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k6; b1 -= k7;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k2 + t0; b7 -= k3 + 5;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k0; b5 -= k1 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k7; b3 -= k8;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k5; b1 -= k6;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k1 + t2; b7 -= k2 + 4;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k8; b5 -= k0 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k6; b3 -= k7;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k4; b1 -= k5;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k0 + t1; b7 -= k1 + 3;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k7; b5 -= k8 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k5; b3 -= k6;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k3; b1 -= k4;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k8 + t0; b7 -= k0 + 2;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k6; b5 -= k7 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k4; b3 -= k5;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k2; b1 -= k3;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k7 + t2; b7 -= k8 + 1;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k5; b5 -= k6 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k3; b3 -= k4;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k1; b1 -= k2;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k6 + t1; b7 -= k7;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k4; b5 -= k5 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k2; b3 -= k3;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k0; b1 -= k1;
+	b0 -= k0;
+	b1 -= k1;
+	b2 -= k2;
+	b3 -= k3;
+	b4 -= k4;
+	b5 -= k5 + t0;
+	b6 -= k6 + t1;
+	b7 -= k7 + 18;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k5 + t0; b7 -= k6 + 17;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k3; b5 -= k4 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k1; b3 -= k2;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k8; b1 -= k0;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k4 + t2; b7 -= k5 + 16;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k2; b5 -= k3 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k0; b3 -= k1;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k7; b1 -= k8;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k3 + t1; b7 -= k4 + 15;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k1; b5 -= k2 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k8; b3 -= k0;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k6; b1 -= k7;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k2 + t0; b7 -= k3 + 14;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k0; b5 -= k1 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k7; b3 -= k8;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k5; b1 -= k6;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k1 + t2; b7 -= k2 + 13;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k8; b5 -= k0 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k6; b3 -= k7;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k4; b1 -= k5;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k0 + t1; b7 -= k1 + 12;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k7; b5 -= k8 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k5; b3 -= k6;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k3; b1 -= k4;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k8 + t0; b7 -= k0 + 11;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k6; b5 -= k7 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k4; b3 -= k5;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k2; b1 -= k3;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k7 + t2; b7 -= k8 + 10;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k5; b5 -= k6 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k3; b3 -= k4;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k1; b1 -= k2;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k6 + t1; b7 -= k7 + 9;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k4; b5 -= k5 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k2; b3 -= k3;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k0; b1 -= k1;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k5 + t0; b7 -= k6 + 8;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k3; b5 -= k4 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k1; b3 -= k2;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k8; b1 -= k0;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k4 + t2; b7 -= k5 + 7;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k2; b5 -= k3 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k0; b3 -= k1;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k7; b1 -= k8;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k3 + t1; b7 -= k4 + 6;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k1; b5 -= k2 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k8; b3 -= k0;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k6; b1 -= k7;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k2 + t0; b7 -= k3 + 5;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k0; b5 -= k1 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k7; b3 -= k8;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k5; b1 -= k6;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k1 + t2; b7 -= k2 + 4;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k8; b5 -= k0 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k6; b3 -= k7;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k4; b1 -= k5;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k0 + t1; b7 -= k1 + 3;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k7; b5 -= k8 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k5; b3 -= k6;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k3; b1 -= k4;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k8 + t0; b7 -= k0 + 2;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k6; b5 -= k7 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k4; b3 -= k5;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k2; b1 -= k3;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k7 + t2; b7 -= k8 + 1;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k5; b5 -= k6 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k3; b3 -= k4;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k1; b1 -= k2;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k6 + t1; b7 -= k7;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k4; b5 -= k5 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k2; b3 -= k3;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k0; b1 -= k1;
 
-    output[0] = b0;
-    output[1] = b1;
-    output[2] = b2;
-    output[3] = b3;
+	output[0] = b0;
+	output[1] = b1;
+	output[2] = b2;
+	output[3] = b3;
 
-        output[7] = b7;
-        output[6] = b6;
-        output[5] = b5;
-        output[4] = b4;
+	output[7] = b7;
+	output[6] = b6;
+	output[5] = b5;
+	output[4] = b4;
 }
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 53f46f6cb9ca..e8ce06a9122f 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -4,75 +4,75 @@
 #include <threefishApi.h>
 
 void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize,
-                     u64 *keyData, u64 *tweak)
+		     u64 *keyData, u64 *tweak)
 {
-    int keyWords = stateSize / 64;
-    int i;
-    u64 parity = KeyScheduleConst;
+	int keyWords = stateSize / 64;
+	int i;
+	u64 parity = KeyScheduleConst;
 
-    keyCtx->tweak[0] = tweak[0];
-    keyCtx->tweak[1] = tweak[1];
-    keyCtx->tweak[2] = tweak[0] ^ tweak[1];
+	keyCtx->tweak[0] = tweak[0];
+	keyCtx->tweak[1] = tweak[1];
+	keyCtx->tweak[2] = tweak[0] ^ tweak[1];
 
-    for (i = 0; i < keyWords; i++) {
-        keyCtx->key[i] = keyData[i];
-        parity ^= keyData[i];
-    }
-    keyCtx->key[i] = parity;
-    keyCtx->stateSize = stateSize;
+	for (i = 0; i < keyWords; i++) {
+		keyCtx->key[i] = keyData[i];
+		parity ^= keyData[i];
+	}
+	keyCtx->key[i] = parity;
+	keyCtx->stateSize = stateSize;
 }
 
 void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
-                                u8 *out)
+				u8 *out)
 {
-    u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
-    u64 cipher[SKEIN_MAX_STATE_WORDS];
-    
-    Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
-    threefishEncryptBlockWords(keyCtx, plain, cipher);
-    Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
+	u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+	u64 cipher[SKEIN_MAX_STATE_WORDS];
+
+	Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
+	threefishEncryptBlockWords(keyCtx, plain, cipher);
+	Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
 }
 
 void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in,
-                                u64 *out)
+				u64 *out)
 {
-    switch (keyCtx->stateSize) {
-        case Threefish256:
-            threefishEncrypt256(keyCtx, in, out);
-            break;
-        case Threefish512:
-            threefishEncrypt512(keyCtx, in, out);
-            break;
-        case Threefish1024:
-            threefishEncrypt1024(keyCtx, in, out);
-            break;
-    }
+	switch (keyCtx->stateSize) {
+	case Threefish256:
+		threefishEncrypt256(keyCtx, in, out);
+		break;
+	case Threefish512:
+		threefishEncrypt512(keyCtx, in, out);
+		break;
+	case Threefish1024:
+		threefishEncrypt1024(keyCtx, in, out);
+		break;
+	}
 }
 
 void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
-                                u8 *out)
+				u8 *out)
 {
-    u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
-    u64 cipher[SKEIN_MAX_STATE_WORDS];
-    
-    Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
-    threefishDecryptBlockWords(keyCtx, cipher, plain);
-    Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
+	u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+	u64 cipher[SKEIN_MAX_STATE_WORDS];
+
+	Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
+	threefishDecryptBlockWords(keyCtx, cipher, plain);
+	Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
 }
 
 void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in,
-                                u64 *out)
+				u64 *out)
 {
-    switch (keyCtx->stateSize) {
-        case Threefish256:
-            threefishDecrypt256(keyCtx, in, out);
-            break;
-        case Threefish512:
-            threefishDecrypt512(keyCtx, in, out);
-            break;
-        case Threefish1024:
-            threefishDecrypt1024(keyCtx, in, out);
-            break;
-    }
+	switch (keyCtx->stateSize) {
+	case Threefish256:
+		threefishDecrypt256(keyCtx, in, out);
+		break;
+	case Threefish512:
+		threefishDecrypt512(keyCtx, in, out);
+		break;
+	case Threefish1024:
+		threefishDecrypt1024(keyCtx, in, out);
+		break;
+	}
 }
 
-- 
1.9.0
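
Note on the byte-oriented wrappers touched above: they convert between byte
buffers and 64-bit words, least significant byte first, around the call to the
word-oriented routine. The sketch below is a portable userspace illustration of
that conversion. The function names are hypothetical stand-ins, not the
Skein_Get64_LSB_First()/Skein_Put64_LSB_First() definitions from the tree, and
the argument meaning (word count for the load, byte count for the store) is
assumed from the call sites (stateSize / 64 and stateSize / 8).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the bytes-to-words step: assemble word_cnt
 * 64-bit words from 8 * word_cnt bytes, least significant byte first.
 */
static void get64_lsb_first(uint64_t *dst, const uint8_t *src, size_t word_cnt)
{
	size_t i, j;

	for (i = 0; i < word_cnt; i++) {
		uint64_t w = 0;

		for (j = 0; j < 8; j++)
			w |= (uint64_t)src[8 * i + j] << (8 * j);
		dst[i] = w;
	}
}

/*
 * Hypothetical stand-in for the words-to-bytes step: emit byte_cnt bytes
 * from the source words, least significant byte of each word first.
 */
static void put64_lsb_first(uint8_t *dst, const uint64_t *src, size_t byte_cnt)
{
	size_t n;

	for (n = 0; n < byte_cnt; n++)
		dst[n] = (uint8_t)(src[n >> 3] >> (8 * (n & 7)));
}
```

Because both helpers work byte by byte, the result does not depend on the host
byte order; on a little-endian machine a plain memcpy() would likely give the
same bytes.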


* [RFC PATCH 14/22] staging: crypto: skein: remove trailing whitespace
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (12 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 13/22] staging: crypto: skein: fix leading whitespace Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 15/22] staging: crypto: skein: cleanup >80 character lines Jason Cooper
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
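Note: the change below only removes spaces and tabs at the end of lines. A
minimal userspace sketch of that transformation follows, for anyone who wants
to reproduce it mechanically. The helper name rstrip_line() is hypothetical and
not part of this series; scripts/checkpatch.pl remains the authoritative check.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of this series: trim trailing spaces and
 * tabs from a NUL-terminated line in place, keeping a final newline if
 * one is present. Returns the number of bytes removed.
 */
static size_t rstrip_line(char *line)
{
	size_t len = strlen(line);
	size_t end = len;
	int had_newline = 0;

	if (end > 0 && line[end - 1] == '\n') {
		had_newline = 1;
		end--;
	}

	while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
		end--;

	if (had_newline)
		line[end++] = '\n';
	line[end] = '\0';

	return len - end;
}
```

A non-zero return value for any line of a file means the file still carries
trailing whitespace.
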
 drivers/staging/skein/include/skein.h        | 16 +++++-----
 drivers/staging/skein/include/skeinApi.h     | 44 ++++++++++++++--------------
 drivers/staging/skein/include/threefishApi.h | 40 ++++++++++++-------------
 drivers/staging/skein/skeinBlockNo3F.c       |  6 ++--
 drivers/staging/skein/skein_block.c          | 20 ++++++-------
 5 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 906bcee41c39..dd9a210cf5dd 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -9,7 +9,7 @@
 ** This algorithm and source code is released to the public domain.
 **
 ***************************************************************************
-** 
+**
 ** The following compile-time switches may be defined to control some
 ** tradeoffs between speed, code size, error checking, and security.
 **
@@ -20,8 +20,8 @@
 **                            [default: no callouts (no overhead)]
 **
 **  SKEIN_ERR_CHECK        -- how error checking is handled inside Skein
-**                            code. If not defined, most error checking 
-**                            is disabled (for performance). Otherwise, 
+**                            code. If not defined, most error checking
+**                            is disabled (for performance). Otherwise,
 **                            the switch value is interpreted as:
 **                                0: use assert()      to flag errors
 **                                1: return SKEIN_FAIL to flag errors
@@ -109,12 +109,12 @@ int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
 **   After an InitExt() call, just use Update/Final calls as with Init().
 **
 **   Notes: Same parameters as _Init() calls, plus treeInfo/key/keyBytes.
-**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL, 
+**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL,
 **              the results of InitExt() are identical to calling Init().
 **          The function Init() may be called once to "precompute" the IV for
 **              a given hashBitLen value, then by saving a copy of the context
 **              the IV computation may be avoided in later calls.
-**          Similarly, the function InitExt() may be called once per MAC key 
+**          Similarly, the function InitExt() may be called once per MAC key
 **              to precompute the MAC IV, then a copy of the context saved and
 **              reused for each new MAC computation.
 **/
@@ -142,7 +142,7 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 
 /*****************************************************************
 ** "Internal" Skein definitions
-**    -- not needed for sequential hashing API, but will be 
+**    -- not needed for sequential hashing API, but will be
 **           helpful for other uses of Skein (e.g., tree hash mode).
 **    -- included here so that they can be shared between
 **           reference and optimized code.
@@ -269,8 +269,8 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 /*****************************************************************
 ** Skein block function constants (shared across Ref and Opt code)
 ******************************************************************/
-enum    
-	{   
+enum
+	{
 	    /* Skein_256 round rotation constants */
 	R_256_0_0 = 14, R_256_0_1 = 16,
 	R_256_1_0 = 52, R_256_1_1 = 57,
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 0d7d59eff460..ace931a67c23 100644
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -36,46 +36,46 @@ OTHER DEALINGS IN THE SOFTWARE.
  * of Skein. The design and the way to use the functions follow the openSSL
  * design but at the same time take care of some Skein specific behaviour
  * and possibilities.
- * 
+ *
  * The functions enable applications to create a normal Skein hashes and
  * message authentication codes (MAC).
- * 
+ *
  * Using these functions is simple and straight forward:
- * 
+ *
  * @code
- * 
+ *
  * #include <skeinApi.h>
- * 
+ *
  * ...
  * struct skein_ctx ctx;             // a Skein hash or MAC context
- * 
+ *
  * // prepare context, here for a Skein with a state size of 512 bits.
  * skeinCtxPrepare(&ctx, Skein512);
- * 
+ *
  * // Initialize the context to set the requested hash length in bits
  * // here request a output hash size of 31 bits (Skein supports variable
  * // output sizes even very strange sizes)
  * skeinInit(&ctx, 31);
- * 
+ *
  * // Now update Skein with any number of message bits. A function that
  * // takes a number of bytes is also available.
  * skeinUpdateBits(&ctx, message, msgLength);
- * 
+ *
  * // Now get the result of the Skein hash. The output buffer must be
  * // large enough to hold the request number of output bits. The application
  * // may now extract the bits.
  * skeinFinal(&ctx, result);
  * ...
  * @endcode
- * 
+ *
  * An application may use @c skeinReset to reset a Skein context and use
  * it for creation of another hash with the same Skein state size and output
  * bit length. In this case the API implementation restores some internal
  * internal state data and saves a full Skein initialization round.
- * 
- * To create a MAC the application just uses @c skeinMacInit instead of 
+ *
+ * To create a MAC the application just uses @c skeinMacInit instead of
  * @c skeinInit. All other functions calls remain the same.
- * 
+ *
  */
 
 #include <linux/types.h>
@@ -111,7 +111,7 @@ struct skein_ctx {
 
 /**
  * Prepare a Skein context.
- * 
+ *
  * An application must call this function before it can use the Skein
  * context. The functions clears memory and initializes size dependent
  * variables.
@@ -128,7 +128,7 @@ int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size);
 /**
  * Initialize a Skein context.
  *
- * Initializes the context with this data and saves the resulting Skein 
+ * Initializes the context with this data and saves the resulting Skein
  * state variables for further use.
  *
  * @param ctx
@@ -143,11 +143,11 @@ int skeinInit(struct skein_ctx *ctx, size_t hashBitLen);
 
 /**
  * Resets a Skein context for further use.
- * 
- * Restores the saved chaining variables to reset the Skein context. 
- * Thus applications can reuse the same setup to  process several 
+ *
+ * Restores the saved chaining variables to reset the Skein context.
+ * Thus applications can reuse the same setup to  process several
  * messages. This saves a complete Skein initialization cycle.
- * 
+ *
  * @param ctx
  *     Pointer to a pre-initialized Skein MAC context
  */
@@ -155,8 +155,8 @@ void skeinReset(struct skein_ctx *ctx);
 
 /**
  * Initializes a Skein context for MAC usage.
- * 
- * Initializes the context with this data and saves the resulting Skein 
+ *
+ * Initializes the context with this data and saves the resulting Skein
  * state variables for further use.
  *
  * Applications call the normal Skein functions to update the MAC and
@@ -209,7 +209,7 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
 
 /**
  * Finalize Skein and return the hash.
- * 
+ *
  * Before an application can reuse a Skein setup the application must
  * reset the Skein context.
  *
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 199257e37813..5d92bbff8c9f 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -8,14 +8,14 @@
  * @{
  *
  * This API and the functions that implement this API simplify the usage
- * of the Threefish cipher. The design and the way to use the functions 
+ * of the Threefish cipher. The design and the way to use the functions
  * follow the openSSL design but at the same time take care of some Threefish
  * specific behaviour and possibilities.
  *
  * These are the low level functions that deal with Threefisch blocks only.
- * Implementations for cipher modes such as ECB, CFB, or CBC may use these 
+ * Implementations for cipher modes such as ECB, CFB, or CBC may use these
  * functions.
- * 
+ *
 @code
     // Threefish cipher context data
     struct threefish_key keyCtx;
@@ -44,7 +44,7 @@ enum threefish_size {
 
 /**
  * Context for Threefish key and tweak words.
- * 
+ *
  * This structure was setup with some know-how of the internal
  * Skein structures, in particular ordering of header and size dependent
  * variables. If Skein implementation changes this, the adapt these
@@ -58,10 +58,10 @@ struct threefish_key {
 
 /**
  * Set Threefish key and tweak data.
- * 
+ *
  * This function sets the key and tweak data for the Threefish cipher of
  * the given size. The key data must have the same length (number of bits)
- * as the state size 
+ * as the state size
  *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
@@ -76,12 +76,12 @@ void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize
 
 /**
  * Encrypt Threefisch block (bytes).
- * 
- * The buffer must have at least the same length (number of bits) aas the 
+ *
+ * The buffer must have at least the same length (number of bits) aas the
  * state size for this key. The function uses the first @c stateSize bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
- * 
+ *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
  * @param in
@@ -93,14 +93,14 @@ void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
 /**
  * Encrypt Threefisch block (words).
- * 
- * The buffer must have at least the same length (number of bits) aas the 
+ *
+ * The buffer must have at least the same length (number of bits) aas the
  * state size for this key. The function uses the first @c stateSize bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
- * 
+ *
  * The wordsize ist set to 64 bits.
- * 
+ *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
  * @param in
@@ -112,12 +112,12 @@ void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out)
 
 /**
  * Decrypt Threefisch block (bytes).
- * 
- * The buffer must have at least the same length (number of bits) aas the 
+ *
+ * The buffer must have at least the same length (number of bits) aas the
  * state size for this key. The function uses the first @c stateSize bits
  * of the input buffer, decrypts them and stores the result in the output
  * buffer
- * 
+ *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
  * @param in
@@ -129,14 +129,14 @@ void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
 /**
  * Decrypt Threefisch block (words).
- * 
- * The buffer must have at least the same length (number of bits) aas the 
+ *
+ * The buffer must have at least the same length (number of bits) aas the
  * state size for this key. The function uses the first @c stateSize bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
- * 
+ *
  * The wordsize ist set to 64 bits.
- * 
+ *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
  * @param in
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 3c2878c966e1..376cd63d8f83 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -29,7 +29,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 			carry += words[i];
 			words[i] = carry;
 			carry >>= 32;
-		}        
+		}
 		tweak[0] = words[0] & 0xffffffffL;
 		tweak[0] |= (words[1] & 0xffffffffL) << 32;
 		tweak[1] |= words[2] & 0xffffffffL;
@@ -79,7 +79,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 			carry += words[i];
 			words[i] = carry;
 			carry >>= 32;
-		}        
+		}
 		tweak[0] = words[0] & 0xffffffffL;
 		tweak[0] |= (words[1] & 0xffffffffL) << 32;
 		tweak[1] |= words[2] & 0xffffffffL;
@@ -133,7 +133,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 			carry += words[i];
 			words[i] = carry;
 			carry >>= 32;
-		}        
+		}
 		tweak[0] = words[0] & 0xffffffffL;
 		tweak[0] |= (words[1] & 0xffffffffL) << 32;
 		tweak[1] |= words[2] & 0xffffffffL;
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index bb36860fafdf..d315f547feae 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -28,7 +28,7 @@
 #define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
 #define KW_TWK_BASE     (0)
 #define KW_KEY_BASE     (3)
-#define ks              (kw + KW_KEY_BASE)                
+#define ks              (kw + KW_KEY_BASE)
 #define ts              (kw + KW_TWK_BASE)
 
 #ifdef SKEIN_DEBUG
@@ -76,7 +76,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 		ts[0] += byteCntAdd;                    /* update processed length */
 
 		/* precompute the key schedule for this block */
-		ks[0] = ctx->X[0];     
+		ks[0] = ctx->X[0];
 		ks[1] = ctx->X[1];
 		ks[2] = ctx->X[2];
 		ks[3] = ctx->X[3];
@@ -103,7 +103,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 
-#if SKEIN_UNROLL_256 == 0                       
+#if SKEIN_UNROLL_256 == 0
 #define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
 	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
@@ -129,8 +129,8 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 
 	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
-#endif  
-		{    
+#endif
+		{
 #define R256_8_rounds(R)                  \
 		R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
 		R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
@@ -270,7 +270,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 		ks[5] = ctx->X[5];
 		ks[6] = ctx->X[6];
 		ks[7] = ctx->X[7];
-		ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
+		ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
 			ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
 
 		ts[2] = ts[0] ^ ts[1];
@@ -298,7 +298,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
 		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
 
-#if SKEIN_UNROLL_512 == 0                       
+#if SKEIN_UNROLL_512 == 0
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
 		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
 		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
@@ -529,7 +529,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 		X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
 		X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
 
-#if SKEIN_UNROLL_1024 == 0                      
+#if SKEIN_UNROLL_1024 == 0
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
 		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
 		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
@@ -551,7 +551,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 		X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
 		X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
 		X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
 		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
@@ -579,7 +579,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 		Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 
 		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
-#endif  
+#endif
 		{
 #define R1024_8_rounds(R)    /* do 8 full rounds */                               \
 			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
-- 
1.9.0


* [RFC PATCH 15/22] staging: crypto: skein: cleanup >80 character lines
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (13 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 14/22] staging: crypto: skein: remove trailing whitespace Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 16/22] staging: crypto: skein: fix do/while brace formatting Jason Cooper
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
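Note: the 80-column limit applies to the displayed width of a line, with tabs
counted as advancing to the next multiple of 8 columns, so a tab-indented line
can exceed the limit with far fewer than 80 bytes. A minimal userspace sketch
of that measurement follows. The helper names and the TAB_WIDTH/MAX_COLS
constants are hypothetical and not part of this series; scripts/checkpatch.pl
remains the authoritative check.

```c
#include <stddef.h>

#define TAB_WIDTH 8	/* assumed kernel-style tab stop */
#define MAX_COLS 80	/* assumed column limit */

/*
 * Hypothetical helper: displayed width of one line, not counting the
 * terminating newline. A tab advances to the next multiple of TAB_WIDTH.
 */
static size_t display_width(const char *line)
{
	size_t col = 0;

	for (; *line && *line != '\n'; line++) {
		if (*line == '\t')
			col += TAB_WIDTH - (col % TAB_WIDTH);
		else
			col++;
	}

	return col;
}

/* Hypothetical helper: non-zero if the line is wider than MAX_COLS. */
static int line_too_long(const char *line)
{
	return display_width(line) > MAX_COLS;
}
```

This is why several comments in the diff below are shortened rather than just
re-indented: the tab-aligned form leaves less room than the byte count suggests.
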
 drivers/staging/skein/include/skein.h        |  175 +-
 drivers/staging/skein/include/threefishApi.h |   16 +-
 drivers/staging/skein/skein.c                |  586 ++-
 drivers/staging/skein/skeinApi.c             |   58 +-
 drivers/staging/skein/skeinBlockNo3F.c       |   27 +-
 drivers/staging/skein/skein_block.c          |  427 +-
 drivers/staging/skein/threefish1024Block.c   | 6152 ++++++++++++++++++++------
 drivers/staging/skein/threefish256Block.c    | 1398 ++++--
 drivers/staging/skein/threefish512Block.c    | 2775 +++++++++---
 drivers/staging/skein/threefishApi.c         |   13 +-
 10 files changed, 8919 insertions(+), 2708 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index dd9a210cf5dd..f92dc40711d1 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -39,12 +39,12 @@
 
 enum
 	{
-	SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
+	SKEIN_SUCCESS         =      0, /* return codes from Skein calls */
 	SKEIN_FAIL            =      1,
 	SKEIN_BAD_HASHLEN     =      2
 	};
 
-#define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
+#define  SKEIN_MODIFIER_WORDS   (2) /* number of modifier (tweak) words */
 
 #define  SKEIN_256_STATE_WORDS  (4)
 #define  SKEIN_512_STATE_WORDS  (8)
@@ -65,30 +65,30 @@ enum
 
 struct skein_ctx_hdr
 	{
-	size_t  hashBitLen;                      /* size of hash result, in bits */
-	size_t  bCnt;                            /* current byte count in buffer b[] */
-	u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+	size_t  hashBitLen;		/* size of hash result, in bits */
+	size_t  bCnt;			/* current byte count in buffer b[] */
+	u64  T[SKEIN_MODIFIER_WORDS];	/* tweak: T[0]=byte cnt, T[1]=flags */
 	};
 
-struct skein_256_ctx                               /*  256-bit Skein hash context structure */
+struct skein_256_ctx /* 256-bit Skein hash context structure */
 	{
-	struct skein_ctx_hdr h;                      /* common header context variables */
-	u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
-	u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	struct skein_ctx_hdr h;		/* common header context variables */
+	u64  X[SKEIN_256_STATE_WORDS];	/* chaining variables */
+	u8  b[SKEIN_256_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 	};
 
-struct skein_512_ctx                             /*  512-bit Skein hash context structure */
+struct skein_512_ctx /* 512-bit Skein hash context structure */
 	{
-	struct skein_ctx_hdr h;                      /* common header context variables */
-	u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
-	u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	struct skein_ctx_hdr h;		/* common header context variables */
+	u64  X[SKEIN_512_STATE_WORDS];	/* chaining variables */
+	u8  b[SKEIN_512_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 	};
 
-struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
+struct skein1024_ctx /* 1024-bit Skein hash context structure */
 	{
-	struct skein_ctx_hdr h;                      /* common header context variables */
-	u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
-	u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	struct skein_ctx_hdr h;		/* common header context variables */
+	u64  X[SKEIN1024_STATE_WORDS];	/* chaining variables */
+	u8  b[SKEIN1024_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 	};
 
 /*   Skein APIs for (incremental) "straight hashing" */
@@ -96,9 +96,12 @@ int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
 int  Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen);
 int  Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen);
 
-int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt);
+int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt);
+int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt);
 
 int  Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal);
 int  Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal);
@@ -118,9 +121,12 @@ int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
 **              to precompute the MAC IV, then a copy of the context saved and
 **              reused for each new MAC computation.
 **/
-int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes);
 
 /*
 **   Skein APIs for MAC and tree hash:
@@ -149,13 +155,13 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 ******************************************************************/
 
 /* tweak word T[1]: bit field starting positions */
-#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
+#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)      /* second word  */
 
-#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
-#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
-#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
-#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
-#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
+#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112) /* 112..118 hash tree level */
+#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119) /* 119 part. final in byte */
+#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120) /* 120..125 type field */
+#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126) /* 126      first blk flag */
+#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127) /* 127      final blk flag */
 
 /* tweak word T[1]: flag bit definition(s) */
 #define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
@@ -167,34 +173,37 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
 
 /* tweak word T[1]: block type field */
-#define SKEIN_BLK_TYPE_KEY       (0)                    /* key, for MAC and KDF */
-#define SKEIN_BLK_TYPE_CFG       (4)                    /* configuration block */
-#define SKEIN_BLK_TYPE_PERS      (8)                    /* personalization string */
-#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
-#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
-#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
-#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
-#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
-#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
-
-#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
-#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
-#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
-#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
-#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
-#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
+#define SKEIN_BLK_TYPE_KEY       (0) /* key, for MAC and KDF */
+#define SKEIN_BLK_TYPE_CFG       (4) /* configuration block */
+#define SKEIN_BLK_TYPE_PERS      (8) /* personalization string */
+#define SKEIN_BLK_TYPE_PK       (12) /* pubkey (for digital sigs) */
+#define SKEIN_BLK_TYPE_KDF      (16) /* key identifier for KDF */
+#define SKEIN_BLK_TYPE_NONCE    (20) /* nonce for PRNG */
+#define SKEIN_BLK_TYPE_MSG      (48) /* message processing */
+#define SKEIN_BLK_TYPE_OUT      (63) /* output stage */
+#define SKEIN_BLK_TYPE_MASK     (63) /* bit field mask */
+
+#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << \
+					SKEIN_T1_POS_BLK_TYPE)
+#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* for MAC and KDF */
+#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* config block */
+#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization */
+#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* pubkey (for sigs) */
+#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key ident for KDF */
 #define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
 #define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
 #define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
 #define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */
 
-#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
-#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)
+#define SKEIN_T1_BLK_TYPE_CFG_FINAL    (SKEIN_T1_BLK_TYPE_CFG | \
+					SKEIN_T1_FLAG_FINAL)
+#define SKEIN_T1_BLK_TYPE_OUT_FINAL    (SKEIN_T1_BLK_TYPE_OUT | \
+					SKEIN_T1_FLAG_FINAL)
 
 #define SKEIN_VERSION           (1)
 
 #ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
-#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
+#define SKEIN_ID_STRING_LE      (0x33414853) /* "SHA3" (little-endian)*/
 #endif
 
 #define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
@@ -208,23 +217,29 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_CFG_TREE_NODE_SIZE_POS  (8)
 #define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
 
-#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
-#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
-#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
+#define SKEIN_CFG_TREE_LEAF_SIZE_MSK (((u64)0xFF) << \
+					SKEIN_CFG_TREE_LEAF_SIZE_POS)
+#define SKEIN_CFG_TREE_NODE_SIZE_MSK (((u64)0xFF) << \
+					SKEIN_CFG_TREE_NODE_SIZE_POS)
+#define SKEIN_CFG_TREE_MAX_LEVEL_MSK (((u64)0xFF) << \
+					SKEIN_CFG_TREE_MAX_LEVEL_POS)
 
 #define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
 	((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
 	 (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
 	 (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
 
-#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
+/* use as treeInfo in InitExt() call for sequential processing */
+#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0)
 
 /*
 **   Skein macros for getting/setting tweak words, etc.
 **   These are useful for partial input bytes, hash tree init/update, etc.
 **/
 #define Skein_Get_Tweak(ctxPtr, TWK_NUM)          ((ctxPtr)->h.T[TWK_NUM])
-#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal); }
+#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal) { \
+		(ctxPtr)->h.T[TWK_NUM] = (tVal); \
+	}
 
 #define Skein_Get_T0(ctxPtr)     Skein_Get_Tweak(ctxPtr, 0)
 #define Skein_Get_T1(ctxPtr)     Skein_Get_Tweak(ctxPtr, 1)
@@ -241,14 +256,26 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
 	Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
 
-/* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
-#define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
-	{ Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
+/*
+ * setup for starting with a new type:
+ * h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0;
+ */
+#define Skein_Start_New_Type(ctxPtr, BLK_TYPE) { \
+		Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | \
+				SKEIN_T1_BLK_TYPE_##BLK_TYPE); \
+		(ctxPtr)->h.bCnt = 0; \
+	}
 
-#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
-#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
+#define Skein_Clear_First_Flag(hdr) { \
+		(hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST; \
+	}
+#define Skein_Set_Bit_Pad_Flag(hdr) { \
+		(hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD; \
+	}
 
-#define Skein_Set_Tree_Level(hdr, height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); }
+#define Skein_Set_Tree_Level(hdr, height) { \
+		(hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); \
+	}
 
 /*****************************************************************
 ** "Internal" Skein definitions for debugging and error checking
@@ -263,7 +290,7 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define Skein_Show_Key(bits, ctx, key, keyBytes)
 #endif
 
-#define Skein_Assert(x, retCode)/* default: ignore all Asserts, for performance */
+#define Skein_Assert(x, retCode)/* ignore all Asserts, for performance */
 #define Skein_assert(x)
 
 /*****************************************************************
@@ -292,21 +319,29 @@ enum
 	R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
 
 	    /* Skein1024 round rotation constants */
-	R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
-	R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
-	R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
-	R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
-	R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
-	R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
-	R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
-	R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
+	R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47,
+	R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
+	R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55,
+	R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
+	R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13,
+	R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
+	R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41,
+	R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
+	R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31,
+	R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
+	R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51,
+	R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
+	R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46,
+	R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
+	R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52,
+	R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
 	};
 
 #ifndef SKEIN_ROUNDS
-#define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
+#define SKEIN_256_ROUNDS_TOTAL (72)	/* number of rounds per block size */
 #define SKEIN_512_ROUNDS_TOTAL (72)
 #define SKEIN1024_ROUNDS_TOTAL (80)
-#else                                        /* allow command-line define in range 8*(5..14)   */
+#else			/* allow command-line define in range 8*(5..14)   */
 #define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
 #define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/10)  + 5) % 10) + 5))
 #define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS)     + 5) % 10) + 5))
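
A note on the SKEIN_ROUNDS override in the hunk above, since the encoding is
easy to misread: each decimal digit of SKEIN_ROUNDS selects the round count
for one block size (hundreds digit for Skein-256, tens digit for Skein-512,
units digit for Skein-1024), and a digit d maps to 8*(((d + 5) % 10) + 5)
rounds. The sketch below only restates the three macro expressions as
functions so the mapping can be checked by hand. The function names are made
up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Hypothetical helpers: same arithmetic as the *_ROUNDS_TOTAL macros, */
/* but taking the SKEIN_ROUNDS value as a parameter. */
static inline int skein_256_rounds(int skein_rounds)
{
	return 8 * ((((skein_rounds / 100) + 5) % 10) + 5);
}

static inline int skein_512_rounds(int skein_rounds)
{
	return 8 * ((((skein_rounds / 10) + 5) % 10) + 5);
}

static inline int skein1024_rounds(int skein_rounds)
{
	return 8 * ((((skein_rounds) + 5) % 10) + 5);
}

/* Digits 9, 9, 0 decode to the built-in defaults of 72, 72 and 80. */
static inline void check_default_encoding(void)
{
	assert(skein_256_rounds(990) == 72);
	assert(skein_512_rounds(990) == 72);
	assert(skein1024_rounds(990) == 80);
}
```

Under this reading, defining SKEIN_ROUNDS as 990 would reproduce the default
round counts, a digit of 5 gives the minimum of 40 rounds, and a digit of 4
gives the maximum of 112, which matches the 8*(5..14) range in the comment.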
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 5d92bbff8c9f..e81675d7eac9 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -72,7 +72,9 @@ struct threefish_key {
  * @param tweak
  *     Pointer to the two tweak words (word has 64 bits).
  */
-void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize, u64 *keyData, u64 *tweak);
+void threefishSetKey(struct threefish_key *keyCtx,
+			enum threefish_size stateSize,
+			u64 *keyData, u64 *tweak);
 
 /**
  * Encrypt Threefisch block (bytes).
@@ -108,7 +110,8 @@ void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
  * @param out
  *     Pointer to cipher buffer.
  */
-void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
+void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in,
+				u64 *out);
 
 /**
  * Decrypt Threefisch block (bytes).
@@ -144,14 +147,17 @@ void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
  * @param out
  *     Pointer to plaintext buffer.
  */
-void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
+void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in,
+				u64 *out);
 
 void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
 void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
-void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input,
+			u64 *output);
 void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
 void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
-void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input,
+			u64 *output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 3f0f32806181..ed603ee7b170 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -16,9 +16,12 @@
 
 /*****************************************************************/
 /* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-void    Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
@@ -53,20 +56,28 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
-		/* build/process the config block, type == CONFIG (could be precomputed) */
-		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		/*
+		 * build/process the config block, type == CONFIG (could be
+		 * precomputed)
+		 */
+		/* set tweaks: T0=0; T1=CFG | FINAL */
+		Skein_Start_New_Type(ctx, CFG_FINAL);
+
+		/* set the schema, version */
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+		/* hash result length in bits */
+		cfg.w[1] = Skein_Swap64(hashBitLen);
 		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+		/* zero pad config block */
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0]));
 
 		/* compute the initial chaining values from config block */
-		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		/* zero the chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
 		Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
-	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* The chaining vars ctx->X are now initialized for hashBitLen. */
 	/* Set up to process the data message portion of the hash (default) */
 	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
@@ -75,42 +86,58 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+/* [identical to Skein_256_Init() when keyBytes == 0 &&
+ *	treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
 	union
 	{
 		u8  b[SKEIN_256_STATE_BYTES];
 		u64  w[SKEIN_256_STATE_WORDS];
-	} cfg;                              /* config block */
+	} cfg; /* config block */
 
 	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
 	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
 	/* compute the initial chaining values ctx->X[], based on key */
-	if (keyBytes == 0)                          /* is there a key? */
+	if (keyBytes == 0) /* is there a key? */
 	{
-		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+		/* no key: use all zeroes as key for config block */
+		memset(ctx->X, 0, sizeof(ctx->X));
 	}
-	else                                        /* here to pre-process a key */
+	else /* here to pre-process a key */
 	{
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
-		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-		Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
-		Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+		/* set output hash bit count = state size */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);
+		/* set tweaks: T0 = 0; T1 = KEY type */
+		Skein_Start_New_Type(ctx, KEY);
+		/* zero the initial chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
+		/* hash the key */
+		Skein_256_Update(ctx, key, keyBytes);
+		/* put result into cfg.b[] */
+		Skein_256_Final_Pad(ctx, cfg.b);
+		/* copy over into ctx->X[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
 	}
-	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
-	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	/*
+	 * build/process the config block, type == CONFIG (could be
+	 * precomputed for each key)
+	 */
+	/* output hash bit count */
+	ctx->h.hashBitLen = hashBitLen;
 	Skein_Start_New_Type(ctx, CFG_FINAL);
 
-	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	/* pre-pad cfg.w[] with zeroes */
+	memset(&cfg.w, 0, sizeof(cfg.w));
 	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	/* hash result length in bits */
+	cfg.w[1] = Skein_Swap64(hashBitLen);
+	/* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	cfg.w[2] = Skein_Swap64(treeInfo);
 
 	Skein_Show_Key(256, &ctx->h, key, keyBytes);
 
@@ -126,35 +153,46 @@ int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt)
 {
 	size_t n;
 
-	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
 	if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
 	{
-		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		/* finish up any buffered message data */
+		if (ctx->h.bCnt)
 		{
-			n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			/* # bytes free in buffer b[] */
+			n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;
 			if (n)
 			{
-				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				/* check on our logic here */
+				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
 				msgByteCnt  -= n;
 				msg         += n;
 				ctx->h.bCnt += n;
 			}
 			Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
-			Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
+			Skein_256_Process_Block(ctx, ctx->b, 1,
+						SKEIN_256_BLOCK_BYTES);
 			ctx->h.bCnt = 0;
 		}
-		/* now process any remaining full blocks, directly from input message data */
+		/*
+		 * now process any remaining full blocks, directly from input
+		 * message data
+		 */
 		if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
 		{
-			n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
-			Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
+			/* number of full blocks to process */
+			n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;
+			Skein_256_Process_Block(ctx, msg, n,
+						SKEIN_256_BLOCK_BYTES);
 			msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
 			msg        += n * SKEIN_256_BLOCK_BYTES;
 		}
@@ -178,31 +216,46 @@ int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN_256_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
 
-	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+	/* process the final block */
+	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;
 		if (n >= SKEIN_256_BLOCK_BYTES)
 			n  = SKEIN_256_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(256, &ctx->h, n,
+				 hashVal+i*SKEIN_256_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -240,21 +293,32 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
-		/* build/process the config block, type == CONFIG (could be precomputed) */
-		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		/*
+		 * build/process the config block, type == CONFIG (could be
+		 * precomputed)
+		 */
+		/* set tweaks: T0=0; T1=CFG | FINAL */
+		Skein_Start_New_Type(ctx, CFG_FINAL);
+
+		/* set the schema, version */
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+		/* hash result length in bits */
+		cfg.w[1] = Skein_Swap64(hashBitLen);
 		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+		/* zero pad config block */
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0]));
 
 		/* compute the initial chaining values from config block */
-		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		/* zero the chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
 		Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
 
-	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/*
+	 * The chaining vars ctx->X are now initialized for the given
+	 * hashBitLen.
+	 */
 	/* Set up to process the data message portion of the hash (default) */
 	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
@@ -263,8 +327,10 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+/* [identical to Skein_512_Init() when keyBytes == 0 &&
+ *	treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
 	union
 	{
@@ -278,27 +344,40 @@ int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo
 	/* compute the initial chaining values ctx->X[], based on key */
 	if (keyBytes == 0)                          /* is there a key? */
 	{
-		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+		/* no key: use all zeroes as key for config block */
+		memset(ctx->X, 0, sizeof(ctx->X));
 	}
-	else                                        /* here to pre-process a key */
+	else /* here to pre-process a key */
 	{
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
-		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-		Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
-		Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+		/* set output hash bit count = state size */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);
+		/* set tweaks: T0 = 0; T1 = KEY type */
+		Skein_Start_New_Type(ctx, KEY);
+		/* zero the initial chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
+		/* hash the key */
+		Skein_512_Update(ctx, key, keyBytes);
+		/* put result into cfg.b[] */
+		Skein_512_Final_Pad(ctx, cfg.b);
+		/* copy over into ctx->X[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
 	}
-	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
+	/*
+	 * build/process the config block, type == CONFIG (could be
+	 * precomputed for each key)
+	 */
 	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
 	Skein_Start_New_Type(ctx, CFG_FINAL);
 
-	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	/* pre-pad cfg.w[] with zeroes */
+	memset(&cfg.w, 0, sizeof(cfg.w));
 	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	/* hash result length in bits */
+	cfg.w[1] = Skein_Swap64(hashBitLen);
+	/* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	cfg.w[2] = Skein_Swap64(treeInfo);
 
 	Skein_Show_Key(512, &ctx->h, key, keyBytes);
 
@@ -314,35 +393,46 @@ int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt)
 {
 	size_t n;
 
-	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
 	if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
 	{
-		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		/* finish up any buffered message data */
+		if (ctx->h.bCnt)
 		{
-			n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			/* # bytes free in buffer b[] */
+			n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;
 			if (n)
 			{
-				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				/* check on our logic here */
+				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
 				msgByteCnt  -= n;
 				msg         += n;
 				ctx->h.bCnt += n;
 			}
 			Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
-			Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
+			Skein_512_Process_Block(ctx, ctx->b, 1,
+						SKEIN_512_BLOCK_BYTES);
 			ctx->h.bCnt = 0;
 		}
-		/* now process any remaining full blocks, directly from input message data */
+		/*
+		 * now process any remaining full blocks, directly from input
+		 * message data
+		 */
 		if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
 		{
-			n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
-			Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
+			/* number of full blocks to process */
+			n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;
+			Skein_512_Process_Block(ctx, msg, n,
+						SKEIN_512_BLOCK_BYTES);
 			msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
 			msg        += n * SKEIN_512_BLOCK_BYTES;
 		}
@@ -366,31 +456,46 @@ int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN_512_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
 
-	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+	/* process the final block */
+	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;
 		if (n >= SKEIN_512_BLOCK_BYTES)
 			n  = SKEIN_512_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(512, &ctx->h, n,
+				 hashVal+i*SKEIN_512_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -425,21 +530,29 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
-		/* build/process the config block, type == CONFIG (could be precomputed) */
-		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		/*
+		 * build/process the config block, type == CONFIG
+		 * (could be precomputed)
+		 */
+		/* set tweaks: T0=0; T1=CFG | FINAL */
+		Skein_Start_New_Type(ctx, CFG_FINAL);
+
+		/* set the schema, version */
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+		/* hash result length in bits */
+		cfg.w[1] = Skein_Swap64(hashBitLen);
 		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+		/* zero pad config block */
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0]));
 
 		/* compute the initial chaining values from config block */
-		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		/* zero the chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
 		Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
 
-	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* The chaining vars ctx->X are now initialized for hashBitLen. */
 	/* Set up to process the data message portion of the hash (default) */
 	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
@@ -448,8 +561,10 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+/* [identical to Skein1024_Init() when keyBytes == 0 &&
+ *	treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
 	union
 	{
@@ -463,27 +578,41 @@ int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo
 	/* compute the initial chaining values ctx->X[], based on key */
 	if (keyBytes == 0)                          /* is there a key? */
 	{
-		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+		/* no key: use all zeroes as key for config block */
+		memset(ctx->X, 0, sizeof(ctx->X));
 	}
-	else                                        /* here to pre-process a key */
+	else /* here to pre-process a key */
 	{
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
-		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-		Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
-		Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+		/* set output hash bit count = state size */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);
+		/* set tweaks: T0 = 0; T1 = KEY type */
+		Skein_Start_New_Type(ctx, KEY);
+		/* zero the initial chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
+		/* hash the key */
+		Skein1024_Update(ctx, key, keyBytes);
+		/* put result into cfg.b[] */
+		Skein1024_Final_Pad(ctx, cfg.b);
+		/* copy over into ctx->X[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
 	}
-	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
-	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	/*
+	 * build/process the config block, type == CONFIG (could be
+	 * precomputed for each key)
+	 */
+	/* output hash bit count */
+	ctx->h.hashBitLen = hashBitLen;
 	Skein_Start_New_Type(ctx, CFG_FINAL);
 
-	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	/* pre-pad cfg.w[] with zeroes */
+	memset(&cfg.w, 0, sizeof(cfg.w));
 	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	/* hash result length in bits */
+	cfg.w[1] = Skein_Swap64(hashBitLen);
+	/* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	cfg.w[2] = Skein_Swap64(treeInfo);
 
 	Skein_Show_Key(1024, &ctx->h, key, keyBytes);
 
@@ -499,35 +628,46 @@ int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt)
 {
 	size_t n;
 
-	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
 	if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
 	{
-		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		/* finish up any buffered message data */
+		if (ctx->h.bCnt)
 		{
-			n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			/* # bytes free in buffer b[] */
+			n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;
 			if (n)
 			{
-				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				/* check on our logic here */
+				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
 				msgByteCnt  -= n;
 				msg         += n;
 				ctx->h.bCnt += n;
 			}
 			Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
-			Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
+			Skein1024_Process_Block(ctx, ctx->b, 1,
+						SKEIN1024_BLOCK_BYTES);
 			ctx->h.bCnt = 0;
 		}
-		/* now process any remaining full blocks, directly from input message data */
+		/*
+		 * now process any remaining full blocks, directly from input
+		 * message data
+		 */
 		if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
 		{
-			n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
-			Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
+			/* number of full blocks to process */
+			n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;
+			Skein1024_Process_Block(ctx, msg, n,
+						SKEIN1024_BLOCK_BYTES);
 			msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
 			msg        += n * SKEIN1024_BLOCK_BYTES;
 		}
@@ -551,31 +691,46 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN1024_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
 
-	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+	/* process the final block */
+	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;
 		if (n >= SKEIN1024_BLOCK_BYTES)
 			n  = SKEIN1024_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(1024, &ctx->h, n,
+				 hashVal+i*SKEIN1024_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -587,14 +742,20 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+	/* process the final block */
+	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
-	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+	/* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -603,14 +764,20 @@ int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+	/* process the final block */
+	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
-	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+	/* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -619,14 +786,20 @@ int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+	/* process the final block */
+	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
-	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+	/* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -638,25 +811,36 @@ int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN_256_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;
 		if (n >= SKEIN_256_BLOCK_BYTES)
 			n  = SKEIN_256_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(256, &ctx->h, n,
+				 hashVal+i*SKEIN_256_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -667,25 +851,36 @@ int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN_512_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;
 		if (n >= SKEIN_512_BLOCK_BYTES)
 			n  = SKEIN_512_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(256, &ctx->h, n,
+				 hashVal+i*SKEIN_512_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -696,25 +891,36 @@ int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN1024_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;
 		if (n >= SKEIN1024_BLOCK_BYTES)
 			n  = SKEIN1024_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(256, &ctx->h, n,
+				 hashVal+i*SKEIN1024_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
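
Note for anyone tinkering with the *_Output() and *_Final() loops above: the
chunking arithmetic can be looked at on its own.  What follows is only a
minimal sketch of that arithmetic, not the driver code.  output_bytes() and
output_chunks() are hypothetical helpers, BLOCK_BYTES stands in for
SKEIN1024_BLOCK_BYTES (taken to be 128, i.e. 1024 bits), and the Threefish
call is left out entirely.

```c
#include <stddef.h>

/* Assumption: stands in for SKEIN1024_BLOCK_BYTES (1024 bits = 128 bytes). */
#define BLOCK_BYTES 128

/* Total number of output bytes, rounding a partial byte up. */
size_t output_bytes(size_t hash_bit_len)
{
	return (hash_bit_len + 7) >> 3;
}

/*
 * Hypothetical helper: record how many bytes each counter-mode iteration
 * would emit.  Returns the number of iterations; at most max_chunks
 * entries of sizes[] are written.
 */
size_t output_chunks(size_t hash_bit_len, size_t *sizes, size_t max_chunks)
{
	size_t byte_cnt = output_bytes(hash_bit_len);
	size_t i, n;

	for (i = 0; i * BLOCK_BYTES < byte_cnt; i++) {
		/* number of output bytes left to go */
		n = byte_cnt - i * BLOCK_BYTES;
		if (n >= BLOCK_BYTES)
			n = BLOCK_BYTES;
		if (i < max_chunks)
			sizes[i] = n;
	}
	return i;
}
```

With a 2056-bit output length this gives 257 bytes, emitted as two full
128-byte chunks followed by a single byte.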
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 3ebb1d60ef93..f0015d5b10f5 100644
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -46,9 +46,9 @@ int skeinInit(struct skein_ctx *ctx, size_t hashBitLen)
 
 	Skein_Assert(ctx, SKEIN_FAIL);
 	/*
-	 * The following two lines rely of the fact that the real Skein contexts are
-	 * a union in out context and thus have tha maximum memory available.
-	 * The beauty of C :-) .
+	 * The following two lines rely on the fact that the real Skein
+	 * contexts are a union in our context and thus have the maximum
+	 * memory available.  The beauty of C :-) .
 	 */
 	X = ctx->m.s256.X;
 	Xlen = ctx->skeinSize/8;
@@ -72,7 +72,10 @@ int skeinInit(struct skein_ctx *ctx, size_t hashBitLen)
 	}
 
 	if (ret == SKEIN_SUCCESS) {
-		/* Save chaining variables for this combination of size and hashBitLen */
+		/*
+		 * Save chaining variables for this combination of size and
+		 * hashBitLen
+		 */
 		memcpy(ctx->XSave, X, Xlen);
 	}
 	return ret;
@@ -113,7 +116,10 @@ int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
 		break;
 	}
 	if (ret == SKEIN_SUCCESS) {
-		/* Save chaining variables for this combination of key, keyLen, hashBitLen */
+		/*
+		 * Save chaining variables for this combination of key,
+		 * keyLen, hashBitLen
+		 */
 		memcpy(ctx->XSave, X, Xlen);
 	}
 	return ret;
@@ -125,9 +131,9 @@ void skeinReset(struct skein_ctx *ctx)
 	u64 *X = NULL;
 
 	/*
-	 * The following two lines rely of the fact that the real Skein contexts are
-	 * a union in out context and thus have tha maximum memory available.
-	 * The beautiy of C :-) .
+	 * The following two lines rely on the fact that the real Skein
+	 * contexts are a union in our context and thus have the maximum
+	 * memory available.  The beauty of C :-) .
 	 */
 	X = ctx->m.s256.X;
 	Xlen = ctx->skeinSize/8;
@@ -146,13 +152,16 @@ int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
 
 	switch (ctx->skeinSize) {
 	case Skein256:
-		ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg, msgByteCnt);
+		ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg,
+					msgByteCnt);
 		break;
 	case Skein512:
-		ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg, msgByteCnt);
+		ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg,
+					msgByteCnt);
 		break;
 	case Skein1024:
-		ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg, msgByteCnt);
+		ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg,
+					msgByteCnt);
 		break;
 	}
 	return ret;
@@ -164,15 +173,19 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
 {
 	/*
 	 * I've used the bit pad implementation from skein_test.c (see NIST CD)
-	 * and modified it to use the convenience functions and added some pointer
-	 * arithmetic.
+	 * and modified it to use the convenience functions and added some
+	 * pointer arithmetic.
 	 */
 	size_t length;
 	u8 mask;
 	u8 *up;
 
-	/* only the final Update() call is allowed do partial bytes, else assert an error */
-	Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
+	/*
+	 * only the final Update() call is allowed to do partial bytes, else
+	 * assert an error
+	 */
+	Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 ||
+			msgBitCnt == 0, SKEIN_FAIL);
 
 	/* if number of bits is a multiple of bytes - that's easy */
 	if ((msgBitCnt & 0x7) == 0) {
@@ -188,13 +201,18 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
 	 */
 	up = (u8 *)ctx->m.s256.X + ctx->skeinSize / 8;
 
-	Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
+	/* set tweak flag for the skeinFinal call */
+	Skein_Set_Bit_Pad_Flag(ctx->m.h);
 
 	/* now "pad" the final partial byte the way NIST likes */
-	length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
-	Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
-	mask = (u8) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
-	up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
+	/* get the bCnt value (same location for all block sizes) */
+	length = ctx->m.h.bCnt;
+	/* internal sanity check: there IS a partial byte in the buffer! */
+	Skein_assert(length != 0);
+	/* partial byte bit mask */
+	mask = (u8) (1u << (7 - (msgBitCnt & 7)));
+	/* apply bit padding on final byte (in the buffer) */
+	up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);
 
 	return SKEIN_SUCCESS;
 }
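
Note on the bit padding at the end of skeinUpdateBits(): the mask arithmetic
is easier to follow in isolation.  A minimal sketch, assuming only the two
expressions shown in the hunk above; pad_partial_byte() is a hypothetical
helper and does not exist in the driver.  It keeps the valid high-order bits
of the last byte, sets the next bit as the pad marker, and clears the rest.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the two padding lines in skeinUpdateBits().
 * msg_bit_cnt & 7 is the number of valid bits in the final byte; the
 * byte-aligned case (zero) is assumed to be handled earlier, as in the
 * original function.
 */
uint8_t pad_partial_byte(uint8_t last, size_t msg_bit_cnt)
{
	/* partial byte bit mask: the bit just below the last valid bit */
	uint8_t mask = (uint8_t)(1u << (7 - (msg_bit_cnt & 7)));

	/* (0 - mask) keeps the mask bit and everything above it */
	return (uint8_t)((last & (0 - mask)) | mask);
}
```

For example, with 3 valid bits in 0xA5 (1010 0101) the mask is 0x10 and the
padded byte is 0xB0 (1011 0000).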
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 376cd63d8f83..69176389fef9 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -11,10 +11,10 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 	struct threefish_key key;
 	u64 tweak[2];
 	int i;
-	u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+	u64  w[SKEIN_256_STATE_WORDS]; /* local copy of input block */
 	u64 words[3];
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	tweak[0] = ctx->h.T[0];
 	tweak[1] = ctx->h.T[1];
 
@@ -36,13 +36,14 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 
 		threefishSetKey(&key, Threefish256, ctx->X, tweak);
 
-		Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);
 
 		threefishEncryptBlockWords(&key, w, ctx->X);
 
 		blkPtr += SKEIN_256_BLOCK_BYTES;
 
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0] = ctx->X[0] ^ w[0];
 		ctx->X[1] = ctx->X[1] ^ w[1];
 		ctx->X[2] = ctx->X[2] ^ w[2];
@@ -62,9 +63,9 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 	u64 tweak[2];
 	int i;
 	u64 words[3];
-	u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+	u64  w[SKEIN_512_STATE_WORDS]; /* local copy of input block */
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	tweak[0] = ctx->h.T[0];
 	tweak[1] = ctx->h.T[1];
 
@@ -86,13 +87,14 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 
 		threefishSetKey(&key, Threefish512, ctx->X, tweak);
 
-		Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);
 
 		threefishEncryptBlockWords(&key, w, ctx->X);
 
 		blkPtr += SKEIN_512_BLOCK_BYTES;
 
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0] = ctx->X[0] ^ w[0];
 		ctx->X[1] = ctx->X[1] ^ w[1];
 		ctx->X[2] = ctx->X[2] ^ w[2];
@@ -116,9 +118,9 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 	u64 tweak[2];
 	int i;
 	u64 words[3];
-	u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+	u64  w[SKEIN1024_STATE_WORDS]; /* local copy of input block */
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	tweak[0] = ctx->h.T[0];
 	tweak[1] = ctx->h.T[1];
 
@@ -140,13 +142,14 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 
 		threefishSetKey(&key, Threefish1024, ctx->X, tweak);
 
-		Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);
 
 		threefishEncryptBlockWords(&key, w, ctx->X);
 
 		blkPtr += SKEIN1024_BLOCK_BYTES;
 
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0]  = ctx->X[0]  ^ w[0];
 		ctx->X[1]  = ctx->X[1]  ^ w[1];
 		ctx->X[2]  = ctx->X[2]  ^ w[2];
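
Note on SKEIN_LOOP in skein_block.c: the value is read as three decimal
digits, one per block size, and a digit of 0 selects the fully unrolled code.
A minimal sketch of the decoding follows.  The unroll_*() functions are
hypothetical; the 256 and 512 expressions match the SKEIN_UNROLL_256 and
SKEIN_UNROLL_512 macros, while the 1024 expression is an assumption since
that macro is not part of this excerpt.

```c
/* Hypothetical helpers, not part of the patch. */

/* hundreds digit, as in SKEIN_UNROLL_256 */
unsigned int unroll_256(unsigned int skein_loop)
{
	return (skein_loop / 100) % 10;
}

/* tens digit, as in SKEIN_UNROLL_512 */
unsigned int unroll_512(unsigned int skein_loop)
{
	return (skein_loop / 10) % 10;
}

/* ones digit; assumed by analogy with the two macros above */
unsigned int unroll_1024(unsigned int skein_loop)
{
	return skein_loop % 10;
}
```

The default of 001 decodes to 0, 0, 1, which matches the "unroll 256 and
512, but not 1024" comment.  Keep in mind that a literal with a leading zero
is octal in C, so 001 happens to equal 1 but something like 010 would be
read as 8.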
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index d315f547feae..780b4936f783 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -18,14 +18,14 @@
 #include <skein.h>
 
 #ifndef SKEIN_USE_ASM
-#define SKEIN_USE_ASM   (0)                     /* default is all C code (no ASM) */
+#define SKEIN_USE_ASM   (0) /* default is all C code (no ASM) */
 #endif
 
 #ifndef SKEIN_LOOP
-#define SKEIN_LOOP 001                          /* default: unroll 256 and 512, but not 1024 */
+#define SKEIN_LOOP 001 /* default: unroll 256 and 512, but not 1024 */
 #endif
 
-#define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
+#define BLK_BITS        (WCNT*64) /* some useful definitions for code here */
 #define KW_TWK_BASE     (0)
 #define KW_KEY_BASE     (3)
 #define ks              (kw + KW_KEY_BASE)
@@ -39,7 +39,8 @@
 
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd)
 	{ /* do it in C */
 	enum {
 		WCNT = SKEIN_256_STATE_WORDS
@@ -47,7 +48,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 #undef  RCNT
 #define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
 
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
 #else
 #define SKEIN_UNROLL_256 (0)
@@ -55,25 +56,28 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 
 #if SKEIN_UNROLL_256
 #if (RCNT % SKEIN_UNROLL_256)
-#error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
+#error "Invalid SKEIN_UNROLL_256" /* sanity check on unroll count */
 #endif
 	size_t  r;
-	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	u64  kw[WCNT+4+RCNT*2]; /* key schedule: chaining vars + tweak + "rot"*/
 #else
-	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4]; /* key schedule words : chaining vars + tweak */
 #endif
-	u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
-	u64  w[WCNT];                           /* local copy of input block */
+	u64  X0, X1, X2, X3; /* local copy of context vars, for speed */
+	u64  w[WCNT]; /* local copy of input block */
 #ifdef SKEIN_DEBUG
-	const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+	const u64 *Xptr[4]; /* use for debugging (help cc put Xn in regs) */
 	Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
 #endif
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	ts[0] = ctx->h.T[0];
 	ts[1] = ctx->h.T[1];
 	do  {
-		/* this implementation only supports 2**64 input bytes (no carry out here) */
-		ts[0] += byteCntAdd;                    /* update processed length */
+		/*
+		 * this implementation only supports 2**64 input bytes
+		 * (no carry out here)
+		 */
+		ts[0] += byteCntAdd; /* update processed length */
 
 		/* precompute the key schedule for this block */
 		ks[0] = ctx->X[0];
@@ -84,16 +88,19 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 
 		ts[2] = ts[0] ^ ts[1];
 
-		Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, WCNT);
 		DebugSaveTweak(ctx);
 		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
-		X0 = w[0] + ks[0];                      /* do the first full key injection */
+		X0 = w[0] + ks[0]; /* do the first full key injection */
 		X1 = w[1] + ks[1] + ts[0];
 		X2 = w[2] + ks[2] + ts[1];
 		X3 = w[3] + ks[3];
 
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
+		/* show starting state values */
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL,
+				 Xptr);
 
 		blkPtr += SKEIN_256_BLOCK_BYTES;
 
@@ -104,31 +111,34 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 
 #if SKEIN_UNROLL_256 == 0
-#define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
-	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+#define R256(p0, p1, p2, p3, ROT, rNum) /* fully unrolled */ \
+	Round256(p0, p1, p2, p3, ROT, rNum) \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
-#define I256(R)                                                     \
-	X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
-	X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
-	X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
-	X3   += ks[((R)+4) % 5] +     (R)+1;                            \
+#define I256(R) \
+	/* inject the key schedule value */ \
+	X0   += ks[((R)+1) % 5]; \
+	X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3]; \
+	X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3]; \
+	X3   += ks[((R)+4) % 5] +     (R)+1;       \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R256(p0, p1, p2, p3, ROT, rNum)                                  \
-	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+#else /* looping version */
+#define R256(p0, p1, p2, p3, ROT, rNum) \
+	Round256(p0, p1, p2, p3, ROT, rNum) \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
-#define I256(R)                                                     \
-	X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
-	X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
-	X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
-	X3   += ks[r+(R)+3] +    r+(R);                              \
-	ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
-	ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
+#define I256(R) \
+	/* inject the key schedule value */ \
+	X0   += ks[r+(R)+0]; \
+	X1   += ks[r+(R)+1] + ts[r+(R)+0]; \
+	X2   += ks[r+(R)+2] + ts[r+(R)+1]; \
+	X3   += ks[r+(R)+3] +    r+(R);    \
+	/* rotate key schedule */ \
+	ks[r + (R) + 4]   = ks[r + (R) - 1]; \
+	ts[r + (R) + 2]   = ts[r + (R) - 1]; \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 
-	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
+	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)
 #endif
 		{
 #define R256_8_rounds(R)                  \
@@ -145,7 +155,10 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 
 		R256_8_rounds(0);
 
-#define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
+#define R256_Unroll_R(NN) \
+	((SKEIN_UNROLL_256 == 0 && \
+	  SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || \
+	 (SKEIN_UNROLL_256 > (NN)))
 
 	#if   R256_Unroll_R(1)
 		R256_8_rounds(1);
@@ -193,7 +206,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 #error  "need more unrolling in Skein_256_Process_Block"
 	#endif
 		}
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0] = X0 ^ w[0];
 		ctx->X[1] = X1 ^ w[1];
 		ctx->X[2] = X2 ^ w[2];
@@ -223,7 +236,8 @@ unsigned int Skein_256_Unroll_Cnt(void)
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd)
 { /* do it in C */
 	enum {
 		WCNT = SKEIN_512_STATE_WORDS
@@ -231,7 +245,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 #undef  RCNT
 #define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
 
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
 #else
 #define SKEIN_UNROLL_512 (0)
@@ -239,27 +253,30 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 
 #if SKEIN_UNROLL_512
 #if (RCNT % SKEIN_UNROLL_512)
-#error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
+#error "Invalid SKEIN_UNROLL_512" /* sanity check on unroll count */
 #endif
 	size_t  r;
-	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	u64  kw[WCNT+4+RCNT*2]; /* key sched: chaining vars + tweak + "rot" */
 #else
-	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4]; /* key schedule words : chaining vars + tweak */
 #endif
-	u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
-	u64  w[WCNT];                           /* local copy of input block */
+	u64  X0, X1, X2, X3, X4, X5, X6, X7; /* local copies, for speed */
+	u64  w[WCNT]; /* local copy of input block */
 #ifdef SKEIN_DEBUG
-	const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+	const u64 *Xptr[8]; /* use for debugging (help cc put Xn in regs) */
 	Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
 	Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
 #endif
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	ts[0] = ctx->h.T[0];
 	ts[1] = ctx->h.T[1];
 	do  {
-		/* this implementation only supports 2**64 input bytes (no carry out here) */
-		ts[0] += byteCntAdd;                    /* update processed length */
+		/*
+		 * this implementation only supports 2**64 input bytes
+		 * (no carry out here)
+		 */
+		ts[0] += byteCntAdd; /* update processed length */
 
 		/* precompute the key schedule for this block */
 		ks[0] = ctx->X[0];
@@ -275,11 +292,12 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 
 		ts[2] = ts[0] ^ ts[1];
 
-		Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, WCNT);
 		DebugSaveTweak(ctx);
 		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
-		X0   = w[0] + ks[0];                    /* do the first full key injection */
+		X0   = w[0] + ks[0]; /* do the first full key injection */
 		X1   = w[1] + ks[1];
 		X2   = w[2] + ks[2];
 		X3   = w[3] + ks[3];
@@ -290,65 +308,72 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 
 		blkPtr += SKEIN_512_BLOCK_BYTES;
 
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL,
+				 Xptr);
 		/* run the rounds */
-#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
-		X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-		X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
-		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
-		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
+#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+	X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
+	X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
 
 #if SKEIN_UNROLL_512 == 0
-#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
-		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
-
-#define I512(R)                                                     \
-		X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
-		X1   += ks[((R) + 2) % 9];                                        \
-		X2   += ks[((R) + 3) % 9];                                        \
-		X3   += ks[((R) + 4) % 9];                                        \
-		X4   += ks[((R) + 5) % 9];                                        \
-		X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
-		X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
-		X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
-
-#define I512(R)                                                     \
-		X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
-		X1   += ks[r + (R) + 1];                                            \
-		X2   += ks[r + (R) + 2];                                            \
-		X3   += ks[r + (R) + 3];                                            \
-		X4   += ks[r + (R) + 4];                                            \
-		X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
-		X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
-		X7   += ks[r + (R) + 7] +         r + (R);                              \
-		ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
-		ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-		for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
-#endif                         /* end of looped code definitions */
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) /* unrolled */ \
+	Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+
+#define I512(R) \
+	/* inject the key schedule value */ \
+	X0   += ks[((R) + 1) % 9]; \
+	X1   += ks[((R) + 2) % 9]; \
+	X2   += ks[((R) + 3) % 9]; \
+	X3   += ks[((R) + 4) % 9]; \
+	X4   += ks[((R) + 5) % 9]; \
+	X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3]; \
+	X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3]; \
+	X7   += ks[((R) + 8) % 9] +     (R) + 1;       \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+#else /* looping version */
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+	Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+
+#define I512(R) \
+	/* inject the key schedule value */ \
+	X0   += ks[r + (R) + 0]; \
+	X1   += ks[r + (R) + 1]; \
+	X2   += ks[r + (R) + 2]; \
+	X3   += ks[r + (R) + 3]; \
+	X4   += ks[r + (R) + 4]; \
+	X5   += ks[r + (R) + 5] + ts[r + (R) + 0]; \
+	X6   += ks[r + (R) + 6] + ts[r + (R) + 1]; \
+	X7   += ks[r + (R) + 7] +         r + (R); \
+	/* rotate key schedule */ \
+	ks[r +         (R) + 8] = ks[r + (R) - 1]; \
+	ts[r +         (R) + 2] = ts[r + (R) - 1]; \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+		for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)
+#endif /* end of looped code definitions */
 		{
 #define R512_8_rounds(R)  /* do 8 full rounds */  \
-			R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
-			R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
-			R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
-			R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
-			I512(2 * (R));                              \
-			R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
-			R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
-			R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
-			R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
-			I512(2 * (R) + 1);        /* and key injection */
+		R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
+		R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
+		R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
+		R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
+		I512(2 * (R));                              \
+		R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
+		R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
+		R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
+		R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
+		I512(2 * (R) + 1);        /* and key injection */
 
 			R512_8_rounds(0);
 
-#define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
+#define R512_Unroll_R(NN) \
+		((SKEIN_UNROLL_512 == 0 && \
+		  SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || \
+		 (SKEIN_UNROLL_512 > (NN)))
 
 	#if   R512_Unroll_R(1)
 			R512_8_rounds(1);
@@ -397,7 +422,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 	#endif
 		}
 
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update context chaining */
 		ctx->X[0] = X0 ^ w[0];
 		ctx->X[1] = X1 ^ w[1];
 		ctx->X[2] = X2 ^ w[2];
@@ -430,7 +455,8 @@ unsigned int Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd)
 { /* do it in C, always looping (unrolled is bigger AND slower!) */
 	enum {
 		WCNT = SKEIN1024_STATE_WORDS
@@ -438,7 +464,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 #undef  RCNT
 #define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
 
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
 #else
 #define SKEIN_UNROLL_1024 (0)
@@ -446,31 +472,35 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 
 #if (SKEIN_UNROLL_1024 != 0)
 #if (RCNT % SKEIN_UNROLL_1024)
-#error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
+#error "Invalid SKEIN_UNROLL_1024" /* sanity check on unroll count */
 #endif
 	size_t  r;
-	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	u64  kw[WCNT+4+RCNT*2]; /* key sched: chaining vars + tweak + "rot" */
 #else
-	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4]; /* key schedule words : chaining vars + tweak */
 #endif
 
-	u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
-		X08, X09, X10, X11, X12, X13, X14, X15;
-	u64  w[WCNT];                            /* local copy of input block */
+	/* local copy of vars, for speed */
+	u64  X00, X01, X02, X03, X04, X05, X06, X07,
+	     X08, X09, X10, X11, X12, X13, X14, X15;
+	u64  w[WCNT]; /* local copy of input block */
 #ifdef SKEIN_DEBUG
-	const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+	const u64 *Xptr[16]; /* use for debugging (help cc put Xn in regs) */
 	Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
 	Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
 	Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
 	Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
 #endif
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	ts[0] = ctx->h.T[0];
 	ts[1] = ctx->h.T[1];
 	do  {
-		/* this implementation only supports 2**64 input bytes (no carry out here) */
-		ts[0] += byteCntAdd;                    /* update processed length */
+		/*
+		 * this implementation only supports 2**64 input bytes
+		 * (no carry out here)
+		 */
+		ts[0] += byteCntAdd; /* update processed length */
 
 		/* precompute the key schedule for this block */
 		ks[0]  = ctx->X[0];
@@ -496,11 +526,12 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 
 		ts[2]  = ts[0] ^ ts[1];
 
-		Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, WCNT);
 		DebugSaveTweak(ctx);
 		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
-		X00    =  w[0] +  ks[0];                 /* do the first full key injection */
+		X00    =  w[0] +  ks[0]; /* do the first full key injection */
 		X01    =  w[1] +  ks[1];
 		X02    =  w[2] +  ks[2];
 		X03    =  w[3] +  ks[3];
@@ -517,85 +548,105 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 		X14    = w[14] + ks[14] + ts[1];
 		X15    = w[15] + ks[15];
 
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL,
+				 Xptr);
 
-#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
-		X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
-		X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
-		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
-		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
-		X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
-		X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
-		X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
-		X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
+#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
+			pF, ROT, rNum) \
+	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
+	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
+	X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
+	X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
+	X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
+	X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
+	X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
+	X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
 
 #if SKEIN_UNROLL_1024 == 0
-#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
-
-#define I1024(R)                                                        \
-		X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
-		X01   += ks[((R) +  2) % 17];                                       \
-		X02   += ks[((R) +  3) % 17];                                       \
-		X03   += ks[((R) +  4) % 17];                                       \
-		X04   += ks[((R) +  5) % 17];                                       \
-		X05   += ks[((R) +  6) % 17];                                       \
-		X06   += ks[((R) +  7) % 17];                                       \
-		X07   += ks[((R) +  8) % 17];                                       \
-		X08   += ks[((R) +  9) % 17];                                       \
-		X09   += ks[((R) + 10) % 17];                                       \
-		X10   += ks[((R) + 11) % 17];                                       \
-		X11   += ks[((R) + 12) % 17];                                       \
-		X12   += ks[((R) + 13) % 17];                                       \
-		X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
-		X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
-		X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
-
-#define I1024(R)                                                      \
-		X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
-		X01   += ks[r + (R) +  1];                                            \
-		X02   += ks[r + (R) +  2];                                            \
-		X03   += ks[r + (R) +  3];                                            \
-		X04   += ks[r + (R) +  4];                                            \
-		X05   += ks[r + (R) +  5];                                            \
-		X06   += ks[r + (R) +  6];                                            \
-		X07   += ks[r + (R) +  7];                                            \
-		X08   += ks[r + (R) +  8];                                            \
-		X09   += ks[r + (R) +  9];                                            \
-		X10   += ks[r + (R) + 10];                                            \
-		X11   += ks[r + (R) + 11];                                            \
-		X12   += ks[r + (R) + 12];                                            \
-		X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
-		X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
-		X15   += ks[r + (R) + 15] +         r + (R);                          \
-		ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
-		ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
-		Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, \
+		ROT, rn) \
+	Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
+			pF, ROT, rn) \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+
+#define I1024(R) \
+	/* inject the key schedule value */ \
+	X00   += ks[((R) +  1) % 17]; \
+	X01   += ks[((R) +  2) % 17]; \
+	X02   += ks[((R) +  3) % 17]; \
+	X03   += ks[((R) +  4) % 17]; \
+	X04   += ks[((R) +  5) % 17]; \
+	X05   += ks[((R) +  6) % 17]; \
+	X06   += ks[((R) +  7) % 17]; \
+	X07   += ks[((R) +  8) % 17]; \
+	X08   += ks[((R) +  9) % 17]; \
+	X09   += ks[((R) + 10) % 17]; \
+	X10   += ks[((R) + 11) % 17]; \
+	X11   += ks[((R) + 12) % 17]; \
+	X12   += ks[((R) + 13) % 17]; \
+	X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3]; \
+	X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3]; \
+	X15   += ks[((R) + 16) % 17] +     (R) + 1;       \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+#else /* looping version */
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, \
+		ROT, rn) \
+	Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
+			pF, ROT, rn) \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
+
+#define I1024(R) \
+	/* inject the key schedule value */ \
+	X00   += ks[r + (R) +  0]; \
+	X01   += ks[r + (R) +  1]; \
+	X02   += ks[r + (R) +  2]; \
+	X03   += ks[r + (R) +  3]; \
+	X04   += ks[r + (R) +  4]; \
+	X05   += ks[r + (R) +  5]; \
+	X06   += ks[r + (R) +  6]; \
+	X07   += ks[r + (R) +  7]; \
+	X08   += ks[r + (R) +  8]; \
+	X09   += ks[r + (R) +  9]; \
+	X10   += ks[r + (R) + 10]; \
+	X11   += ks[r + (R) + 11]; \
+	X12   += ks[r + (R) + 12]; \
+	X13   += ks[r + (R) + 13] + ts[r + (R) + 0]; \
+	X14   += ks[r + (R) + 14] + ts[r + (R) + 1]; \
+	X15   += ks[r + (R) + 15] +         r + (R); \
+	/* rotate key schedule */ \
+	ks[r  +         (R) + 16] = ks[r + (R) - 1]; \
+	ts[r  +         (R) +  2] = ts[r + (R) - 1]; \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)
 #endif
 		{
-#define R1024_8_rounds(R)    /* do 8 full rounds */                               \
-			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
-			R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
-			R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
-			R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
-			I1024(2*(R));                                                             \
-			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
-			R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
-			R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
-			R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
-			I1024(2*(R)+1);
+#define R1024_8_rounds(R) \
+	R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, \
+		R1024_0, 8*(R) + 1); \
+	R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, \
+		R1024_1, 8*(R) + 2); \
+	R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, \
+		R1024_2, 8*(R) + 3); \
+	R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, \
+		R1024_3, 8*(R) + 4); \
+	I1024(2*(R)); \
+	R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, \
+		R1024_4, 8*(R) + 5); \
+	R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, \
+		R1024_5, 8*(R) + 6); \
+	R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, \
+		R1024_6, 8*(R) + 7); \
+	R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, \
+		R1024_7, 8*(R) + 8); \
+	I1024(2*(R)+1);
 
 			R1024_8_rounds(0);
 
-#define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
+#define R1024_Unroll_R(NN) \
+		((SKEIN_UNROLL_1024 == 0 && \
+		  SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || \
+		 (SKEIN_UNROLL_1024 > (NN)))
 
 	#if   R1024_Unroll_R(1)
 			R1024_8_rounds(1);
@@ -643,7 +694,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 #error  "need more unrolling in Skein_1024_Process_Block"
   #endif
 		}
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update context chaining */
 
 		ctx->X[0] = X00 ^ w[0];
 		ctx->X[1] = X01 ^ w[1];
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 1730a3120a0f..fe7517b2008c 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -24,646 +24,2085 @@ void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
 	  t2 = keyCtx->tweak[2];
 
-	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k5; b4 += b5 + k4; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k7; b6 += b7 + k6; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k9; b8 += b9 + k8; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k11; b10 += b11 + k10; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k13 + t0; b12 += b13 + k12; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k15; b14 += b15 + k14 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k6; b4 += b5 + k5; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k8; b6 += b7 + k7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k10; b8 += b9 + k9; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k12; b10 += b11 + k11; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k14 + t1; b12 += b13 + k13; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k16 + 1; b14 += b15 + k15 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k7; b4 += b5 + k6; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k9; b6 += b7 + k8; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k11; b8 += b9 + k10; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k13; b10 += b11 + k12; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k15 + t2; b12 += b13 + k14; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k0 + 2; b14 += b15 + k16 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k8; b4 += b5 + k7; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k10; b6 += b7 + k9; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k12; b8 += b9 + k11; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k14; b10 += b11 + k13; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k16 + t0; b12 += b13 + k15; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k1 + 3; b14 += b15 + k0 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k9; b4 += b5 + k8; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k11; b6 += b7 + k10; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k13; b8 += b9 + k12; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k15; b10 += b11 + k14; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k0 + t1; b12 += b13 + k16; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k2 + 4; b14 += b15 + k1 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k10; b4 += b5 + k9; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k12; b6 += b7 + k11; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k14; b8 += b9 + k13; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k16; b10 += b11 + k15; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k1 + t2; b12 += b13 + k0; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k3 + 5; b14 += b15 + k2 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k9; b2 += b3 + k8; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k11; b4 += b5 + k10; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k13; b6 += b7 + k12; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k15; b8 += b9 + k14; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k0; b10 += b11 + k16; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k2 + t0; b12 += b13 + k1; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k4 + 6; b14 += b15 + k3 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k10; b2 += b3 + k9; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k12; b4 += b5 + k11; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k14; b6 += b7 + k13; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k16; b8 += b9 + k15; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k1; b10 += b11 + k0; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k3 + t1; b12 += b13 + k2; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k5 + 7; b14 += b15 + k4 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k9; b0 += b1 + k8; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k11; b2 += b3 + k10; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k13; b4 += b5 + k12; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k15; b6 += b7 + k14; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k0; b8 += b9 + k16; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k2; b10 += b11 + k1; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k4 + t2; b12 += b13 + k3; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k6 + 8; b14 += b15 + k5 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k10; b0 += b1 + k9; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k12; b2 += b3 + k11; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k14; b4 += b5 + k13; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k16; b6 += b7 + k15; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k1; b8 += b9 + k0; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k3; b10 += b11 + k2; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k5 + t0; b12 += b13 + k4; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k7 + 9; b14 += b15 + k6 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k11; b0 += b1 + k10; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k13; b2 += b3 + k12; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k15; b4 += b5 + k14; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k0; b6 += b7 + k16; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k2; b8 += b9 + k1; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k4; b10 += b11 + k3; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k6 + t1; b12 += b13 + k5; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k8 + 10; b14 += b15 + k7 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k12; b0 += b1 + k11; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k14; b2 += b3 + k13; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k16; b4 += b5 + k15; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k1; b6 += b7 + k0; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k3; b8 += b9 + k2; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k5; b10 += b11 + k4; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k7 + t2; b12 += b13 + k6; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k9 + 11; b14 += b15 + k8 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k13; b0 += b1 + k12; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k15; b2 += b3 + k14; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k0; b4 += b5 + k16; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k2; b6 += b7 + k1; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k4; b8 += b9 + k3; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k6; b10 += b11 + k5; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k8 + t0; b12 += b13 + k7; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k10 + 12; b14 += b15 + k9 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k14; b0 += b1 + k13; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k16; b2 += b3 + k15; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k1; b4 += b5 + k0; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k3; b6 += b7 + k2; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k5; b8 += b9 + k4; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k7; b10 += b11 + k6; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k9 + t1; b12 += b13 + k8; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k11 + 13; b14 += b15 + k10 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k15; b0 += b1 + k14; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k0; b2 += b3 + k16; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k2; b4 += b5 + k1; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k4; b6 += b7 + k3; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k6; b8 += b9 + k5; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k8; b10 += b11 + k7; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k10 + t2; b12 += b13 + k9; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k12 + 14; b14 += b15 + k11 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k16; b0 += b1 + k15; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k3; b4 += b5 + k2; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k5; b6 += b7 + k4; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k7; b8 += b9 + k6; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k9; b10 += b11 + k8; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k11 + t0; b12 += b13 + k10; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k13 + 15; b14 += b15 + k12 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k0; b0 += b1 + k16; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k4; b4 += b5 + k3; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k6; b6 += b7 + k5; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k8; b8 += b9 + k7; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k10; b10 += b11 + k9; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k12 + t1; b12 += b13 + k11; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k14 + 16; b14 += b15 + k13 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k5; b4 += b5 + k4; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k7; b6 += b7 + k6; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k9; b8 += b9 + k8; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k11; b10 += b11 + k10; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k13 + t2; b12 += b13 + k12; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k15 + 17; b14 += b15 + k14 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k6; b4 += b5 + k5; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k8; b6 += b7 + k7; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k10; b8 += b9 + k9; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k12; b10 += b11 + k11; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k14 + t0; b12 += b13 + k13; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k16 + 18; b14 += b15 + k15 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k7; b4 += b5 + k6; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k9; b6 += b7 + k8; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k11; b8 += b9 + k10; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k13; b10 += b11 + k12; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k15 + t1; b12 += b13 + k14; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k0 + 19; b14 += b15 + k16 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k5;
+	b4 += b5 + k4;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k9;
+	b8 += b9 + k8;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k11;
+	b10 += b11 + k10;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k13 + t0;
+	b12 += b13 + k12;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k15;
+	b14 += b15 + k14 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k6;
+	b4 += b5 + k5;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k8;
+	b6 += b7 + k7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k10;
+	b8 += b9 + k9;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k12;
+	b10 += b11 + k11;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k14 + t1;
+	b12 += b13 + k13;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k16 + 1;
+	b14 += b15 + k15 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k7;
+	b4 += b5 + k6;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k9;
+	b6 += b7 + k8;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k11;
+	b8 += b9 + k10;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k13;
+	b10 += b11 + k12;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k15 + t2;
+	b12 += b13 + k14;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k0 + 2;
+	b14 += b15 + k16 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k8;
+	b4 += b5 + k7;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k10;
+	b6 += b7 + k9;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k12;
+	b8 += b9 + k11;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k14;
+	b10 += b11 + k13;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k16 + t0;
+	b12 += b13 + k15;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k1 + 3;
+	b14 += b15 + k0 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k9;
+	b4 += b5 + k8;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k11;
+	b6 += b7 + k10;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k13;
+	b8 += b9 + k12;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k15;
+	b10 += b11 + k14;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k0 + t1;
+	b12 += b13 + k16;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k2 + 4;
+	b14 += b15 + k1 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k10;
+	b4 += b5 + k9;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k12;
+	b6 += b7 + k11;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k14;
+	b8 += b9 + k13;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k16;
+	b10 += b11 + k15;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k1 + t2;
+	b12 += b13 + k0;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k3 + 5;
+	b14 += b15 + k2 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k9;
+	b2 += b3 + k8;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k11;
+	b4 += b5 + k10;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k13;
+	b6 += b7 + k12;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k15;
+	b8 += b9 + k14;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k0;
+	b10 += b11 + k16;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k2 + t0;
+	b12 += b13 + k1;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k4 + 6;
+	b14 += b15 + k3 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k10;
+	b2 += b3 + k9;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k12;
+	b4 += b5 + k11;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k14;
+	b6 += b7 + k13;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k16;
+	b8 += b9 + k15;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k1;
+	b10 += b11 + k0;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k3 + t1;
+	b12 += b13 + k2;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k5 + 7;
+	b14 += b15 + k4 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k9;
+	b0 += b1 + k8;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k11;
+	b2 += b3 + k10;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k13;
+	b4 += b5 + k12;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k15;
+	b6 += b7 + k14;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k0;
+	b8 += b9 + k16;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k2;
+	b10 += b11 + k1;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k4 + t2;
+	b12 += b13 + k3;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k6 + 8;
+	b14 += b15 + k5 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k10;
+	b0 += b1 + k9;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k12;
+	b2 += b3 + k11;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k14;
+	b4 += b5 + k13;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k16;
+	b6 += b7 + k15;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k1;
+	b8 += b9 + k0;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k3;
+	b10 += b11 + k2;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k5 + t0;
+	b12 += b13 + k4;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k7 + 9;
+	b14 += b15 + k6 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k11;
+	b0 += b1 + k10;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k13;
+	b2 += b3 + k12;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k15;
+	b4 += b5 + k14;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k0;
+	b6 += b7 + k16;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k2;
+	b8 += b9 + k1;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k4;
+	b10 += b11 + k3;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k6 + t1;
+	b12 += b13 + k5;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k8 + 10;
+	b14 += b15 + k7 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k12;
+	b0 += b1 + k11;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k14;
+	b2 += b3 + k13;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k16;
+	b4 += b5 + k15;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k1;
+	b6 += b7 + k0;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k3;
+	b8 += b9 + k2;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k5;
+	b10 += b11 + k4;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k7 + t2;
+	b12 += b13 + k6;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k9 + 11;
+	b14 += b15 + k8 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k13;
+	b0 += b1 + k12;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k15;
+	b2 += b3 + k14;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k0;
+	b4 += b5 + k16;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k2;
+	b6 += b7 + k1;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k4;
+	b8 += b9 + k3;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k6;
+	b10 += b11 + k5;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k8 + t0;
+	b12 += b13 + k7;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k10 + 12;
+	b14 += b15 + k9 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k14;
+	b0 += b1 + k13;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k16;
+	b2 += b3 + k15;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k1;
+	b4 += b5 + k0;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k3;
+	b6 += b7 + k2;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k5;
+	b8 += b9 + k4;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k7;
+	b10 += b11 + k6;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k9 + t1;
+	b12 += b13 + k8;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k11 + 13;
+	b14 += b15 + k10 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k15;
+	b0 += b1 + k14;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k16;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k2;
+	b4 += b5 + k1;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k4;
+	b6 += b7 + k3;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k6;
+	b8 += b9 + k5;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k8;
+	b10 += b11 + k7;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k10 + t2;
+	b12 += b13 + k9;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k12 + 14;
+	b14 += b15 + k11 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k16;
+	b0 += b1 + k15;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k3;
+	b4 += b5 + k2;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k5;
+	b6 += b7 + k4;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k7;
+	b8 += b9 + k6;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k9;
+	b10 += b11 + k8;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k11 + t0;
+	b12 += b13 + k10;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k13 + 15;
+	b14 += b15 + k12 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k0;
+	b0 += b1 + k16;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k4;
+	b4 += b5 + k3;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k6;
+	b6 += b7 + k5;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k8;
+	b8 += b9 + k7;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k10;
+	b10 += b11 + k9;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k12 + t1;
+	b12 += b13 + k11;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k14 + 16;
+	b14 += b15 + k13 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k5;
+	b4 += b5 + k4;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k9;
+	b8 += b9 + k8;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k11;
+	b10 += b11 + k10;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k13 + t2;
+	b12 += b13 + k12;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k15 + 17;
+	b14 += b15 + k14 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k6;
+	b4 += b5 + k5;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k8;
+	b6 += b7 + k7;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k10;
+	b8 += b9 + k9;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k12;
+	b10 += b11 + k11;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k14 + t0;
+	b12 += b13 + k13;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k16 + 18;
+	b14 += b15 + k15 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k7;
+	b4 += b5 + k6;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k9;
+	b6 += b7 + k8;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k11;
+	b8 += b9 + k10;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k13;
+	b10 += b11 + k12;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k15 + t1;
+	b12 += b13 + k14;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k0 + 19;
+	b14 += b15 + k16 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
 
 	output[0] = b0 + k3;
 	output[1] = b1 + k4;
@@ -683,685 +2122,2764 @@ void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	output[15] = b15 + k1 + 20;
 }
 
-void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	  b2 = input[2], b3 = input[3],
-	  b4 = input[4], b5 = input[5],
-	  b6 = input[6], b7 = input[7],
-	  b8 = input[8], b9 = input[9],
-	  b10 = input[10], b11 = input[11],
-	  b12 = input[12], b13 = input[13],
-	  b14 = input[14], b15 = input[15];
-	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-	  k8 = keyCtx->key[8], k9 = keyCtx->key[9],
-	  k10 = keyCtx->key[10], k11 = keyCtx->key[11],
-	  k12 = keyCtx->key[12], k13 = keyCtx->key[13],
-	  k14 = keyCtx->key[14], k15 = keyCtx->key[15],
-	  k16 = keyCtx->key[16];
-	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-	  t2 = keyCtx->tweak[2];
-	u64 tmp;
+void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7],
+	  b8 = input[8], b9 = input[9],
+	  b10 = input[10], b11 = input[11],
+	  b12 = input[12], b13 = input[13],
+	  b14 = input[14], b15 = input[15];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+	  k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+	  k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+	  k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+	  k16 = keyCtx->key[16];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
+	u64 tmp;
+
+	b0 -= k3;
+	b1 -= k4;
+	b2 -= k5;
+	b3 -= k6;
+	b4 -= k7;
+	b5 -= k8;
+	b6 -= k9;
+	b7 -= k10;
+	b8 -= k11;
+	b9 -= k12;
+	b10 -= k13;
+	b11 -= k14;
+	b12 -= k15;
+	b13 -= k16 + t2;
+	b14 -= k0 + t0;
+	b15 -= k1 + 20;
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k16 + t2;
+	b15 -= k0 + 19;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k14;
+	b13 -= k15 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k12;
+	b11 -= k13;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k10;
+	b9 -= k11;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k8;
+	b7 -= k9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k6;
+	b5 -= k7;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k15 + t1;
+	b15 -= k16 + 18;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k13;
+	b13 -= k14 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k11;
+	b11 -= k12;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k9;
+	b9 -= k10;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k7;
+	b7 -= k8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k5;
+	b5 -= k6;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k14 + t0;
+	b15 -= k15 + 17;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k12;
+	b13 -= k13 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k10;
+	b11 -= k11;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k8;
+	b9 -= k9;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k6;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k4;
+	b5 -= k5;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k0;
+	b1 -= k1;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k13 + t2;
+	b15 -= k14 + 16;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k11;
+	b13 -= k12 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k9;
+	b11 -= k10;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k7;
+	b9 -= k8;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k5;
+	b7 -= k6;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k3;
+	b5 -= k4;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k16;
+	b1 -= k0;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k12 + t1;
+	b15 -= k13 + 15;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k10;
+	b13 -= k11 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k8;
+	b11 -= k9;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k6;
+	b9 -= k7;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k4;
+	b7 -= k5;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k2;
+	b5 -= k3;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k15;
+	b1 -= k16;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k11 + t0;
+	b15 -= k12 + 14;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k9;
+	b13 -= k10 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k7;
+	b11 -= k8;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k5;
+	b9 -= k6;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k3;
+	b7 -= k4;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k1;
+	b5 -= k2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k16;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k14;
+	b1 -= k15;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k10 + t2;
+	b15 -= k11 + 13;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k8;
+	b13 -= k9 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k6;
+	b11 -= k7;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k4;
+	b9 -= k5;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k2;
+	b7 -= k3;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k0;
+	b5 -= k1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k15;
+	b3 -= k16;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k13;
+	b1 -= k14;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k9 + t1;
+	b15 -= k10 + 12;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k7;
+	b13 -= k8 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k5;
+	b11 -= k6;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k3;
+	b9 -= k4;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k1;
+	b7 -= k2;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k16;
+	b5 -= k0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k14;
+	b3 -= k15;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k12;
+	b1 -= k13;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k8 + t0;
+	b15 -= k9 + 11;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k6;
+	b13 -= k7 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k4;
+	b11 -= k5;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k2;
+	b9 -= k3;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k0;
+	b7 -= k1;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k15;
+	b5 -= k16;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k13;
+	b3 -= k14;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k11;
+	b1 -= k12;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k7 + t2;
+	b15 -= k8 + 10;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k5;
+	b13 -= k6 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k3;
+	b11 -= k4;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k1;
+	b9 -= k2;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k16;
+	b7 -= k0;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k14;
+	b5 -= k15;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k12;
+	b3 -= k13;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k10;
+	b1 -= k11;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k6 + t1;
+	b15 -= k7 + 9;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k4;
+	b13 -= k5 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k2;
+	b11 -= k3;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k0;
+	b9 -= k1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k15;
+	b7 -= k16;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k13;
+	b5 -= k14;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k11;
+	b3 -= k12;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k9;
+	b1 -= k10;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k5 + t0;
+	b15 -= k6 + 8;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k3;
+	b13 -= k4 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k1;
+	b11 -= k2;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k16;
+	b9 -= k0;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k14;
+	b7 -= k15;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k12;
+	b5 -= k13;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k10;
+	b3 -= k11;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k8;
+	b1 -= k9;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k4 + t2;
+	b15 -= k5 + 7;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k2;
+	b13 -= k3 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k0;
+	b11 -= k1;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k15;
+	b9 -= k16;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k13;
+	b7 -= k14;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k11;
+	b5 -= k12;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k9;
+	b3 -= k10;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
 
-	b0 -= k3;
-	b1 -= k4;
-	b2 -= k5;
-	b3 -= k6;
-	b4 -= k7;
-	b5 -= k8;
-	b6 -= k9;
-	b7 -= k10;
-	b8 -= k11;
-	b9 -= k12;
-	b10 -= k13;
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k3 + t1;
+	b15 -= k4 + 6;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k1;
+	b13 -= k2 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k16;
+	b11 -= k0;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k14;
+	b9 -= k15;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k12;
+	b7 -= k13;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k10;
+	b5 -= k11;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k8;
+	b3 -= k9;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k2 + t0;
+	b15 -= k3 + 5;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k0;
+	b13 -= k1 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k15;
+	b11 -= k16;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k13;
+	b9 -= k14;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k11;
+	b7 -= k12;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k9;
+	b5 -= k10;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k1 + t2;
+	b15 -= k2 + 4;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k16;
+	b13 -= k0 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k14;
+	b11 -= k15;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k12;
+	b9 -= k13;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k10;
+	b7 -= k11;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k8;
+	b5 -= k9;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k0 + t1;
+	b15 -= k1 + 3;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k15;
+	b13 -= k16 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k13;
 	b11 -= k14;
-	b12 -= k15;
-	b13 -= k16 + t2;
-	b14 -= k0 + t0;
-	b15 -= k1 + 20;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k16 + t2; b15 -= k0 + 19;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k14; b13 -= k15 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k12; b11 -= k13;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k10; b9 -= k11;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k8; b7 -= k9;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k6; b5 -= k7;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k4; b3 -= k5;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k2; b1 -= k3;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k15 + t1; b15 -= k16 + 18;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k13; b13 -= k14 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k11; b11 -= k12;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k9; b9 -= k10;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k7; b7 -= k8;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k5; b5 -= k6;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k3; b3 -= k4;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k1; b1 -= k2;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k14 + t0; b15 -= k15 + 17;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k12; b13 -= k13 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k10; b11 -= k11;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k8; b9 -= k9;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k6; b7 -= k7;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k4; b5 -= k5;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k2; b3 -= k3;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k0; b1 -= k1;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k13 + t2; b15 -= k14 + 16;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k11; b13 -= k12 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k9; b11 -= k10;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k7; b9 -= k8;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k5; b7 -= k6;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k3; b5 -= k4;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k1; b3 -= k2;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k16; b1 -= k0;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k12 + t1; b15 -= k13 + 15;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k10; b13 -= k11 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k8; b11 -= k9;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k6; b9 -= k7;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k4; b7 -= k5;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k2; b5 -= k3;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k0; b3 -= k1;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k15; b1 -= k16;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k11 + t0; b15 -= k12 + 14;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k9; b13 -= k10 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k7; b11 -= k8;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k5; b9 -= k6;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k3; b7 -= k4;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k1; b5 -= k2;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k16; b3 -= k0;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k14; b1 -= k15;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k10 + t2; b15 -= k11 + 13;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k8; b13 -= k9 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k6; b11 -= k7;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k4; b9 -= k5;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k2; b7 -= k3;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k0; b5 -= k1;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k15; b3 -= k16;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k13; b1 -= k14;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k9 + t1; b15 -= k10 + 12;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k7; b13 -= k8 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k5; b11 -= k6;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k3; b9 -= k4;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k1; b7 -= k2;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k16; b5 -= k0;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k14; b3 -= k15;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k12; b1 -= k13;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k8 + t0; b15 -= k9 + 11;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k6; b13 -= k7 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k4; b11 -= k5;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k2; b9 -= k3;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k0; b7 -= k1;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k15; b5 -= k16;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k13; b3 -= k14;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k11; b1 -= k12;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k7 + t2; b15 -= k8 + 10;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k5; b13 -= k6 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k3; b11 -= k4;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k1; b9 -= k2;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k16; b7 -= k0;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k14; b5 -= k15;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k12; b3 -= k13;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k10; b1 -= k11;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k6 + t1; b15 -= k7 + 9;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k4; b13 -= k5 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k2; b11 -= k3;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k0; b9 -= k1;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k15; b7 -= k16;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k13; b5 -= k14;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k11; b3 -= k12;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k9; b1 -= k10;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k5 + t0; b15 -= k6 + 8;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k3; b13 -= k4 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k1; b11 -= k2;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k16; b9 -= k0;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k14; b7 -= k15;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k12; b5 -= k13;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k10; b3 -= k11;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k8; b1 -= k9;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k4 + t2; b15 -= k5 + 7;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k2; b13 -= k3 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k0; b11 -= k1;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k15; b9 -= k16;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k13; b7 -= k14;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k11; b5 -= k12;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k9; b3 -= k10;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k7; b1 -= k8;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k3 + t1; b15 -= k4 + 6;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k1; b13 -= k2 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k16; b11 -= k0;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k14; b9 -= k15;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k12; b7 -= k13;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k10; b5 -= k11;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k8; b3 -= k9;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k6; b1 -= k7;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k2 + t0; b15 -= k3 + 5;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k0; b13 -= k1 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k15; b11 -= k16;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k13; b9 -= k14;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k11; b7 -= k12;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k9; b5 -= k10;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k7; b3 -= k8;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k5; b1 -= k6;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k1 + t2; b15 -= k2 + 4;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k16; b13 -= k0 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k14; b11 -= k15;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k12; b9 -= k13;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k10; b7 -= k11;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k8; b5 -= k9;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k6; b3 -= k7;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k4; b1 -= k5;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k0 + t1; b15 -= k1 + 3;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k15; b13 -= k16 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k13; b11 -= k14;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k11; b9 -= k12;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k9; b7 -= k10;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k7; b5 -= k8;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k5; b3 -= k6;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k3; b1 -= k4;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k16 + t0; b15 -= k0 + 2;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k14; b13 -= k15 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k12; b11 -= k13;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k10; b9 -= k11;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k8; b7 -= k9;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k6; b5 -= k7;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k4; b3 -= k5;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k2; b1 -= k3;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k15 + t2; b15 -= k16 + 1;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k13; b13 -= k14 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k11; b11 -= k12;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k9; b9 -= k10;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k7; b7 -= k8;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k5; b5 -= k6;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k3; b3 -= k4;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k1; b1 -= k2;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k14 + t1; b15 -= k15;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k12; b13 -= k13 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k10; b11 -= k11;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k8; b9 -= k9;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k6; b7 -= k7;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k4; b5 -= k5;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k2; b3 -= k3;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k0; b1 -= k1;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k11;
+	b9 -= k12;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k9;
+	b7 -= k10;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k7;
+	b5 -= k8;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k16 + t0;
+	b15 -= k0 + 2;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k14;
+	b13 -= k15 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k12;
+	b11 -= k13;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k10;
+	b9 -= k11;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k8;
+	b7 -= k9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k6;
+	b5 -= k7;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k15 + t2;
+	b15 -= k16 + 1;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k13;
+	b13 -= k14 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k11;
+	b11 -= k12;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k9;
+	b9 -= k10;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k7;
+	b7 -= k8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k5;
+	b5 -= k6;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k14 + t1;
+	b15 -= k15;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k12;
+	b13 -= k13 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k10;
+	b11 -= k11;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k8;
+	b9 -= k9;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k6;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k4;
+	b5 -= k5;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k0;
+	b1 -= k1;
 
 	output[15] = b15;
 	output[14] = b14;
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index da3b8357e47f..2ae746a641ae 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -12,158 +12,481 @@ void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
 	  t2 = keyCtx->tweak[2];
 
-	b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k3; b2 += b3 + k2 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k4 + 1; b2 += b3 + k3 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k0 + 2; b2 += b3 + k4 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k4 + t0; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k1 + 3; b2 += b3 + k0 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k0 + t1; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k2 + 4; b2 += b3 + k1 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k1 + t2; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k3 + 5; b2 += b3 + k2 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k2 + t0; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k4 + 6; b2 += b3 + k3 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k3 + t1; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k0 + 7; b2 += b3 + k4 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k4 + t2; b0 += b1 + k3; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k1 + 8; b2 += b3 + k0 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k0 + t0; b0 += b1 + k4; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k2 + 9; b2 += b3 + k1 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k1 + t1; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k3 + 10; b2 += b3 + k2 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k2 + t2; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k4 + 11; b2 += b3 + k3 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k3 + t0; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k0 + 12; b2 += b3 + k4 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k4 + t1; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k1 + 13; b2 += b3 + k0 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k0 + t2; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k2 + 14; b2 += b3 + k1 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k3 + 15; b2 += b3 + k2 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k4 + 16; b2 += b3 + k3 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k0 + 17; b2 += b3 + k4 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k1 + t0;
+	b0 += b1 + k0;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k2 + t1;
+	b0 += b1 + k1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k4 + 1;
+	b2 += b3 + k3 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k3 + t2;
+	b0 += b1 + k2;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k0 + 2;
+	b2 += b3 + k4 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k4 + t0;
+	b0 += b1 + k3;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k1 + 3;
+	b2 += b3 + k0 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k0 + t1;
+	b0 += b1 + k4;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k2 + 4;
+	b2 += b3 + k1 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k1 + t2;
+	b0 += b1 + k0;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k3 + 5;
+	b2 += b3 + k2 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k2 + t0;
+	b0 += b1 + k1;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k4 + 6;
+	b2 += b3 + k3 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k3 + t1;
+	b0 += b1 + k2;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k0 + 7;
+	b2 += b3 + k4 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k4 + t2;
+	b0 += b1 + k3;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k1 + 8;
+	b2 += b3 + k0 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k0 + t0;
+	b0 += b1 + k4;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k2 + 9;
+	b2 += b3 + k1 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k1 + t1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k3 + 10;
+	b2 += b3 + k2 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k2 + t2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k4 + 11;
+	b2 += b3 + k3 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k3 + t0;
+	b0 += b1 + k2;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k0 + 12;
+	b2 += b3 + k4 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k4 + t1;
+	b0 += b1 + k3;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k1 + 13;
+	b2 += b3 + k0 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k0 + t2;
+	b0 += b1 + k4;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k2 + 14;
+	b2 += b3 + k1 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k1 + t0;
+	b0 += b1 + k0;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k3 + 15;
+	b2 += b3 + k2 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k2 + t1;
+	b0 += b1 + k1;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k4 + 16;
+	b2 += b3 + k3 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k3 + t2;
+	b0 += b1 + k2;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k0 + 17;
+	b2 += b3 + k4 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
 	output[0] = b0 + k3;
 	output[1] = b1 + k4 + t0;
@@ -187,158 +510,625 @@ void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	b1 -= k4 + t0;
 	b2 -= k0 + t1;
 	b3 -= k1 + 18;
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t0; b3 -= k0 + 17;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t2; b3 -= k4 + 16;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t1; b3 -= k3 + 15;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t0; b3 -= k2 + 14;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t2; b3 -= k1 + 13;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t1; b3 -= k0 + 12;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t0; b3 -= k4 + 11;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t2; b3 -= k3 + 10;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k4; b1 -= k0 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k1 + t1; b3 -= k2 + 9;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k3; b1 -= k4 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k0 + t0; b3 -= k1 + 8;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t2; b3 -= k0 + 7;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t1; b3 -= k4 + 6;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t0; b3 -= k3 + 5;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t2; b3 -= k2 + 4;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t1; b3 -= k1 + 3;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t0; b3 -= k0 + 2;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t2; b3 -= k4 + 1;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t1; b3 -= k3;
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k2;
+	b1 -= k3 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k4 + t0;
+	b3 -= k0 + 17;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k1;
+	b1 -= k2 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k3 + t2;
+	b3 -= k4 + 16;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k0;
+	b1 -= k1 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k2 + t1;
+	b3 -= k3 + 15;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k4;
+	b1 -= k0 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k1 + t0;
+	b3 -= k2 + 14;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k3;
+	b1 -= k4 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k0 + t2;
+	b3 -= k1 + 13;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k2;
+	b1 -= k3 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k4 + t1;
+	b3 -= k0 + 12;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k1;
+	b1 -= k2 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k3 + t0;
+	b3 -= k4 + 11;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k0;
+	b1 -= k1 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k2 + t2;
+	b3 -= k3 + 10;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k4;
+	b1 -= k0 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k1 + t1;
+	b3 -= k2 + 9;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k3;
+	b1 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k0 + t0;
+	b3 -= k1 + 8;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k2;
+	b1 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k4 + t2;
+	b3 -= k0 + 7;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k1;
+	b1 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k3 + t1;
+	b3 -= k4 + 6;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k0;
+	b1 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k2 + t0;
+	b3 -= k3 + 5;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k4;
+	b1 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k1 + t2;
+	b3 -= k2 + 4;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k3;
+	b1 -= k4 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k0 + t1;
+	b3 -= k1 + 3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k2;
+	b1 -= k3 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k4 + t0;
+	b3 -= k0 + 2;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k1;
+	b1 -= k2 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k3 + t2;
+	b3 -= k4 + 1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k0;
+	b1 -= k1 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k2 + t1;
+	b3 -= k3;
 
 	output[0] = b0;
 	output[1] = b1;
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index dc96ba279720..f428fd6e1719 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -16,294 +16,941 @@ void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
 	  t2 = keyCtx->tweak[2];
 
-	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k7; b6 += b7 + k6 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k8 + 1; b6 += b7 + k7 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k0 + 2; b6 += b7 + k8 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k1 + 3; b6 += b7 + k0 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k2 + 4; b6 += b7 + k1 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k3 + 5; b6 += b7 + k2 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k0; b2 += b3 + k8; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k4 + 6; b6 += b7 + k3 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k5 + 7; b6 += b7 + k4 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k0; b0 += b1 + k8; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k6 + 8; b6 += b7 + k5 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k7 + 9; b6 += b7 + k6 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k8 + 10; b6 += b7 + k7 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k0 + 11; b6 += b7 + k8 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k1 + 12; b6 += b7 + k0 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k2 + 13; b6 += b7 + k1 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k3 + 14; b6 += b7 + k2 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k0; b2 += b3 + k8; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k4 + 15; b6 += b7 + k3 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k5 + 16; b6 += b7 + k4 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k0; b0 += b1 + k8; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k6 + 17; b6 += b7 + k5 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k5 + t0;
+	b4 += b5 + k4;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k6 + t1;
+	b4 += b5 + k5;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k8 + 1;
+	b6 += b7 + k7 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k7 + t2;
+	b4 += b5 + k6;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k0 + 2;
+	b6 += b7 + k8 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k8 + t0;
+	b4 += b5 + k7;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k1 + 3;
+	b6 += b7 + k0 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k0 + t1;
+	b4 += b5 + k8;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k2 + 4;
+	b6 += b7 + k1 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k1 + t2;
+	b4 += b5 + k0;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k3 + 5;
+	b6 += b7 + k2 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k8;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k2 + t0;
+	b4 += b5 + k1;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k4 + 6;
+	b6 += b7 + k3 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k3 + t1;
+	b4 += b5 + k2;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k5 + 7;
+	b6 += b7 + k4 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k0;
+	b0 += b1 + k8;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k4 + t2;
+	b4 += b5 + k3;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k6 + 8;
+	b6 += b7 + k5 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k5 + t0;
+	b4 += b5 + k4;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k7 + 9;
+	b6 += b7 + k6 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k6 + t1;
+	b4 += b5 + k5;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k8 + 10;
+	b6 += b7 + k7 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k7 + t2;
+	b4 += b5 + k6;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k0 + 11;
+	b6 += b7 + k8 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k8 + t0;
+	b4 += b5 + k7;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k1 + 12;
+	b6 += b7 + k0 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k0 + t1;
+	b4 += b5 + k8;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k2 + 13;
+	b6 += b7 + k1 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k1 + t2;
+	b4 += b5 + k0;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k3 + 14;
+	b6 += b7 + k2 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k8;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k2 + t0;
+	b4 += b5 + k1;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k4 + 15;
+	b6 += b7 + k3 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k3 + t1;
+	b4 += b5 + k2;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k5 + 16;
+	b6 += b7 + k4 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k0;
+	b0 += b1 + k8;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k4 + t2;
+	b4 += b5 + k3;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k6 + 17;
+	b6 += b7 + k5 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
 
 	output[0] = b0 + k0;
 	output[1] = b1 + k1;
@@ -315,318 +962,1254 @@ void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	output[7] = b7 + k7 + 18;
 }
 
-void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	  b2 = input[2], b3 = input[3],
-	  b4 = input[4], b5 = input[5],
-	  b6 = input[6], b7 = input[7];
-	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-	  k8 = keyCtx->key[8];
-	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-	  t2 = keyCtx->tweak[2];
+void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
+
+	u64 tmp;
+
+	b0 -= k0;
+	b1 -= k1;
+	b2 -= k2;
+	b3 -= k3;
+	b4 -= k4;
+	b5 -= k5 + t0;
+	b6 -= k6 + t1;
+	b7 -= k7 + 18;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k5 + t0;
+	b7 -= k6 + 17;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k3;
+	b5 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k8;
+	b1 -= k0;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k4 + t2;
+	b7 -= k5 + 16;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k2;
+	b5 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k3 + t1;
+	b7 -= k4 + 15;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k1;
+	b5 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k8;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
 
-	u64 tmp;
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
 
-	b0 -= k0;
-	b1 -= k1;
-	b2 -= k2;
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k2 + t0;
+	b7 -= k3 + 14;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k0;
+	b5 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k1 + t2;
+	b7 -= k2 + 13;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k8;
+	b5 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k0 + t1;
+	b7 -= k1 + 12;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k7;
+	b5 -= k8 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k8 + t0;
+	b7 -= k0 + 11;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k6;
+	b5 -= k7 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k7 + t2;
+	b7 -= k8 + 10;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k5;
+	b5 -= k6 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k6 + t1;
+	b7 -= k7 + 9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k4;
+	b5 -= k5 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k2;
 	b3 -= k3;
-	b4 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k0;
+	b1 -= k1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k5 + t0;
+	b7 -= k6 + 8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k3;
+	b5 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k8;
+	b1 -= k0;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k4 + t2;
+	b7 -= k5 + 7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k2;
+	b5 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k3 + t1;
+	b7 -= k4 + 6;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k1;
+	b5 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k8;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k2 + t0;
+	b7 -= k3 + 5;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k0;
+	b5 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k1 + t2;
+	b7 -= k2 + 4;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k8;
+	b5 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k0 + t1;
+	b7 -= k1 + 3;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k7;
+	b5 -= k8 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k8 + t0;
+	b7 -= k0 + 2;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k6;
+	b5 -= k7 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k7 + t2;
+	b7 -= k8 + 1;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k5;
+	b5 -= k6 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k6 + t1;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k4;
 	b5 -= k5 + t0;
-	b6 -= k6 + t1;
-	b7 -= k7 + 18;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k5 + t0; b7 -= k6 + 17;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k3; b5 -= k4 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k1; b3 -= k2;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k8; b1 -= k0;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k4 + t2; b7 -= k5 + 16;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k2; b5 -= k3 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k0; b3 -= k1;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k7; b1 -= k8;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k3 + t1; b7 -= k4 + 15;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k1; b5 -= k2 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k8; b3 -= k0;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k6; b1 -= k7;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k2 + t0; b7 -= k3 + 14;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k0; b5 -= k1 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k7; b3 -= k8;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k5; b1 -= k6;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k1 + t2; b7 -= k2 + 13;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k8; b5 -= k0 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k6; b3 -= k7;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k4; b1 -= k5;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k0 + t1; b7 -= k1 + 12;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k7; b5 -= k8 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k5; b3 -= k6;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k3; b1 -= k4;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k8 + t0; b7 -= k0 + 11;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k6; b5 -= k7 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k4; b3 -= k5;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k2; b1 -= k3;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k7 + t2; b7 -= k8 + 10;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k5; b5 -= k6 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k3; b3 -= k4;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k1; b1 -= k2;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k6 + t1; b7 -= k7 + 9;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k4; b5 -= k5 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k2; b3 -= k3;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k0; b1 -= k1;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k5 + t0; b7 -= k6 + 8;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k3; b5 -= k4 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k1; b3 -= k2;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k8; b1 -= k0;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k4 + t2; b7 -= k5 + 7;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k2; b5 -= k3 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k0; b3 -= k1;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k7; b1 -= k8;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k3 + t1; b7 -= k4 + 6;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k1; b5 -= k2 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k8; b3 -= k0;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k6; b1 -= k7;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k2 + t0; b7 -= k3 + 5;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k0; b5 -= k1 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k7; b3 -= k8;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k5; b1 -= k6;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k1 + t2; b7 -= k2 + 4;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k8; b5 -= k0 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k6; b3 -= k7;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k4; b1 -= k5;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k0 + t1; b7 -= k1 + 3;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k7; b5 -= k8 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k5; b3 -= k6;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k3; b1 -= k4;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k8 + t0; b7 -= k0 + 2;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k6; b5 -= k7 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k4; b3 -= k5;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k2; b1 -= k3;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k7 + t2; b7 -= k8 + 1;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k5; b5 -= k6 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k3; b3 -= k4;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k1; b1 -= k2;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k6 + t1; b7 -= k7;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k4; b5 -= k5 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k2; b3 -= k3;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k0; b1 -= k1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k0;
+	b1 -= k1;
 
 	output[0] = b0;
 	output[1] = b1;
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index e8ce06a9122f..1e70f66b7032 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -3,8 +3,9 @@
 #include <linux/string.h>
 #include <threefishApi.h>
 
-void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize,
-		     u64 *keyData, u64 *tweak)
+void threefishSetKey(struct threefish_key *keyCtx,
+			enum threefish_size stateSize,
+			u64 *keyData, u64 *tweak)
 {
 	int keyWords = stateSize / 64;
 	int i;
@@ -28,9 +29,9 @@ void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
 	u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
 	u64 cipher[SKEIN_MAX_STATE_WORDS];
 
-	Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
+	Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);
 	threefishEncryptBlockWords(keyCtx, plain, cipher);
-	Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
+	Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);
 }
 
 void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in,
@@ -55,9 +56,9 @@ void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
 	u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
 	u64 cipher[SKEIN_MAX_STATE_WORDS];
 
-	Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
+	Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);
 	threefishDecryptBlockWords(keyCtx, cipher, plain);
-	Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
+	Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);
 }
 
 void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in,
-- 
1.9.0

* [RFC PATCH 16/22] staging: crypto: skein: fix do/while brace formatting
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (14 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 15/22] staging: crypto: skein: cleanup >80 character lines Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 17/22] staging: crypto: skein: fix brace placement errors Jason Cooper
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/skein_block.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 780b4936f783..6e0f4a21aae3 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -215,8 +215,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-	}
-	while (--blkCnt);
+	} while (--blkCnt);
 	ctx->h.T[0] = ts[0];
 	ctx->h.T[1] = ts[1];
 }
@@ -434,8 +433,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-	}
-	while (--blkCnt);
+	} while (--blkCnt);
 	ctx->h.T[0] = ts[0];
 	ctx->h.T[1] = ts[1];
 }
@@ -717,8 +715,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
 		blkPtr += SKEIN1024_BLOCK_BYTES;
-	}
-	while (--blkCnt);
+	} while (--blkCnt);
 	ctx->h.T[0] = ts[0];
 	ctx->h.T[1] = ts[1];
 }
-- 
1.9.0

* [RFC PATCH 17/22] staging: crypto: skein: fix brace placement errors
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (15 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 16/22] staging: crypto: skein: fix do/while brace formatting Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 18/22] staging: crypto: skein: wrap multi-line macros in do-while loops Jason Cooper
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h    |  30 ++++-----
 drivers/staging/skein/include/skein_iv.h |  65 ++++++++----------
 drivers/staging/skein/skein.c            | 111 ++++++++++---------------------
 3 files changed, 74 insertions(+), 132 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index f92dc40711d1..0a2abcecd2f7 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -37,12 +37,11 @@
 #define Skein_Get64_LSB_First(dst64, src08, wCnt) memcpy(dst64, src08, 8*(wCnt))
 #define Skein_Swap64(w64)  (w64)
 
-enum
-	{
+enum {
 	SKEIN_SUCCESS         =      0, /* return codes from Skein calls */
 	SKEIN_FAIL            =      1,
 	SKEIN_BAD_HASHLEN     =      2
-	};
+};
 
 #define  SKEIN_MODIFIER_WORDS   (2) /* number of modifier (tweak) words */
 
@@ -63,33 +62,29 @@ enum
 #define  SKEIN_512_BLOCK_BYTES  (8*SKEIN_512_STATE_WORDS)
 #define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
 
-struct skein_ctx_hdr
-	{
+struct skein_ctx_hdr {
 	size_t  hashBitLen;		/* size of hash result, in bits */
 	size_t  bCnt;			/* current byte count in buffer b[] */
 	u64  T[SKEIN_MODIFIER_WORDS];	/* tweak: T[0]=byte cnt, T[1]=flags */
-	};
+};
 
-struct skein_256_ctx /* 256-bit Skein hash context structure */
-	{
+struct skein_256_ctx { /* 256-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
 	u64  X[SKEIN_256_STATE_WORDS];	/* chaining variables */
 	u8  b[SKEIN_256_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
-	};
+};
 
-struct skein_512_ctx /* 512-bit Skein hash context structure */
-	{
+struct skein_512_ctx { /* 512-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
 	u64  X[SKEIN_512_STATE_WORDS];	/* chaining variables */
 	u8  b[SKEIN_512_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
-	};
+};
 
-struct skein1024_ctx /* 1024-bit Skein hash context structure */
-	{
+struct skein1024_ctx { /* 1024-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
 	u64  X[SKEIN1024_STATE_WORDS];	/* chaining variables */
 	u8  b[SKEIN1024_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
-	};
+};
 
 /*   Skein APIs for (incremental) "straight hashing" */
 int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
@@ -296,8 +291,7 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 /*****************************************************************
 ** Skein block function constants (shared across Ref and Opt code)
 ******************************************************************/
-enum
-	{
+enum {
 	    /* Skein_256 round rotation constants */
 	R_256_0_0 = 14, R_256_0_1 = 16,
 	R_256_1_0 = 52, R_256_1_1 = 57,
@@ -335,7 +329,7 @@ enum
 	R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
 	R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52,
 	R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
-	};
+};
 
 #ifndef SKEIN_ROUNDS
 #define SKEIN_256_ROUNDS_TOTAL (72)	/* # rounds for diff block sizes */
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index bbbba77c44d3..8dd5e4d88a1d 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -20,44 +20,39 @@
 #define MK_64 SKEIN_MK_64
 
 /* blkSize =  256 bits. hashSize =  128 bits */
-const u64 SKEIN_256_IV_128[] =
-	{
+const u64 SKEIN_256_IV_128[] = {
 	MK_64(0xE1111906, 0x964D7260),
 	MK_64(0x883DAAA7, 0x7C8D811C),
 	MK_64(0x10080DF4, 0x91960F7A),
 	MK_64(0xCCF7DDE5, 0xB45BC1C2)
-	};
+};
 
 /* blkSize =  256 bits. hashSize =  160 bits */
-const u64 SKEIN_256_IV_160[] =
-	{
+const u64 SKEIN_256_IV_160[] = {
 	MK_64(0x14202314, 0x72825E98),
 	MK_64(0x2AC4E9A2, 0x5A77E590),
 	MK_64(0xD47A5856, 0x8838D63E),
 	MK_64(0x2DD2E496, 0x8586AB7D)
-	};
+};
 
 /* blkSize =  256 bits. hashSize =  224 bits */
-const u64 SKEIN_256_IV_224[] =
-	{
+const u64 SKEIN_256_IV_224[] = {
 	MK_64(0xC6098A8C, 0x9AE5EA0B),
 	MK_64(0x876D5686, 0x08C5191C),
 	MK_64(0x99CB88D7, 0xD7F53884),
 	MK_64(0x384BDDB1, 0xAEDDB5DE)
-	};
+};
 
 /* blkSize =  256 bits. hashSize =  256 bits */
-const u64 SKEIN_256_IV_256[] =
-	{
+const u64 SKEIN_256_IV_256[] = {
 	MK_64(0xFC9DA860, 0xD048B449),
 	MK_64(0x2FCA6647, 0x9FA7D833),
 	MK_64(0xB33BC389, 0x6656840F),
 	MK_64(0x6A54E920, 0xFDE8DA69)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  128 bits */
-const u64 SKEIN_512_IV_128[] =
-	{
+const u64 SKEIN_512_IV_128[] = {
 	MK_64(0xA8BC7BF3, 0x6FBF9F52),
 	MK_64(0x1E9872CE, 0xBD1AF0AA),
 	MK_64(0x309B1790, 0xB32190D3),
@@ -66,11 +61,10 @@ const u64 SKEIN_512_IV_128[] =
 	MK_64(0x1A18EBEA, 0xD46A32E3),
 	MK_64(0xA2CC5B18, 0xCE84AA82),
 	MK_64(0x6982AB28, 0x9D46982D)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  160 bits */
-const u64 SKEIN_512_IV_160[] =
-	{
+const u64 SKEIN_512_IV_160[] = {
 	MK_64(0x28B81A2A, 0xE013BD91),
 	MK_64(0xC2F11668, 0xB5BDF78F),
 	MK_64(0x1760D8F3, 0xF6A56F12),
@@ -79,11 +73,10 @@ const u64 SKEIN_512_IV_160[] =
 	MK_64(0xD908922E, 0x63ED70B8),
 	MK_64(0xB8EC76FF, 0xECCB52FA),
 	MK_64(0x01A47BB8, 0xA3F27A6E)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  224 bits */
-const u64 SKEIN_512_IV_224[] =
-	{
+const u64 SKEIN_512_IV_224[] = {
 	MK_64(0xCCD06162, 0x48677224),
 	MK_64(0xCBA65CF3, 0xA92339EF),
 	MK_64(0x8CCD69D6, 0x52FF4B64),
@@ -92,11 +85,10 @@ const u64 SKEIN_512_IV_224[] =
 	MK_64(0x6776FE65, 0x75D4EB3D),
 	MK_64(0x99FBC70E, 0x997413E9),
 	MK_64(0x9E2CFCCF, 0xE1C41EF7)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  256 bits */
-const u64 SKEIN_512_IV_256[] =
-	{
+const u64 SKEIN_512_IV_256[] = {
 	MK_64(0xCCD044A1, 0x2FDB3E13),
 	MK_64(0xE8359030, 0x1A79A9EB),
 	MK_64(0x55AEA061, 0x4F816E6F),
@@ -105,11 +97,10 @@ const u64 SKEIN_512_IV_256[] =
 	MK_64(0xE7A436CD, 0xC4746251),
 	MK_64(0xC36FBAF9, 0x393AD185),
 	MK_64(0x3EEDBA18, 0x33EDFC13)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  384 bits */
-const u64 SKEIN_512_IV_384[] =
-	{
+const u64 SKEIN_512_IV_384[] = {
 	MK_64(0xA3F6C6BF, 0x3A75EF5F),
 	MK_64(0xB0FEF9CC, 0xFD84FAA4),
 	MK_64(0x9D77DD66, 0x3D770CFE),
@@ -118,11 +109,10 @@ const u64 SKEIN_512_IV_384[] =
 	MK_64(0x7ED7D434, 0xE5807407),
 	MK_64(0x548FC1AC, 0xD4EC44D6),
 	MK_64(0x266E1754, 0x6AA18FF8)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  512 bits */
-const u64 SKEIN_512_IV_512[] =
-	{
+const u64 SKEIN_512_IV_512[] = {
 	MK_64(0x4903ADFF, 0x749C51CE),
 	MK_64(0x0D95DE39, 0x9746DF03),
 	MK_64(0x8FD19341, 0x27C79BCE),
@@ -131,11 +121,10 @@ const u64 SKEIN_512_IV_512[] =
 	MK_64(0xEABE394C, 0xA9D5C3F4),
 	MK_64(0x991112C7, 0x1A75B523),
 	MK_64(0xAE18A40B, 0x660FCC33)
-	};
+};
 
 /* blkSize = 1024 bits. hashSize =  384 bits */
-const u64 SKEIN1024_IV_384[] =
-	{
+const u64 SKEIN1024_IV_384[] = {
 	MK_64(0x5102B6B8, 0xC1894A35),
 	MK_64(0xFEEBC9E3, 0xFE8AF11A),
 	MK_64(0x0C807F06, 0xE32BED71),
@@ -152,11 +141,10 @@ const u64 SKEIN1024_IV_384[] =
 	MK_64(0x3B5A6530, 0x0DBC6516),
 	MK_64(0x484B9CD2, 0x167BBCE1),
 	MK_64(0x2D136947, 0xD4CBAFEA)
-	};
+};
 
 /* blkSize = 1024 bits. hashSize =  512 bits */
-const u64 SKEIN1024_IV_512[] =
-	{
+const u64 SKEIN1024_IV_512[] = {
 	MK_64(0xCAEC0E5D, 0x7C1B1B18),
 	MK_64(0xA01B0E04, 0x5F03E802),
 	MK_64(0x33840451, 0xED912885),
@@ -173,11 +161,10 @@ const u64 SKEIN1024_IV_512[] =
 	MK_64(0x67070872, 0x5B749816),
 	MK_64(0xB9CD28FB, 0xF0581BD1),
 	MK_64(0x0E2940B8, 0x15804974)
-	};
+};
 
 /* blkSize = 1024 bits. hashSize = 1024 bits */
-const u64 SKEIN1024_IV_1024[] =
-	{
+const u64 SKEIN1024_IV_1024[] = {
 	MK_64(0xD593DA07, 0x41E72355),
 	MK_64(0x15B5E511, 0xAC73E00C),
 	MK_64(0x5180E5AE, 0xBAF2C4F0),
@@ -194,6 +181,6 @@ const u64 SKEIN1024_IV_1024[] =
 	MK_64(0x6572DD22, 0xF2B4969A),
 	MK_64(0x61FD3062, 0xD00A579A),
 	MK_64(0x1DE0536E, 0x8682E539)
-	};
+};
 
 #endif /* _SKEIN_IV_H_ */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index ed603ee7b170..0d8c70c02c6f 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -31,8 +31,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 /* init the context for a straight hashing operation  */
 int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN_256_STATE_BYTES];
 		u64  w[SKEIN_256_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -40,8 +39,7 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
 	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
-	switch (hashBitLen)
-	{             /* use pre-computed values, where available */
+	switch (hashBitLen) { /* use pre-computed values, where available */
 	case  256:
 		memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
 		break;
@@ -91,8 +89,7 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen,
 			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN_256_STATE_BYTES];
 		u64  w[SKEIN_256_STATE_WORDS];
 	} cfg; /* config block */
@@ -101,13 +98,10 @@ int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen,
 	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
 	/* compute the initial chaining values ctx->X[], based on key */
-	if (keyBytes == 0) /* is there a key? */
-	{
+	if (keyBytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
 		memset(ctx->X, 0, sizeof(ctx->X));
-	}
-	else /* here to pre-process a key */
-	{
+	} else { /* here to pre-process a key */
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
@@ -162,15 +156,12 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
 	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
-	if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
-	{
+	if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES) {
 		/* finish up any buffered message data */
-		if (ctx->h.bCnt)
-		{
+		if (ctx->h.bCnt) {
 			/* # bytes free in buffer b[] */
 			n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;
-			if (n)
-			{
+			if (n) {
 				/* check on our logic here */
 				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
@@ -187,8 +178,7 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
 		 * now process any remaining full blocks, directly from input
 		 * message data
 		 */
-		if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
-		{
+		if (msgByteCnt > SKEIN_256_BLOCK_BYTES) {
 			/* number of full blocks to process */
 			n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;
 			Skein_256_Process_Block(ctx, msg, n,
@@ -200,8 +190,7 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
 	}
 
 	/* copy any remaining source message data bytes into b[] */
-	if (msgByteCnt)
-	{
+	if (msgByteCnt) {
 		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
 		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
 		ctx->h.bCnt += msgByteCnt;
@@ -238,8 +227,7 @@ int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -268,8 +256,7 @@ int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 /* init the context for a straight hashing operation  */
 int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN_512_STATE_BYTES];
 		u64  w[SKEIN_512_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -277,8 +264,7 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
 	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
-	switch (hashBitLen)
-	{             /* use pre-computed values, where available */
+	switch (hashBitLen) { /* use pre-computed values, where available */
 	case  512:
 		memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
 		break;
@@ -332,8 +318,7 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen,
 			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN_512_STATE_BYTES];
 		u64  w[SKEIN_512_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -342,13 +327,10 @@ int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen,
 	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
 	/* compute the initial chaining values ctx->X[], based on key */
-	if (keyBytes == 0)                          /* is there a key? */
-	{
+	if (keyBytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
 		memset(ctx->X, 0, sizeof(ctx->X));
-	}
-	else /* here to pre-process a key */
-	{
+	} else { /* here to pre-process a key */
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
@@ -402,15 +384,12 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
 	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
-	if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
-	{
+	if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES) {
 		/* finish up any buffered message data */
-		if (ctx->h.bCnt)
-		{
+		if (ctx->h.bCnt) {
 			/* # bytes free in buffer b[] */
 			n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;
-			if (n)
-			{
+			if (n) {
 				/* check on our logic here */
 				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
@@ -427,8 +406,7 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
 		 * now process any remaining full blocks, directly from input
 		 * message data
 		 */
-		if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
-		{
+		if (msgByteCnt > SKEIN_512_BLOCK_BYTES) {
 			/* number of full blocks to process */
 			n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;
 			Skein_512_Process_Block(ctx, msg, n,
@@ -440,8 +418,7 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
 	}
 
 	/* copy any remaining source message data bytes into b[] */
-	if (msgByteCnt)
-	{
+	if (msgByteCnt) {
 		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
 		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
 		ctx->h.bCnt += msgByteCnt;
@@ -478,8 +455,7 @@ int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -508,8 +484,7 @@ int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 /* init the context for a straight hashing operation  */
 int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN1024_STATE_BYTES];
 		u64  w[SKEIN1024_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -517,8 +492,7 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
 	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
-	switch (hashBitLen)
-	{              /* use pre-computed values, where available */
+	switch (hashBitLen) { /* use pre-computed values, where available */
 	case  512:
 		memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
 		break;
@@ -566,8 +540,7 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen,
 			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN1024_STATE_BYTES];
 		u64  w[SKEIN1024_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -576,13 +549,10 @@ int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen,
 	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
 	/* compute the initial chaining values ctx->X[], based on key */
-	if (keyBytes == 0)                          /* is there a key? */
-	{
+	if (keyBytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
 		memset(ctx->X, 0, sizeof(ctx->X));
-	}
-	else /* here to pre-process a key */
-	{
+	} else { /* here to pre-process a key */
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
@@ -637,15 +607,12 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
 	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
-	if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
-	{
+	if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES) {
 		/* finish up any buffered message data */
-		if (ctx->h.bCnt)
-		{
+		if (ctx->h.bCnt) {
 			/* # bytes free in buffer b[] */
 			n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;
-			if (n)
-			{
+			if (n) {
 				/* check on our logic here */
 				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
@@ -662,8 +629,7 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
 		 * now process any remaining full blocks, directly from input
 		 * message data
 		 */
-		if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
-		{
+		if (msgByteCnt > SKEIN1024_BLOCK_BYTES) {
 			/* number of full blocks to process */
 			n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;
 			Skein1024_Process_Block(ctx, msg, n,
@@ -675,8 +641,7 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
 	}
 
 	/* copy any remaining source message data bytes into b[] */
-	if (msgByteCnt)
-	{
+	if (msgByteCnt) {
 		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
 		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
 		ctx->h.bCnt += msgByteCnt;
@@ -713,8 +678,7 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -823,8 +787,7 @@ int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -863,8 +826,7 @@ int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -903,8 +865,7 @@ int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-- 
1.9.0

* [RFC PATCH 18/22] staging: crypto: skein: wrap multi-line macros in do-while loops
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (16 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 17/22] staging: crypto: skein: fix brace placement errors Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 19/22] staging: crypto: skein: remove externs from .c files Jason Cooper
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/skein_block.c | 66 ++++++++++++++++++++++++++++---------
 1 file changed, 51 insertions(+), 15 deletions(-)
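
Note (illustration only, not part of the patch): the reason for wrapping a
multi-statement macro in do { } while (0) can be sketched as below. The names
ROTL64, MIX_BARE, MIX_SAFE and the two helper functions are hypothetical and
do not appear in skein_block.c; the mix step only loosely mirrors Round256.

```c
#include <stdint.h>

/* Hypothetical helper, assumed for this sketch: rotate left, 0 < n < 64. */
#define ROTL64(x, n) (((x) << (n)) | ((x) >> (64 - (n))))

/* Bare form: expands to two statements. */
#define MIX_BARE(a, b, rot) \
	(a) += (b); (b) = ROTL64((b), (rot)) ^ (a)

/* Wrapped form: expands to exactly one statement. */
#define MIX_SAFE(a, b, rot) \
do { \
	(a) += (b); \
	(b) = ROTL64((b), (rot)) ^ (a); \
} while (0)

/* Only the first statement of MIX_BARE is guarded by the if. */
static void mix_bare_guarded(int enable, uint64_t *a, uint64_t *b)
{
	if (enable)
		MIX_BARE(*a, *b, 14);
}

/* The whole body of MIX_SAFE is guarded by the if. */
static void mix_safe_guarded(int enable, uint64_t *a, uint64_t *b)
{
	if (enable)
		MIX_SAFE(*a, *b, 14);
}
```

With enable == 0, mix_safe_guarded() leaves both words untouched, while
mix_bare_guarded() still overwrites *b. One consequence of the wrapped form is
that every invocation needs its own trailing semicolon. R256 in the hunk below
still invokes Round256(...) without one, which presumably only compiles while
Skein_Show_R_Ptr() expands to nothing.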

diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 6e0f4a21aae3..707a21ae53c6 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -107,27 +107,36 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 		/* run the rounds */
 
 #define Round256(p0, p1, p2, p3, ROT, rNum)                              \
+do { \
 	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+} while (0)
 
 #if SKEIN_UNROLL_256 == 0
 #define R256(p0, p1, p2, p3, ROT, rNum) /* fully unrolled */ \
+do { \
 	Round256(p0, p1, p2, p3, ROT, rNum) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr); \
+} while (0)
 
 #define I256(R) \
+do { \
 	/* inject the key schedule value */ \
 	X0   += ks[((R)+1) % 5]; \
 	X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3]; \
 	X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3]; \
 	X3   += ks[((R)+4) % 5] +     (R)+1;       \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 #else /* looping version */
 #define R256(p0, p1, p2, p3, ROT, rNum) \
+do { \
 	Round256(p0, p1, p2, p3, ROT, rNum) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr); \
+} while (0)
 
 #define I256(R) \
+do { \
 	/* inject the key schedule value */ \
 	X0   += ks[r+(R)+0]; \
 	X1   += ks[r+(R)+1] + ts[r+(R)+0]; \
@@ -136,12 +145,14 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 	/* rotate key schedule */ \
 	ks[r + (R) + 4]   = ks[r + (R) - 1]; \
 	ts[r + (R) + 2]   = ts[r + (R) - 1]; \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 
 	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)
 #endif
 		{
 #define R256_8_rounds(R)                  \
+do { \
 		R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
 		R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
 		R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
@@ -151,7 +162,8 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 		R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
 		R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
 		R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
-		I256(2 * (R) + 1);
+		I256(2 * (R) + 1); \
+} while (0)
 
 		R256_8_rounds(0);
 
@@ -311,17 +323,22 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 				 Xptr);
 		/* run the rounds */
 #define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+do { \
 	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 	X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
 	X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
+} while (0)
 
 #if SKEIN_UNROLL_512 == 0
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) /* unrolled */ \
+do { \
 	Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr); \
+} while (0)
 
 #define I512(R) \
+do { \
 	/* inject the key schedule value */ \
 	X0   += ks[((R) + 1) % 9]; \
 	X1   += ks[((R) + 2) % 9]; \
@@ -331,13 +348,17 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 	X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3]; \
 	X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3]; \
 	X7   += ks[((R) + 8) % 9] +     (R) + 1;       \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 #else /* looping version */
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+do { \
 	Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr); \
+} while (0)
 
 #define I512(R) \
+do { \
 	/* inject the key schedule value */ \
 	X0   += ks[r + (R) + 0]; \
 	X1   += ks[r + (R) + 1]; \
@@ -350,12 +371,14 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 	/* rotate key schedule */ \
 	ks[r +         (R) + 8] = ks[r + (R) - 1]; \
 	ts[r +         (R) + 2] = ts[r + (R) - 1]; \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 
 		for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)
 #endif /* end of looped code definitions */
 		{
 #define R512_8_rounds(R)  /* do 8 full rounds */  \
+do { \
 		R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
 		R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
 		R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
@@ -365,7 +388,8 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 		R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
 		R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
 		R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
-		I512(2 * (R) + 1);        /* and key injection */
+		I512(2 * (R) + 1);        /* and key injection */ \
+} while (0)
 
 			R512_8_rounds(0);
 
@@ -551,6 +575,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 
 #define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
 			pF, ROT, rNum) \
+do { \
 	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
 	X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
@@ -559,15 +584,19 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 	X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
 	X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
 	X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
+} while (0)
 
 #if SKEIN_UNROLL_1024 == 0
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, \
 		ROT, rn) \
+do { \
 	Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
 			pF, ROT, rn) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr); \
+} while (0)
 
 #define I1024(R) \
+do { \
 	/* inject the key schedule value */ \
 	X00   += ks[((R) +  1) % 17]; \
 	X01   += ks[((R) +  2) % 17]; \
@@ -585,15 +614,19 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 	X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3]; \
 	X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3]; \
 	X15   += ks[((R) + 16) % 17] +     (R) + 1;       \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 #else /* looping version */
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, \
 		ROT, rn) \
+do { \
 	Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
 			pF, ROT, rn) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr); \
+} while (0)
 
 #define I1024(R) \
+do { \
 	/* inject the key schedule value */ \
 	X00   += ks[r + (R) +  0]; \
 	X01   += ks[r + (R) +  1]; \
@@ -614,12 +647,14 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 	/* rotate key schedule */ \
 	ks[r  +         (R) + 16] = ks[r + (R) - 1]; \
 	ts[r  +         (R) +  2] = ts[r + (R) - 1]; \
-	Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 
 		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)
 #endif
 		{
 #define R1024_8_rounds(R) \
+do { \
 	R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, \
 		R1024_0, 8*(R) + 1); \
 	R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, \
@@ -637,7 +672,8 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 		R1024_6, 8*(R) + 7); \
 	R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, \
 		R1024_7, 8*(R) + 8); \
-	I1024(2*(R)+1);
+	I1024(2*(R)+1); \
+} while (0)
 
 			R1024_8_rounds(0);
 
-- 
1.9.0


* [RFC PATCH 19/22] staging: crypto: skein: remove externs from .c files
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (17 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 18/22] staging: crypto: skein: wrap multi-line macros in do-while loops Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 20/22] staging: crypto: skein: remove braces from single-statement block Jason Cooper
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein_block.h | 22 ++++++++++++++++++++++
 drivers/staging/skein/skein.c               | 10 +---------
 2 files changed, 23 insertions(+), 9 deletions(-)
 create mode 100644 drivers/staging/skein/include/skein_block.h

diff --git a/drivers/staging/skein/include/skein_block.h b/drivers/staging/skein/include/skein_block.h
new file mode 100644
index 000000000000..b15c079b5bd4
--- /dev/null
+++ b/drivers/staging/skein/include/skein_block.h
@@ -0,0 +1,22 @@
+/***********************************************************************
+**
+** Implementation of the Skein hash function.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+************************************************************************/
+#ifndef _SKEIN_BLOCK_H_
+#define _SKEIN_BLOCK_H_
+
+#include <skein.h> /* get the Skein API definitions   */
+
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+
+#endif
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 0d8c70c02c6f..096b86bf9430 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -13,15 +13,7 @@
 #include <linux/string.h>       /* get the memcpy/memset functions */
 #include <skein.h> /* get the Skein API definitions   */
 #include <skein_iv.h>    /* get precomputed IVs */
-
-/*****************************************************************/
-/* External function to process blkCnt (nonzero) full block(s) of data. */
-void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
-				size_t blkCnt, size_t byteCntAdd);
-void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
-				size_t blkCnt, size_t byteCntAdd);
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
-				size_t blkCnt, size_t byteCntAdd);
+#include <skein_block.h>
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
-- 
1.9.0


* [RFC PATCH 20/22] staging: crypto: skein: remove braces from single-statement block
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (18 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 19/22] staging: crypto: skein: remove externs from .c files Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 21/22] staging: crypto: skein: remove unnecessary line continuation Jason Cooper
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
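For context on the hunk below: the two skeinUpdate() calls differ only in how
the bit count is converted to a byte count.  A minimal sketch of that
conversion follows; the helper name bits_to_bytes() is hypothetical and does
not appear in the skein sources.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to hold msg_bit_cnt bits. */
size_t bits_to_bytes(size_t msg_bit_cnt)
{
	/* a multiple of 8 bits maps to whole bytes exactly */
	if ((msg_bit_cnt & 0x7) == 0)
		return msg_bit_cnt >> 3;

	/* otherwise one extra, partially filled byte is needed */
	return (msg_bit_cnt >> 3) + 1;
}
```

The early return is a single statement, which is why the braces around it can
go without changing behaviour.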
 drivers/staging/skein/skeinApi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index f0015d5b10f5..dd109bf6f7b9 100644
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -188,9 +188,9 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
 			msgBitCnt == 0, SKEIN_FAIL);
 
 	/* if number of bits is a multiple of bytes - that's easy */
-	if ((msgBitCnt & 0x7) == 0) {
+	if ((msgBitCnt & 0x7) == 0)
 		return skeinUpdate(ctx, msg, msgBitCnt >> 3);
-	}
+
 	skeinUpdate(ctx, msg, (msgBitCnt >> 3) + 1);
 
 	/*
-- 
1.9.0


* [RFC PATCH 21/22] staging: crypto: skein: remove unnecessary line continuation
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (19 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 20/22] staging: crypto: skein: remove braces from single-statement block Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-11 21:32 ` [RFC PATCH 22/22] staging: crypto: skein: add TODO file Jason Cooper
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/skein_block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 707a21ae53c6..fd96ca0ad0ed 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -477,7 +477,7 @@ unsigned int Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 				size_t blkCnt, size_t byteCntAdd)
 { /* do it in C, always looping (unrolled is bigger AND slower!) */
 	enum {
-- 
1.9.0


* [RFC PATCH 22/22] staging: crypto: skein: add TODO file
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (20 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 21/22] staging: crypto: skein: remove unnecessary line continuation Jason Cooper
@ 2014-03-11 21:32 ` Jason Cooper
  2014-03-12 16:55 ` [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-11 21:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/TODO | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 drivers/staging/skein/TODO

diff --git a/drivers/staging/skein/TODO b/drivers/staging/skein/TODO
new file mode 100644
index 000000000000..f5c167a305ae
--- /dev/null
+++ b/drivers/staging/skein/TODO
@@ -0,0 +1,11 @@
+skein/threefish TODO
+
+ - rename camelcase vars
+ - rename camelcase functions
+ - rename files
+ - move macros into appropriate header files
+ - add / pass test vectors
+ - module support
+
+Please send patches to Jason Cooper <jason@lakedaemon.net> in addition to the
+staging tree mailinglist.
-- 
1.9.0


* Re: [RFC PATCH 00/22] staging: add skein/threefish crypto algos
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (21 preceding siblings ...)
  2014-03-11 21:32 ` [RFC PATCH 22/22] staging: crypto: skein: add TODO file Jason Cooper
@ 2014-03-12 16:55 ` Jason Cooper
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
  23 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-12 16:55 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto

On Tue, Mar 11, 2014 at 09:32:32PM +0000, Jason Cooper wrote:
> To facilitate tinkering with this, One can pull from the following:
> 
>   git://git.infradead.org/users/jcooper/linux.git tags/staging-skein-3.14-rc1
> 
> This is based on v3.14-rc1, and is prone to rebasing based on comments.

Hmmm, apparently, I forgot to tell you *how* to tinker with this :)

Once you've pulled in the series,

$ make ARCH=x86_64 defconfig

Then enable CRYPTO_SKEIN (implies CRYPTO_THREEFISH) under drivers/staging.

$ git rebase -i v3.14-rc1

In the rebase config file:

#######################################
pick cc77327 scripts: objdiff: detect object code changes between two commits
pick bdb4dad staging: crypto: skein: import code from Skein3Fish.git
pick e3d822c staging: crypto: skein: allow building statically
x kbuild.sh none
x ./scripts/objdiff record drivers/staging/skein/*.o

pick 227c819 staging: crypto: skein: remove brg_*.h includes
x kbuild.sh none
x ./scripts/objdiff record drivers/staging/skein/*.o
x ./scripts/objdiff diff HEAD^ HEAD

pick c175303 staging: crypto: skein: remove skein_port.h
x kbuild.sh none
x ./scripts/objdiff record drivers/staging/skein/*.o
x ./scripts/objdiff diff HEAD^ HEAD

pick a612f45 staging: crypto: skein: remove __cplusplus and an unneeded stddef.h
x kbuild.sh none
x ./scripts/objdiff record drivers/staging/skein/*.o
x ./scripts/objdiff diff HEAD^ HEAD

... lather, rinse, repeat ...

pick 4438372 staging: crypto: skein: remove unnecessary line continuation
x kbuild.sh none
x ./scripts/objdiff record drivers/staging/skein/*.o
x ./scripts/objdiff diff HEAD^ HEAD

pick 641f05d staging: crypto: skein: add TODO file
x kbuild.sh none
x ./scripts/objdiff record drivers/staging/skein/*.o
x ./scripts/objdiff diff HEAD^ HEAD

...
#######################################

Save, quit, and let it run (kbuild.sh is my local script for building the
kernel).

If it makes it to the end, then the object code never changed.  Well, it
works here. ;-)

Once you're finished poking around (objdump output is in
/tmp/objdiff-<commit>/):

$ ./scripts/objdiff clean all

Also, it looks like linux-crypto dropped several of the patches for
being too large.  Here's a link to the top of the series (choose the
linuxdriverproject archive):

https://lkml.kernel.org/r/cover.1394570067.git.jason@lakedaemon.net

hth,

Jason.


* Re: [RFC PATCH 03/22] staging: crypto: skein: allow building statically
  2014-03-11 21:32 ` [RFC PATCH 03/22] staging: crypto: skein: allow building statically Jason Cooper
@ 2014-03-17 21:52   ` Greg KH
  2014-03-18 12:58     ` Jason Cooper
  0 siblings, 1 reply; 51+ messages in thread
From: Greg KH @ 2014-03-17 21:52 UTC (permalink / raw)
  To: Jason Cooper; +Cc: Herbert Xu, David S. Miller, devel, linux-crypto

On Tue, Mar 11, 2014 at 09:32:35PM +0000, Jason Cooper wrote:
> These are the minimum changes required to get the code to build
> statically in the kernel.  It's necessary to do this first so that we
> can empirically determine that future cleanup patches aren't changing
> the generated object code.
> 
> Signed-off-by: Jason Cooper <jason@lakedaemon.net>

This doesn't apply to my latest tree :(

> --- a/drivers/staging/Makefile
> +++ b/drivers/staging/Makefile
> @@ -65,3 +65,4 @@ obj-$(CONFIG_XILLYBUS)		+= xillybus/
>  obj-$(CONFIG_DGNC)			+= dgnc/
>  obj-$(CONFIG_DGAP)			+= dgap/
>  obj-$(CONFIG_MTD_SPINAND_MT29F)	+= mt29f_spinand/
> +obj-$(CONFIG_CRYPTO_SKEIN) += skein/

Care to align these up with the way this file is formatted?

And I have no objection to taking the drivers/staging/ patches, the
script looks useful, but I can't take it through the staging tree,
sorry.

thanks,

greg k-h


* Re: [RFC PATCH 03/22] staging: crypto: skein: allow building statically
  2014-03-17 21:52   ` Greg KH
@ 2014-03-18 12:58     ` Jason Cooper
  2014-03-18 14:28       ` Greg KH
  0 siblings, 1 reply; 51+ messages in thread
From: Jason Cooper @ 2014-03-18 12:58 UTC (permalink / raw)
  To: Greg KH; +Cc: Herbert Xu, David S. Miller, devel, linux-crypto

On Mon, Mar 17, 2014 at 02:52:52PM -0700, Greg KH wrote:
> On Tue, Mar 11, 2014 at 09:32:35PM +0000, Jason Cooper wrote:
> > These are the minimum changes required to get the code to build
> > statically in the kernel.  It's necessary to do this first so that we
> > can empirically determine that future cleanup patches aren't changing
> > the generated object code.
> > 
> > Signed-off-by: Jason Cooper <jason@lakedaemon.net>
> 
> This doesn't apply to my latest tree :(

Ah, ok.  I'll rebase this series on the staging tree.

> > --- a/drivers/staging/Makefile
> > +++ b/drivers/staging/Makefile
> > @@ -65,3 +65,4 @@ obj-$(CONFIG_XILLYBUS)		+= xillybus/
> >  obj-$(CONFIG_DGNC)			+= dgnc/
> >  obj-$(CONFIG_DGAP)			+= dgap/
> >  obj-$(CONFIG_MTD_SPINAND_MT29F)	+= mt29f_spinand/
> > +obj-$(CONFIG_CRYPTO_SKEIN) += skein/
> 
> Care to align these up with the way this file is formatted?

Of course, not sure what happened there (well, other than the obvious
:-P)

> And I have no objection to taking the drivers/staging/ patches, the
> script looks useful, but I can't take it through the staging tree,
> sorry.

Ok, I'll pull that out as a separate branch.  Do you mind taking a
series that depends on a topic branch from another tree?  We do it a lot
in arm-soc, but I'm not sure how popular that is elsewhere.

It's purely an audit/testing dependency, but it would be nice to have it
available in the history if someone wants to audit the changes.

I have one change I'd like to do to the objdiff script.  I'd like it to
assume 'HEAD^ HEAD' when the user executes './scripts/objdiff diff'.

I'll respin both and submit a v1.

Thanks for the review.

thx,

Jason.


* Re: [RFC PATCH 03/22] staging: crypto: skein: allow building statically
  2014-03-18 12:58     ` Jason Cooper
@ 2014-03-18 14:28       ` Greg KH
  2014-03-24  2:22         ` Jason Cooper
  0 siblings, 1 reply; 51+ messages in thread
From: Greg KH @ 2014-03-18 14:28 UTC (permalink / raw)
  To: Jason Cooper; +Cc: devel, David S. Miller, linux-crypto, Herbert Xu

On Tue, Mar 18, 2014 at 08:58:49AM -0400, Jason Cooper wrote:
> On Mon, Mar 17, 2014 at 02:52:52PM -0700, Greg KH wrote:
> > On Tue, Mar 11, 2014 at 09:32:35PM +0000, Jason Cooper wrote:
> > > These are the minimum changes required to get the code to build
> > > statically in the kernel.  It's necessary to do this first so that we
> > > can empirically determine that future cleanup patches aren't changing
> > > the generated object code.
> > > 
> > > Signed-off-by: Jason Cooper <jason@lakedaemon.net>
> > 
> > This doesn't apply to my latest tree :(
> 
> Ah, ok.  I'll rebase this series on the staging tree.
> 
> > > --- a/drivers/staging/Makefile
> > > +++ b/drivers/staging/Makefile
> > > @@ -65,3 +65,4 @@ obj-$(CONFIG_XILLYBUS)		+= xillybus/
> > >  obj-$(CONFIG_DGNC)			+= dgnc/
> > >  obj-$(CONFIG_DGAP)			+= dgap/
> > >  obj-$(CONFIG_MTD_SPINAND_MT29F)	+= mt29f_spinand/
> > > +obj-$(CONFIG_CRYPTO_SKEIN) += skein/
> > 
> > Care to align these up with the way this file is formatted?
> 
> Of course, not sure what happened there (well, other than the obvious
> :-P)
> 
> > And I have no objection to taking the drivers/staging/ patches, the
> > script looks useful, but I can't take it through the staging tree,
> > sorry.
> 
> Ok, I'll pull that out as a separate branch.  Do you mind taking a
> series that depends on a topic branch from another tree?  We do it a lot
> in arm-soc, but I'm not sure how popular that is elsewhere.

It's not a dependency at all, and I don't take git pull requests for the
staging tree, just email patches, sorry.

So just resend these patches, thanks.

greg k-h


* [PATCH V2 00/21] staging: add skein/threefish crypto algos
  2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
                   ` (22 preceding siblings ...)
  2014-03-12 16:55 ` [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
@ 2014-03-24  1:48 ` Jason Cooper
  2014-03-24  1:48   ` [PATCH V2 01/21] staging: crypto: skein: import code from Skein3Fish.git Jason Cooper
                     ` (20 more replies)
  23 siblings, 21 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:48 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Greg, all,

Changes from RFC:

 - dropped scripts/objdiff patch to be submitted separately

 - rebased onto staging-next and resolved conflicts

-- updated original text to reflect the above --

Attached is a series I've sat on for the past month and a half.

I'm hoping that by posting it in its incomplete state, it will either a)
motivate me to finish hooking into the crypto API, or b) motivate someone else
to pitch in. ;-)

From patch 2, all commits build successfully.  In addition, using the script
I've submitted separately (objdiff), I can confirm that no object code was
harmed in this process.

I'm under no time crunch with this, and I add to it as I find time.  If Greg
wants to take it for v3.15, great.  If not, that is fine as well.

Barring a few false positives, this series makes the code checkpatch-clean, but
it is not yet ready for mainline.  In particular, I really don't like the
ad hoc macro definitions, nor the camelCase.

The plan is to get skein and threefish registered into the crypto API, build as
modules, and then move it to crypto/.

thx,

Jason.


Jason Cooper (21):
  staging: crypto: skein: import code from Skein3Fish.git
  staging: crypto: skein: allow building statically
  staging: crypto: skein: remove brg_*.h includes
  staging: crypto: skein: remove skein_port.h
  staging: crypto: skein: remove __cplusplus and an unneeded stddef.h
  staging: crypto: skein: remove unneeded typedefs
  staging: crypto: skein: remove all typedef {struct,enum}
  staging: crypto: skein: use u8, u64 vice uint*_t
  staging: crypto: skein: fixup pointer whitespace
  staging: crypto: skein: cleanup whitespace around operators/punc.
  staging: crypto: skein: dos2unix, remove executable perms
  staging: crypto: skein: fix leading whitespace
  staging: crypto: skein: remove trailing whitespace
  staging: crypto: skein: cleanup >80 character lines
  staging: crypto: skein: fix do/while brace formatting
  staging: crypto: skein: fix brace placement errors
  staging: crypto: skein: wrap multi-line macros in do-while loops
  staging: crypto: skein: remove externs from .c files
  staging: crypto: skein: remove braces from single-statement block
  staging: crypto: skein: remove unnecessary line continuation
  staging: crypto: skein: add TODO file

 drivers/staging/Kconfig                      |    2 +
 drivers/staging/Makefile                     |    1 +
 drivers/staging/skein/Kconfig                |   32 +
 drivers/staging/skein/Makefile               |   13 +
 drivers/staging/skein/TODO                   |   11 +
 drivers/staging/skein/include/skein.h        |  344 ++
 drivers/staging/skein/include/skeinApi.h     |  230 ++
 drivers/staging/skein/include/skein_block.h  |   22 +
 drivers/staging/skein/include/skein_iv.h     |  186 +
 drivers/staging/skein/include/threefishApi.h |  164 +
 drivers/staging/skein/skein.c                |  880 +++++
 drivers/staging/skein/skeinApi.c             |  237 ++
 drivers/staging/skein/skeinBlockNo3F.c       |  175 +
 drivers/staging/skein/skein_block.c          |  770 ++++
 drivers/staging/skein/threefish1024Block.c   | 4900 ++++++++++++++++++++++++++
 drivers/staging/skein/threefish256Block.c    | 1137 ++++++
 drivers/staging/skein/threefish512Block.c    | 2223 ++++++++++++
 drivers/staging/skein/threefishApi.c         |   79 +
 18 files changed, 11406 insertions(+)
 create mode 100644 drivers/staging/skein/Kconfig
 create mode 100644 drivers/staging/skein/Makefile
 create mode 100644 drivers/staging/skein/TODO
 create mode 100644 drivers/staging/skein/include/skein.h
 create mode 100644 drivers/staging/skein/include/skeinApi.h
 create mode 100644 drivers/staging/skein/include/skein_block.h
 create mode 100644 drivers/staging/skein/include/skein_iv.h
 create mode 100644 drivers/staging/skein/include/threefishApi.h
 create mode 100644 drivers/staging/skein/skein.c
 create mode 100644 drivers/staging/skein/skeinApi.c
 create mode 100644 drivers/staging/skein/skeinBlockNo3F.c
 create mode 100644 drivers/staging/skein/skein_block.c
 create mode 100644 drivers/staging/skein/threefish1024Block.c
 create mode 100644 drivers/staging/skein/threefish256Block.c
 create mode 100644 drivers/staging/skein/threefish512Block.c
 create mode 100644 drivers/staging/skein/threefishApi.c

-- 
1.9.1


* [PATCH V2 01/21] staging: crypto: skein: import code from Skein3Fish.git
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
@ 2014-03-24  1:48   ` Jason Cooper
  2014-03-24  1:48   ` [PATCH V2 02/21] staging: crypto: skein: allow building statically Jason Cooper
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:48 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

This is a byte-for-byte copy of the skein implementation found at:

  https://github.com/wernerd/Skein3Fish.git

Specifically, from the master branch at commit:

  00e925444c2c Merge pull request #4 from csm/master

The next commit will do the minimum necessary to build this code
statically in the kernel.

I've generated the sha256 sums of the files by:

$ (cd drivers/staging/skein; find . -type f | sort | xargs sha256sum)
bcd73168e5805b1b157dbf08863e6a8c217a7b270b6be1a361540591b00624e3  ./CMakeLists.txt
e1adb97dd9e87bc7c05892ed7863a66d1d9fde6728a97a8b7b092709da664d29  ./include/brg_endian.h
240329b4ca4d829ac4d1490e96e83118e161e719e448c7e8dbf15735ab8a8e87  ./include/brg_types.h
0d8f16438f641fa365844a5991220eb04969f0a19c60dff08e10f521e74db5c3  ./include/skein.h
8f7362796e9e43f7619d51020d6faeedce786492b65bebd2ff6a833b621051cb  ./include/skeinApi.h
90510d8a9f686c3bfbf6cf7737237e3fa263c1ed5046b0f19727ba55b9bffeb9  ./include/skein_iv.h
42c6c8eff8f364ee2f0de3177d468dbceba9c6a73222fea473fe6d603213806a  ./include/skein_port.h
0154a4b8d54f5aa424b39a7ee668b31f2522b907bf3a8536fe46440b584531a1  ./include/threefishApi.h
ac0fc0f95a48a716d30cf02e5adad77af17725a938f939cf94f6dfba42badeca  ./skein.c
7af70b177bc63690f68eebceca2dbfef8a4473dcc847ae3525508c65c7d7bcc1  ./skeinApi.c
d7ef7330be8253f7f061de3c36880dbc83b0f5d90c8f2b72d3478766f54fbff0  ./skeinBlockNo3F.c
8bb3d7864afc9eab5569949fb2799cb6f14e583ba00641313cf877a5aea1c763  ./skein_block.c
438e6cb59a0090166e8f1e39418c0a2d0036737a32c5e2822c2ed8b803e2132f  ./threefish1024Block.c
e812ec6f2881300e90c803cfd9d044e954f1ca64faa2fc17a709f56a2f122ff8  ./threefish256Block.c
926f680057e128cdd1feba4a8544c177a74420137af480267b949ae79f3d02b8  ./threefish512Block.c
19357f5d47e7183bc8558a8d0949a3f5a80a931848917d26f36eebb7d205f003  ./threefishApi.c

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/CMakeLists.txt         |   27 +
 drivers/staging/skein/include/brg_endian.h   |  148 +++
 drivers/staging/skein/include/brg_types.h    |  188 ++++
 drivers/staging/skein/include/skein.h        |  327 ++++++
 drivers/staging/skein/include/skeinApi.h     |  239 +++++
 drivers/staging/skein/include/skein_iv.h     |  199 ++++
 drivers/staging/skein/include/skein_port.h   |  124 +++
 drivers/staging/skein/include/threefishApi.h |  167 ++++
 drivers/staging/skein/skein.c                |  742 ++++++++++++++
 drivers/staging/skein/skeinApi.c             |  221 ++++
 drivers/staging/skein/skeinBlockNo3F.c       |  172 ++++
 drivers/staging/skein/skein_block.c          |  689 +++++++++++++
 drivers/staging/skein/threefish1024Block.c   | 1385 ++++++++++++++++++++++++++
 drivers/staging/skein/threefish256Block.c    |  349 +++++++
 drivers/staging/skein/threefish512Block.c    |  643 ++++++++++++
 drivers/staging/skein/threefishApi.c         |   79 ++
 16 files changed, 5699 insertions(+)
 create mode 100755 drivers/staging/skein/CMakeLists.txt
 create mode 100644 drivers/staging/skein/include/brg_endian.h
 create mode 100644 drivers/staging/skein/include/brg_types.h
 create mode 100644 drivers/staging/skein/include/skein.h
 create mode 100755 drivers/staging/skein/include/skeinApi.h
 create mode 100644 drivers/staging/skein/include/skein_iv.h
 create mode 100644 drivers/staging/skein/include/skein_port.h
 create mode 100644 drivers/staging/skein/include/threefishApi.h
 create mode 100644 drivers/staging/skein/skein.c
 create mode 100755 drivers/staging/skein/skeinApi.c
 create mode 100644 drivers/staging/skein/skeinBlockNo3F.c
 create mode 100644 drivers/staging/skein/skein_block.c
 create mode 100644 drivers/staging/skein/threefish1024Block.c
 create mode 100644 drivers/staging/skein/threefish256Block.c
 create mode 100644 drivers/staging/skein/threefish512Block.c
 create mode 100644 drivers/staging/skein/threefishApi.c

diff --git a/drivers/staging/skein/CMakeLists.txt b/drivers/staging/skein/CMakeLists.txt
new file mode 100755
index 000000000000..604aaa394cb1
--- /dev/null
+++ b/drivers/staging/skein/CMakeLists.txt
@@ -0,0 +1,27 @@
+cmake_minimum_required (VERSION 2.6)
+
+include_directories (${CMAKE_CURRENT_SOURCE_DIR}/include)
+
+# set(skeinBlock_src skein_block.c)
+set(skeinBlock_src skeinBlockNo3F.c)
+
+set(skein_src 
+    ${skeinBlock_src}
+    skein.c
+    skeinApi.c
+    )
+
+set(threefish_src
+    threefishApi.c
+    threefish256Block.c
+    threefish512Block.c
+    threefish1024Block.c
+    )
+set(s3f_src ${skein_src} ${threefish_src})
+
+add_library(skein3fish SHARED ${s3f_src})
+set_target_properties(skein3fish PROPERTIES VERSION ${VERSION} SOVERSION ${SOVERSION})
+target_link_libraries(skein3fish ${LIBS})
+
+install(TARGETS skein3fish DESTINATION ${LIBDIRNAME})
+
diff --git a/drivers/staging/skein/include/brg_endian.h b/drivers/staging/skein/include/brg_endian.h
new file mode 100644
index 000000000000..c03c7c5d1eb4
--- /dev/null
+++ b/drivers/staging/skein/include/brg_endian.h
@@ -0,0 +1,148 @@
+/*
+ ---------------------------------------------------------------------------
+ Copyright (c) 2003, Dr Brian Gladman, Worcester, UK.   All rights reserved.
+
+ LICENSE TERMS
+
+ The free distribution and use of this software in both source and binary
+ form is allowed (with or without changes) provided that:
+
+   1. distributions of this source code include the above copyright
+      notice, this list of conditions and the following disclaimer;
+
+   2. distributions in binary form include the above copyright
+      notice, this list of conditions and the following disclaimer
+      in the documentation and/or other associated materials;
+
+   3. the copyright holder's name is not used to endorse products
+      built using this software without specific written permission.
+
+ ALTERNATIVELY, provided that this notice is retained in full, this product
+ may be distributed under the terms of the GNU General Public License (GPL),
+ in which case the provisions of the GPL apply INSTEAD OF those given above.
+
+ DISCLAIMER
+
+ This software is provided 'as is' with no explicit or implied warranties
+ in respect of its properties, including, but not limited to, correctness
+ and/or fitness for purpose.
+ ---------------------------------------------------------------------------
+ Issue 20/10/2006
+*/
+
+#ifndef BRG_ENDIAN_H
+#define BRG_ENDIAN_H
+
+#define IS_BIG_ENDIAN      4321 /* byte 0 is most significant (mc68k) */
+#define IS_LITTLE_ENDIAN   1234 /* byte 0 is least significant (i386) */
+
+/* Include files where endian defines and byteswap functions may reside */
+#if defined( __FreeBSD__ ) || defined( __OpenBSD__ ) || defined( __NetBSD__ )
+#  include <sys/endian.h>
+#elif defined( BSD ) && ( BSD >= 199103 ) || defined( __APPLE__ ) || \
+      defined( __CYGWIN32__ ) || defined( __DJGPP__ ) || defined( __osf__ )
+#  include <machine/endian.h>
+#elif defined( __linux__ ) || defined( __GNUC__ ) || defined( __GNU_LIBRARY__ )
+#  if !defined( __MINGW32__ ) && !defined(AVR)
+#    include <endian.h>
+#    if !defined( __BEOS__ )
+#      include <byteswap.h>
+#    endif
+#  endif
+#endif
+
+/* Now attempt to set the define for platform byte order using any  */
+/* of the four forms SYMBOL, _SYMBOL, __SYMBOL & __SYMBOL__, which  */
+/* seem to encompass most endian symbol definitions                 */
+
+#if defined( BIG_ENDIAN ) && defined( LITTLE_ENDIAN )
+#  if defined( BYTE_ORDER ) && BYTE_ORDER == BIG_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#  elif defined( BYTE_ORDER ) && BYTE_ORDER == LITTLE_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#  endif
+#elif defined( BIG_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#elif defined( LITTLE_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+
+#if defined( _BIG_ENDIAN ) && defined( _LITTLE_ENDIAN )
+#  if defined( _BYTE_ORDER ) && _BYTE_ORDER == _BIG_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#  elif defined( _BYTE_ORDER ) && _BYTE_ORDER == _LITTLE_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#  endif
+#elif defined( _BIG_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#elif defined( _LITTLE_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+
+#if defined( __BIG_ENDIAN ) && defined( __LITTLE_ENDIAN )
+#  if defined( __BYTE_ORDER ) && __BYTE_ORDER == __BIG_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#  elif defined( __BYTE_ORDER ) && __BYTE_ORDER == __LITTLE_ENDIAN
+#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#  endif
+#elif defined( __BIG_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#elif defined( __LITTLE_ENDIAN )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+
+#if defined( __BIG_ENDIAN__ ) && defined( __LITTLE_ENDIAN__ )
+#  if defined( __BYTE_ORDER__ ) && __BYTE_ORDER__ == __BIG_ENDIAN__
+#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#  elif defined( __BYTE_ORDER__ ) && __BYTE_ORDER__ == __LITTLE_ENDIAN__
+#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#  endif
+#elif defined( __BIG_ENDIAN__ )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#elif defined( __LITTLE_ENDIAN__ )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+
+/*  if the platform byte order could not be determined, then try to */
+/*  set this define using common machine defines                    */
+#if !defined(PLATFORM_BYTE_ORDER)
+
+#if   defined( __alpha__ ) || defined( __alpha ) || defined( i386 )       || \
+      defined( __i386__ )  || defined( _M_I86 )  || defined( _M_IX86 )    || \
+      defined( __OS2__ )   || defined( sun386 )  || defined( __TURBOC__ ) || \
+      defined( vax )       || defined( vms )     || defined( VMS )        || \
+      defined( __VMS )     || defined( _M_X64 )  || defined( AVR )
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+
+#elif defined( AMIGA )   || defined( applec )    || defined( __AS400__ )  || \
+      defined( _CRAY )   || defined( __hppa )    || defined( __hp9000 )   || \
+      defined( ibm370 )  || defined( mc68000 )   || defined( m68k )       || \
+      defined( __MRC__ ) || defined( __MVS__ )   || defined( __MWERKS__ ) || \
+      defined( sparc )   || defined( __sparc)    || defined( SYMANTEC_C ) || \
+      defined( __VOS__ ) || defined( __TIGCC__ ) || defined( __TANDEM )   || \
+      defined( THINK_C ) || defined( __VMCMS__ )
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+
+#elif 0     /* **** EDIT HERE IF NECESSARY **** */
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#elif 0     /* **** EDIT HERE IF NECESSARY **** */
+#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
+#else
+#  error Please edit lines 126 or 128 in brg_endian.h to set the platform byte order
+#endif
+#endif
+
+/* special handler for IA64, which may be either endianness (?)  */
+/* here we assume little-endian, but this may need to be changed */
+#if defined(__ia64) || defined(__ia64__) || defined(_M_IA64)
+#  define PLATFORM_MUST_ALIGN (1)
+#ifndef PLATFORM_BYTE_ORDER
+#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
+#endif
+#endif
+
+#ifndef   PLATFORM_MUST_ALIGN
+#  define PLATFORM_MUST_ALIGN (0)
+#endif
+
+#endif  /* ifndef BRG_ENDIAN_H */
diff --git a/drivers/staging/skein/include/brg_types.h b/drivers/staging/skein/include/brg_types.h
new file mode 100644
index 000000000000..6db737d71b9e
--- /dev/null
+++ b/drivers/staging/skein/include/brg_types.h
@@ -0,0 +1,188 @@
+/*
+ ---------------------------------------------------------------------------
+ Copyright (c) 1998-2006, Brian Gladman, Worcester, UK. All rights reserved.
+
+ LICENSE TERMS
+
+ The free distribution and use of this software in both source and binary
+ form is allowed (with or without changes) provided that:
+
+   1. distributions of this source code include the above copyright
+      notice, this list of conditions and the following disclaimer;
+
+   2. distributions in binary form include the above copyright
+      notice, this list of conditions and the following disclaimer
+      in the documentation and/or other associated materials;
+
+   3. the copyright holder's name is not used to endorse products
+      built using this software without specific written permission.
+
+ ALTERNATIVELY, provided that this notice is retained in full, this product
+ may be distributed under the terms of the GNU General Public License (GPL),
+ in which case the provisions of the GPL apply INSTEAD OF those given above.
+
+ DISCLAIMER
+
+ This software is provided 'as is' with no explicit or implied warranties
+ in respect of its properties, including, but not limited to, correctness
+ and/or fitness for purpose.
+ ---------------------------------------------------------------------------
+ Issue 09/09/2006
+
+ The unsigned integer types defined here are of the form uint_<nn>t where
+ <nn> is the length of the type; for example, the unsigned 32-bit type is
+ 'uint_32t'.  These are NOT the same as the 'C99 integer types' that are
+ defined in the inttypes.h and stdint.h headers since attempts to use these
+ types have shown that support for them is still highly variable.  However,
+ since the latter are of the form uint<nn>_t, a regular expression search
+ and replace (in VC++ search on 'uint_{:z}t' and replace with 'uint\1_t')
+ can be used to convert the types used here to the C99 standard types.
+*/
+
+#ifndef BRG_TYPES_H
+#define BRG_TYPES_H
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#include <limits.h>
+
+#ifndef BRG_UI8
+#  define BRG_UI8
+#  if UCHAR_MAX == 255u
+     typedef unsigned char uint_8t;
+#  else
+#    error Please define uint_8t as an 8-bit unsigned integer type in brg_types.h
+#  endif
+#endif
+
+#ifndef BRG_UI16
+#  define BRG_UI16
+#  if USHRT_MAX == 65535u
+     typedef unsigned short uint_16t;
+#  else
+#    error Please define uint_16t as a 16-bit unsigned short type in brg_types.h
+#  endif
+#endif
+
+#ifndef BRG_UI32
+#  define BRG_UI32
+#  if UINT_MAX == 4294967295u
+#    define li_32(h) 0x##h##u
+     typedef unsigned int uint_32t;
+#  elif ULONG_MAX == 4294967295u
+#    define li_32(h) 0x##h##ul
+     typedef unsigned long uint_32t;
+#  elif defined( _CRAY )
+#    error This code needs 32-bit data types, which Cray machines do not provide
+#  else
+#    error Please define uint_32t as a 32-bit unsigned integer type in brg_types.h
+#  endif
+#endif
+
+#ifndef BRG_UI64
+#  if defined( __BORLANDC__ ) && !defined( __MSDOS__ )
+#    define BRG_UI64
+#    define li_64(h) 0x##h##ui64
+     typedef unsigned __int64 uint_64t;
+#  elif defined( _MSC_VER ) && ( _MSC_VER < 1300 )    /* 1300 == VC++ 7.0 */
+#    define BRG_UI64
+#    define li_64(h) 0x##h##ui64
+     typedef unsigned __int64 uint_64t;
+#  elif defined( __sun ) && defined(ULONG_MAX) && ULONG_MAX == 0xfffffffful
+#    define BRG_UI64
+#    define li_64(h) 0x##h##ull
+     typedef unsigned long long uint_64t;
+#  elif defined( UINT_MAX ) && UINT_MAX > 4294967295u
+#    if UINT_MAX == 18446744073709551615u
+#      define BRG_UI64
+#      define li_64(h) 0x##h##u
+       typedef unsigned int uint_64t;
+#    endif
+#  elif defined( ULONG_MAX ) && ULONG_MAX > 4294967295u
+#    if ULONG_MAX == 18446744073709551615ul
+#      define BRG_UI64
+#      define li_64(h) 0x##h##ul
+       typedef unsigned long uint_64t;
+#    endif
+#  elif defined( ULLONG_MAX ) && ULLONG_MAX > 4294967295u
+#    if ULLONG_MAX == 18446744073709551615ull
+#      define BRG_UI64
+#      define li_64(h) 0x##h##ull
+       typedef unsigned long long uint_64t;
+#    endif
+#  elif defined( ULONG_LONG_MAX ) && ULONG_LONG_MAX > 4294967295u
+#    if ULONG_LONG_MAX == 18446744073709551615ull
+#      define BRG_UI64
+#      define li_64(h) 0x##h##ull
+       typedef unsigned long long uint_64t;
+#    endif
+#  elif defined(__GNUC__)  /* DLW: avoid mingw problem with -ansi */
+#      define BRG_UI64
+#      define li_64(h) 0x##h##ull
+       typedef unsigned long long uint_64t;
+#  endif
+#endif
+
+#if defined( NEED_UINT_64T ) && !defined( BRG_UI64 )
+#  error Please define uint_64t as an unsigned 64 bit type in brg_types.h
+#endif
+
+#ifndef RETURN_VALUES
+#  define RETURN_VALUES
+#  if defined( DLL_EXPORT )
+#    if defined( _MSC_VER ) || defined ( __INTEL_COMPILER )
+#      define VOID_RETURN    __declspec( dllexport ) void __stdcall
+#      define INT_RETURN     __declspec( dllexport ) int  __stdcall
+#    elif defined( __GNUC__ )
+#      define VOID_RETURN    __declspec( __dllexport__ ) void
+#      define INT_RETURN     __declspec( __dllexport__ ) int
+#    else
+#      error Use of the DLL is only available on the Microsoft, Intel and GCC compilers
+#    endif
+#  elif defined( DLL_IMPORT )
+#    if defined( _MSC_VER ) || defined ( __INTEL_COMPILER )
+#      define VOID_RETURN    __declspec( dllimport ) void __stdcall
+#      define INT_RETURN     __declspec( dllimport ) int  __stdcall
+#    elif defined( __GNUC__ )
+#      define VOID_RETURN    __declspec( __dllimport__ ) void
+#      define INT_RETURN     __declspec( __dllimport__ ) int
+#    else
+#      error Use of the DLL is only available on the Microsoft, Intel and GCC compilers
+#    endif
+#  elif defined( __WATCOMC__ )
+#    define VOID_RETURN  void __cdecl
+#    define INT_RETURN   int  __cdecl
+#  else
+#    define VOID_RETURN  void
+#    define INT_RETURN   int
+#  endif
+#endif
+
+/*  These defines are used to declare buffers in a way that allows
+    faster operations on longer variables to be used.  In all these
+    defines 'size' must be a power of 2 and >= 8
+
+    dec_unit_type(size,x)       declares a variable 'x' of length 
+                                'size' bits
+
+    dec_bufr_type(size,bsize,x) declares a buffer 'x' of length 'bsize' 
+                                bytes defined as an array of variables
+                                each of 'size' bits (bsize must be a 
+                                multiple of size / 8)
+
+    ptr_cast(x,size)            casts a pointer to a pointer to a 
+                                varaiable of length 'size' bits
+*/
+
+#define ui_type(size)               uint_##size##t
+#define dec_unit_type(size,x)       typedef ui_type(size) x
+#define dec_bufr_type(size,bsize,x) typedef ui_type(size) x[bsize / (size >> 3)]
+#define ptr_cast(x,size)            ((ui_type(size)*)(x))
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif
diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
new file mode 100644
index 000000000000..cb613fa09d9e
--- /dev/null
+++ b/drivers/staging/skein/include/skein.h
@@ -0,0 +1,327 @@
+#ifndef _SKEIN_H_
+#define _SKEIN_H_     1
+/**************************************************************************
+**
+** Interface declarations and internal definitions for Skein hashing.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+***************************************************************************
+** 
+** The following compile-time switches may be defined to control some
+** tradeoffs between speed, code size, error checking, and security.
+**
+** The "default" note explains what happens when the switch is not defined.
+**
+**  SKEIN_DEBUG            -- make callouts from inside Skein code
+**                            to examine/display intermediate values.
+**                            [default: no callouts (no overhead)]
+**
+**  SKEIN_ERR_CHECK        -- how error checking is handled inside Skein
+**                            code. If not defined, most error checking 
+**                            is disabled (for performance). Otherwise, 
+**                            the switch value is interpreted as:
+**                                0: use assert()      to flag errors
+**                                1: return SKEIN_FAIL to flag errors
+**
+***************************************************************************/
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+#include <stddef.h>                          /* get size_t definition */
+#include <skein_port.h>               /* get platform-specific definitions */
+
+enum
+    {
+    SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
+    SKEIN_FAIL            =      1,
+    SKEIN_BAD_HASHLEN     =      2
+    };
+
+#define  SKEIN_MODIFIER_WORDS  ( 2)          /* number of modifier (tweak) words */
+
+#define  SKEIN_256_STATE_WORDS ( 4)
+#define  SKEIN_512_STATE_WORDS ( 8)
+#define  SKEIN1024_STATE_WORDS (16)
+#define  SKEIN_MAX_STATE_WORDS (16)
+
+#define  SKEIN_256_STATE_BYTES ( 8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BYTES ( 8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BYTES ( 8*SKEIN1024_STATE_WORDS)
+
+#define  SKEIN_256_STATE_BITS  (64*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BITS  (64*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BITS  (64*SKEIN1024_STATE_WORDS)
+
+#define  SKEIN_256_BLOCK_BYTES ( 8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_BLOCK_BYTES ( 8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_BLOCK_BYTES ( 8*SKEIN1024_STATE_WORDS)
+
+typedef struct
+    {
+    size_t  hashBitLen;                      /* size of hash result, in bits */
+    size_t  bCnt;                            /* current byte count in buffer b[] */
+    u64b_t  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+    } Skein_Ctxt_Hdr_t;
+
+typedef struct                               /*  256-bit Skein hash context structure */
+    {
+    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    u64b_t  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
+    u08b_t  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    } Skein_256_Ctxt_t;
+
+typedef struct                               /*  512-bit Skein hash context structure */
+    {
+    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    u64b_t  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
+    u08b_t  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    } Skein_512_Ctxt_t;
+
+typedef struct                               /* 1024-bit Skein hash context structure */
+    {
+    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    u64b_t  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
+    u08b_t  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    } Skein1024_Ctxt_t;
+
+/*   Skein APIs for (incremental) "straight hashing" */
+int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
+int  Skein_512_Init  (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
+int  Skein1024_Init  (Skein1024_Ctxt_t *ctx, size_t hashBitLen);
+
+int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
+int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
+int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
+
+int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+
+/*
+**   Skein APIs for "extended" initialization: MAC keys, tree hashing.
+**   After an InitExt() call, just use Update/Final calls as with Init().
+**
+**   Notes: Same parameters as _Init() calls, plus treeInfo/key/keyBytes.
+**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL, 
+**              the results of InitExt() are identical to calling Init().
+**          The function Init() may be called once to "precompute" the IV for
+**              a given hashBitLen value, then by saving a copy of the context
+**              the IV computation may be avoided in later calls.
+**          Similarly, the function InitExt() may be called once per MAC key 
+**              to precompute the MAC IV, then a copy of the context saved and
+**              reused for each new MAC computation.
+**/
+int  Skein_256_InitExt(Skein_256_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
+int  Skein_512_InitExt(Skein_512_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
+int  Skein1024_InitExt(Skein1024_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
+
+/*
+**   Skein APIs for MAC and tree hash:
+**      Final_Pad:  pad, do final block, but no OUTPUT type
+**      Output:     do just the output stage
+*/
+int  Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+
+#ifndef SKEIN_TREE_HASH
+#define SKEIN_TREE_HASH (1)
+#endif
+#if  SKEIN_TREE_HASH
+int  Skein_256_Output   (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_512_Output   (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+#endif
+
+/*****************************************************************
+** "Internal" Skein definitions
+**    -- not needed for sequential hashing API, but will be 
+**           helpful for other uses of Skein (e.g., tree hash mode).
+**    -- included here so that they can be shared between
+**           reference and optimized code.
+******************************************************************/
+
+/* tweak word T[1]: bit field starting positions */
+#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
+                                
+#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
+#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
+#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
+#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
+#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
+                                
+/* tweak word T[1]: flag bit definition(s) */
+#define SKEIN_T1_FLAG_FIRST     (((u64b_t)  1 ) << SKEIN_T1_POS_FIRST)
+#define SKEIN_T1_FLAG_FINAL     (((u64b_t)  1 ) << SKEIN_T1_POS_FINAL)
+#define SKEIN_T1_FLAG_BIT_PAD   (((u64b_t)  1 ) << SKEIN_T1_POS_BIT_PAD)
+                                
+/* tweak word T[1]: tree level bit field mask */
+#define SKEIN_T1_TREE_LVL_MASK  (((u64b_t)0x7F) << SKEIN_T1_POS_TREE_LVL)
+#define SKEIN_T1_TREE_LEVEL(n)  (((u64b_t) (n)) << SKEIN_T1_POS_TREE_LVL)
+
+/* tweak word T[1]: block type field */
+#define SKEIN_BLK_TYPE_KEY      ( 0)                    /* key, for MAC and KDF */
+#define SKEIN_BLK_TYPE_CFG      ( 4)                    /* configuration block */
+#define SKEIN_BLK_TYPE_PERS     ( 8)                    /* personalization string */
+#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
+#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
+#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
+#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
+#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
+#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
+
+#define SKEIN_T1_BLK_TYPE(T)   (((u64b_t) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
+#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
+#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
+#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
+#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
+#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
+#define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
+#define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
+#define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
+#define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */
+
+#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
+#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)
+
+#define SKEIN_VERSION           (1)
+
+#ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
+#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
+#endif
+
+#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64b_t) (hi32)) << 32))
+#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION,SKEIN_ID_STRING_LE)
+#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)
+
+#define SKEIN_CFG_STR_LEN       (4*8)
+
+/* bit field definitions in config block treeInfo word */
+#define SKEIN_CFG_TREE_LEAF_SIZE_POS  ( 0)
+#define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
+#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
+
+#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
+#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
+#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
+
+#define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
+    ( (((u64b_t)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+      (((u64b_t)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+      (((u64b_t)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )
+
+#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0,0,0) /* use as treeInfo in InitExt() call for sequential processing */
+
+/*
+**   Skein macros for getting/setting tweak words, etc.
+**   These are useful for partial input bytes, hash tree init/update, etc.
+**/
+#define Skein_Get_Tweak(ctxPtr,TWK_NUM)         ((ctxPtr)->h.T[TWK_NUM])
+#define Skein_Set_Tweak(ctxPtr,TWK_NUM,tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal);}
+
+#define Skein_Get_T0(ctxPtr)    Skein_Get_Tweak(ctxPtr,0)
+#define Skein_Get_T1(ctxPtr)    Skein_Get_Tweak(ctxPtr,1)
+#define Skein_Set_T0(ctxPtr,T0) Skein_Set_Tweak(ctxPtr,0,T0)
+#define Skein_Set_T1(ctxPtr,T1) Skein_Set_Tweak(ctxPtr,1,T1)
+
+/* set both tweak words at once */
+#define Skein_Set_T0_T1(ctxPtr,T0,T1)           \
+    {                                           \
+    Skein_Set_T0(ctxPtr,(T0));                  \
+    Skein_Set_T1(ctxPtr,(T1));                  \
+    }
+
+#define Skein_Set_Type(ctxPtr,BLK_TYPE)         \
+    Skein_Set_T1(ctxPtr,SKEIN_T1_BLK_TYPE_##BLK_TYPE)
+
+/* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
+#define Skein_Start_New_Type(ctxPtr,BLK_TYPE)   \
+    { Skein_Set_T0_T1(ctxPtr,0,SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt=0; }
+
+#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
+#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
+
+#define Skein_Set_Tree_Level(hdr,height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height);}
+
+/*****************************************************************
+** "Internal" Skein definitions for debugging and error checking
+******************************************************************/
+#ifdef  SKEIN_DEBUG             /* examine/display intermediate values? */
+#include "skein_debug.h"
+#else                           /* default is no callouts */
+#define Skein_Show_Block(bits,ctx,X,blkPtr,wPtr,ksEvenPtr,ksOddPtr)
+#define Skein_Show_Round(bits,ctx,r,X)
+#define Skein_Show_R_Ptr(bits,ctx,r,X_ptr)
+#define Skein_Show_Final(bits,ctx,cnt,outPtr)
+#define Skein_Show_Key(bits,ctx,key,keyBytes)
+#endif
+
+#ifndef SKEIN_ERR_CHECK        /* run-time checks (e.g., bad params, uninitialized context)? */
+#define Skein_Assert(x,retCode)/* default: ignore all Asserts, for performance */
+#define Skein_assert(x)
+#elif   defined(SKEIN_ASSERT)
+#include <assert.h>     
+#define Skein_Assert(x,retCode) assert(x) 
+#define Skein_assert(x)         assert(x) 
+#else
+#include <assert.h>     
+#define Skein_Assert(x,retCode) { if (!(x)) return retCode; } /*  caller  error */
+#define Skein_assert(x)         assert(x)                     /* internal error */
+#endif
+
+/*****************************************************************
+** Skein block function constants (shared across Ref and Opt code)
+******************************************************************/
+enum    
+    {   
+        /* Skein_256 round rotation constants */
+    R_256_0_0=14, R_256_0_1=16,
+    R_256_1_0=52, R_256_1_1=57,
+    R_256_2_0=23, R_256_2_1=40,
+    R_256_3_0= 5, R_256_3_1=37,
+    R_256_4_0=25, R_256_4_1=33,
+    R_256_5_0=46, R_256_5_1=12,
+    R_256_6_0=58, R_256_6_1=22,
+    R_256_7_0=32, R_256_7_1=32,
+
+        /* Skein_512 round rotation constants */
+    R_512_0_0=46, R_512_0_1=36, R_512_0_2=19, R_512_0_3=37,
+    R_512_1_0=33, R_512_1_1=27, R_512_1_2=14, R_512_1_3=42,
+    R_512_2_0=17, R_512_2_1=49, R_512_2_2=36, R_512_2_3=39,
+    R_512_3_0=44, R_512_3_1= 9, R_512_3_2=54, R_512_3_3=56,
+    R_512_4_0=39, R_512_4_1=30, R_512_4_2=34, R_512_4_3=24,
+    R_512_5_0=13, R_512_5_1=50, R_512_5_2=10, R_512_5_3=17,
+    R_512_6_0=25, R_512_6_1=29, R_512_6_2=39, R_512_6_3=43,
+    R_512_7_0= 8, R_512_7_1=35, R_512_7_2=56, R_512_7_3=22,
+
+        /* Skein1024 round rotation constants */
+    R1024_0_0=24, R1024_0_1=13, R1024_0_2= 8, R1024_0_3=47, R1024_0_4= 8, R1024_0_5=17, R1024_0_6=22, R1024_0_7=37,
+    R1024_1_0=38, R1024_1_1=19, R1024_1_2=10, R1024_1_3=55, R1024_1_4=49, R1024_1_5=18, R1024_1_6=23, R1024_1_7=52,
+    R1024_2_0=33, R1024_2_1= 4, R1024_2_2=51, R1024_2_3=13, R1024_2_4=34, R1024_2_5=41, R1024_2_6=59, R1024_2_7=17,
+    R1024_3_0= 5, R1024_3_1=20, R1024_3_2=48, R1024_3_3=41, R1024_3_4=47, R1024_3_5=28, R1024_3_6=16, R1024_3_7=25,
+    R1024_4_0=41, R1024_4_1= 9, R1024_4_2=37, R1024_4_3=31, R1024_4_4=12, R1024_4_5=47, R1024_4_6=44, R1024_4_7=30,
+    R1024_5_0=16, R1024_5_1=34, R1024_5_2=56, R1024_5_3=51, R1024_5_4= 4, R1024_5_5=53, R1024_5_6=42, R1024_5_7=41,
+    R1024_6_0=31, R1024_6_1=44, R1024_6_2=47, R1024_6_3=46, R1024_6_4=19, R1024_6_5=42, R1024_6_6=44, R1024_6_7=25,
+    R1024_7_0= 9, R1024_7_1=48, R1024_7_2=35, R1024_7_3=52, R1024_7_4=23, R1024_7_5=31, R1024_7_6=37, R1024_7_7=20
+    };
+
+#ifndef SKEIN_ROUNDS
+#define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
+#define SKEIN_512_ROUNDS_TOTAL (72)
+#define SKEIN1024_ROUNDS_TOTAL (80)
+#else                                        /* allow command-line define in range 8*(5..14)   */
+#define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
+#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/ 10) + 5) % 10) + 5))
+#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS    ) + 5) % 10) + 5))
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif  /* ifndef _SKEIN_H_ */
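Note on the SKEIN_ROUNDS override above: the three #else macros read
SKEIN_ROUNDS as a three-digit decimal number, one digit per block size
(hundreds = Skein_256, tens = Skein_512, units = Skein1024), and map each
digit d to 8*(((d + 5) % 10) + 5) rounds. The sketch below restates that
arithmetic with a runtime value so it can be checked by hand. The function
names are hypothetical and are not part of the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in the patch): rounds selected by a single
 * decimal digit, i.e. 8 * (((digit + 5) % 10) + 5).
 * Digits 5..9 give 40..72 rounds, digits 0..4 give 80..112 rounds.
 */
static unsigned int skein_rounds_from_digit(unsigned int digit)
{
	assert(digit <= 9);
	return 8 * (((digit + 5) % 10) + 5);
}

/* Same expressions as the #else macros, with SKEIN_ROUNDS passed at run time. */
static unsigned int skein_256_rounds(unsigned int skein_rounds)
{
	return 8 * ((((skein_rounds / 100) + 5) % 10) + 5);
}

static unsigned int skein_512_rounds(unsigned int skein_rounds)
{
	return 8 * ((((skein_rounds / 10) + 5) % 10) + 5);
}

static unsigned int skein1024_rounds(unsigned int skein_rounds)
{
	return 8 * (((skein_rounds + 5) % 10) + 5);
}
```

Under this reading, building with -DSKEIN_ROUNDS=990 would reproduce the
defaults (72, 72, 80), since digit 9 maps to 72 rounds and digit 0 to 80.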
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
new file mode 100755
index 000000000000..19c3225460fc
--- /dev/null
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -0,0 +1,239 @@
+/*
+Copyright (c) 2010 Werner Dittmann
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+*/
+
+#ifndef SKEINAPI_H
+#define SKEINAPI_H
+
+/**
+ * @file skeinApi.h
+ * @brief A Skein API and its functions.
+ * @{
+ *
+ * This API and the functions that implement this API simplify the usage
+ * of Skein. The design and the way to use the functions follow the OpenSSL
+ * design but at the same time take care of some Skein specific behaviour
+ * and possibilities.
+ * 
+ * The functions enable applications to create normal Skein hashes and
+ * message authentication codes (MAC).
+ * 
+ * Using these functions is simple and straightforward:
+ * 
+ * @code
+ * 
+ * #include <skeinApi.h>
+ * 
+ * ...
+ * SkeinCtx_t ctx;             // a Skein hash or MAC context
+ * 
+ * // prepare context, here for a Skein with a state size of 512 bits.
+ * skeinCtxPrepare(&ctx, Skein512);
+ * 
+ * // Initialize the context to set the requested hash length in bits
+ * // here request an output hash size of 31 bits (Skein supports variable
+ * // output sizes, even very strange ones)
+ * skeinInit(&ctx, 31);
+ * 
+ * // Now update Skein with any number of message bits. A function that
+ * // takes a number of bytes is also available.
+ * skeinUpdateBits(&ctx, message, msgLength);
+ * 
+ * // Now get the result of the Skein hash. The output buffer must be
+ * // large enough to hold the requested number of output bits. The application
+ * // may now extract the bits.
+ * skeinFinal(&ctx, result);
+ * ...
+ * @endcode
+ * 
+ * An application may use @c skeinReset to reset a Skein context and use
+ * it for creation of another hash with the same Skein state size and output
+ * bit length. In this case the API implementation restores some
+ * internal state data and saves a full Skein initialization round.
+ * 
+ * To create a MAC the application just uses @c skeinMacInit instead of 
+ * @c skeinInit. All other function calls remain the same.
+ * 
+ */
+
+#include <skein.h>
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+    /**
+     * Which Skein size to use
+     */
+    typedef enum SkeinSize {
+        Skein256 = 256,     /*!< Skein with 256 bit state */
+        Skein512 = 512,     /*!< Skein with 512 bit state */
+        Skein1024 = 1024    /*!< Skein with 1024 bit state */
+    } SkeinSize_t;
+
+    /**
+     * Context for Skein.
+     *
+     * This structure was set up with some knowledge of the internal
+     * Skein structures, in particular the ordering of header and size-dependent
+     * variables. If the Skein implementation changes this, then adapt these
+     * structures as well.
+     */
+    typedef struct SkeinCtx {
+        u64b_t skeinSize;
+        u64b_t  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
+        union {
+            Skein_Ctxt_Hdr_t h;
+            Skein_256_Ctxt_t s256;
+            Skein_512_Ctxt_t s512;
+            Skein1024_Ctxt_t s1024;
+        } m;
+    } SkeinCtx_t;
+
+    /**
+     * Prepare a Skein context.
+     * 
+     * An application must call this function before it can use the Skein
+     * context. The function clears memory and initializes size-dependent
+     * variables.
+     *
+     * @param ctx
+     *     Pointer to a Skein context.
+     * @param size
+     *     Which Skein size to use.
+     * @return
+     *     SKEIN_SUCCESS or SKEIN_FAIL
+     */
+    int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size);
+
+    /**
+     * Initialize a Skein context.
+     *
+     * Initializes the context with this data and saves the resulting Skein 
+     * state variables for further use.
+     *
+     * @param ctx
+     *     Pointer to a Skein context.
+     * @param hashBitLen
+     *     Number of hash bits to compute
+     * @return
+     *     SKEIN_SUCCESS or SKEIN_FAIL
+     * @see skeinReset
+     */
+    int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen);
+
+    /**
+     * Resets a Skein context for further use.
+     * 
+     * Restores the saved chaining variables to reset the Skein context. 
+     * Thus applications can reuse the same setup to process several
+     * messages. This saves a complete Skein initialization cycle.
+     * 
+     * @param ctx
+     *     Pointer to a pre-initialized Skein MAC context
+     */
+    void skeinReset(SkeinCtx_t* ctx);
+    
+    /**
+     * Initializes a Skein context for MAC usage.
+     * 
+     * Initializes the context with this data and saves the resulting Skein 
+     * state variables for further use.
+     *
+     * Applications call the normal Skein functions to update the MAC and
+     * get the final result.
+     *
+     * @param ctx
+     *     Pointer to an empty or preinitialized Skein MAC context
+     * @param key
+     *     Pointer to key bytes or NULL
+     * @param keyLen
+     *     Length of the key in bytes or zero
+     * @param hashBitLen
+     *     Number of MAC hash bits to compute
+     * @return
+     *     SKEIN_SUCCESS or SKEIN_FAIL
+     */
+    int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
+                     size_t hashBitLen);
+
+    /**
+     * Update Skein with the next part of the message.
+     *
+     * @param ctx
+     *     Pointer to initialized Skein context
+     * @param msg
+     *     Pointer to the message.
+     * @param msgByteCnt
+     *     Length of the message in @b bytes
+     * @return
+     *     Success or error code.
+     */
+    int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
+                    size_t msgByteCnt);
+
+    /**
+     * Update the hash with a message bit string.
+     *
+     * Skein can handle data not only as bytes but also as bit strings of
+     * arbitrary length (up to its maximum design size).
+     *
+     * @param ctx
+     *     Pointer to initialized Skein context
+     * @param msg
+     *     Pointer to the message.
+     * @param msgBitCnt
+     *     Length of the message in @b bits.
+     */
+    int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
+                        size_t msgBitCnt);
+
+    /**
+     * Finalize Skein and return the hash.
+     * 
+     * Before an application can reuse a Skein setup the application must
+     * reset the Skein context.
+     *
+     * @param ctx
+     *     Pointer to initialized Skein context
+     * @param hash
+     *     Pointer to buffer that receives the hash. The buffer must be large
+     *     enough to store @c hashBitLen bits.
+     * @return
+     *     Success or error code.
+     * @see skeinReset
+     */
+    int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash);
+
+#ifdef __cplusplus
+}
+#endif
+
+/**
+ * @}
+ */
+#endif
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
new file mode 100644
index 000000000000..555ea619500b
--- /dev/null
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -0,0 +1,199 @@
+#ifndef _SKEIN_IV_H_
+#define _SKEIN_IV_H_
+
+#include <skein.h>    /* get Skein macros and types */
+
+/*
+***************** Pre-computed Skein IVs *******************
+**
+** NOTE: these values are not "magic" constants, but
+** are generated using the Threefish block function.
+** They are pre-computed here only for speed; i.e., to
+** avoid the need for a Threefish call during Init().
+**
+** The IV for any fixed hash length may be pre-computed.
+** Only the most common values are included here.
+**
+************************************************************
+**/
+
+#define MK_64 SKEIN_MK_64
+
+/* blkSize =  256 bits. hashSize =  128 bits */
+const u64b_t SKEIN_256_IV_128[] =
+    {
+    MK_64(0xE1111906,0x964D7260),
+    MK_64(0x883DAAA7,0x7C8D811C),
+    MK_64(0x10080DF4,0x91960F7A),
+    MK_64(0xCCF7DDE5,0xB45BC1C2)
+    };
+
+/* blkSize =  256 bits. hashSize =  160 bits */
+const u64b_t SKEIN_256_IV_160[] =
+    {
+    MK_64(0x14202314,0x72825E98),
+    MK_64(0x2AC4E9A2,0x5A77E590),
+    MK_64(0xD47A5856,0x8838D63E),
+    MK_64(0x2DD2E496,0x8586AB7D)
+    };
+
+/* blkSize =  256 bits. hashSize =  224 bits */
+const u64b_t SKEIN_256_IV_224[] =
+    {
+    MK_64(0xC6098A8C,0x9AE5EA0B),
+    MK_64(0x876D5686,0x08C5191C),
+    MK_64(0x99CB88D7,0xD7F53884),
+    MK_64(0x384BDDB1,0xAEDDB5DE)
+    };
+
+/* blkSize =  256 bits. hashSize =  256 bits */
+const u64b_t SKEIN_256_IV_256[] =
+    {
+    MK_64(0xFC9DA860,0xD048B449),
+    MK_64(0x2FCA6647,0x9FA7D833),
+    MK_64(0xB33BC389,0x6656840F),
+    MK_64(0x6A54E920,0xFDE8DA69)
+    };
+
+/* blkSize =  512 bits. hashSize =  128 bits */
+const u64b_t SKEIN_512_IV_128[] =
+    {
+    MK_64(0xA8BC7BF3,0x6FBF9F52),
+    MK_64(0x1E9872CE,0xBD1AF0AA),
+    MK_64(0x309B1790,0xB32190D3),
+    MK_64(0xBCFBB854,0x3F94805C),
+    MK_64(0x0DA61BCD,0x6E31B11B),
+    MK_64(0x1A18EBEA,0xD46A32E3),
+    MK_64(0xA2CC5B18,0xCE84AA82),
+    MK_64(0x6982AB28,0x9D46982D)
+    };
+
+/* blkSize =  512 bits. hashSize =  160 bits */
+const u64b_t SKEIN_512_IV_160[] =
+    {
+    MK_64(0x28B81A2A,0xE013BD91),
+    MK_64(0xC2F11668,0xB5BDF78F),
+    MK_64(0x1760D8F3,0xF6A56F12),
+    MK_64(0x4FB74758,0x8239904F),
+    MK_64(0x21EDE07F,0x7EAF5056),
+    MK_64(0xD908922E,0x63ED70B8),
+    MK_64(0xB8EC76FF,0xECCB52FA),
+    MK_64(0x01A47BB8,0xA3F27A6E)
+    };
+
+/* blkSize =  512 bits. hashSize =  224 bits */
+const u64b_t SKEIN_512_IV_224[] =
+    {
+    MK_64(0xCCD06162,0x48677224),
+    MK_64(0xCBA65CF3,0xA92339EF),
+    MK_64(0x8CCD69D6,0x52FF4B64),
+    MK_64(0x398AED7B,0x3AB890B4),
+    MK_64(0x0F59D1B1,0x457D2BD0),
+    MK_64(0x6776FE65,0x75D4EB3D),
+    MK_64(0x99FBC70E,0x997413E9),
+    MK_64(0x9E2CFCCF,0xE1C41EF7)
+    };
+
+/* blkSize =  512 bits. hashSize =  256 bits */
+const u64b_t SKEIN_512_IV_256[] =
+    {
+    MK_64(0xCCD044A1,0x2FDB3E13),
+    MK_64(0xE8359030,0x1A79A9EB),
+    MK_64(0x55AEA061,0x4F816E6F),
+    MK_64(0x2A2767A4,0xAE9B94DB),
+    MK_64(0xEC06025E,0x74DD7683),
+    MK_64(0xE7A436CD,0xC4746251),
+    MK_64(0xC36FBAF9,0x393AD185),
+    MK_64(0x3EEDBA18,0x33EDFC13)
+    };
+
+/* blkSize =  512 bits. hashSize =  384 bits */
+const u64b_t SKEIN_512_IV_384[] =
+    {
+    MK_64(0xA3F6C6BF,0x3A75EF5F),
+    MK_64(0xB0FEF9CC,0xFD84FAA4),
+    MK_64(0x9D77DD66,0x3D770CFE),
+    MK_64(0xD798CBF3,0xB468FDDA),
+    MK_64(0x1BC4A666,0x8A0E4465),
+    MK_64(0x7ED7D434,0xE5807407),
+    MK_64(0x548FC1AC,0xD4EC44D6),
+    MK_64(0x266E1754,0x6AA18FF8)
+    };
+
+/* blkSize =  512 bits. hashSize =  512 bits */
+const u64b_t SKEIN_512_IV_512[] =
+    {
+    MK_64(0x4903ADFF,0x749C51CE),
+    MK_64(0x0D95DE39,0x9746DF03),
+    MK_64(0x8FD19341,0x27C79BCE),
+    MK_64(0x9A255629,0xFF352CB1),
+    MK_64(0x5DB62599,0xDF6CA7B0),
+    MK_64(0xEABE394C,0xA9D5C3F4),
+    MK_64(0x991112C7,0x1A75B523),
+    MK_64(0xAE18A40B,0x660FCC33)
+    };
+
+/* blkSize = 1024 bits. hashSize =  384 bits */
+const u64b_t SKEIN1024_IV_384[] =
+    {
+    MK_64(0x5102B6B8,0xC1894A35),
+    MK_64(0xFEEBC9E3,0xFE8AF11A),
+    MK_64(0x0C807F06,0xE32BED71),
+    MK_64(0x60C13A52,0xB41A91F6),
+    MK_64(0x9716D35D,0xD4917C38),
+    MK_64(0xE780DF12,0x6FD31D3A),
+    MK_64(0x797846B6,0xC898303A),
+    MK_64(0xB172C2A8,0xB3572A3B),
+    MK_64(0xC9BC8203,0xA6104A6C),
+    MK_64(0x65909338,0xD75624F4),
+    MK_64(0x94BCC568,0x4B3F81A0),
+    MK_64(0x3EBBF51E,0x10ECFD46),
+    MK_64(0x2DF50F0B,0xEEB08542),
+    MK_64(0x3B5A6530,0x0DBC6516),
+    MK_64(0x484B9CD2,0x167BBCE1),
+    MK_64(0x2D136947,0xD4CBAFEA)
+    };
+
+/* blkSize = 1024 bits. hashSize =  512 bits */
+const u64b_t SKEIN1024_IV_512[] =
+    {
+    MK_64(0xCAEC0E5D,0x7C1B1B18),
+    MK_64(0xA01B0E04,0x5F03E802),
+    MK_64(0x33840451,0xED912885),
+    MK_64(0x374AFB04,0xEAEC2E1C),
+    MK_64(0xDF25A0E2,0x813581F7),
+    MK_64(0xE4004093,0x8B12F9D2),
+    MK_64(0xA662D539,0xC2ED39B6),
+    MK_64(0xFA8B85CF,0x45D8C75A),
+    MK_64(0x8316ED8E,0x29EDE796),
+    MK_64(0x053289C0,0x2E9F91B8),
+    MK_64(0xC3F8EF1D,0x6D518B73),
+    MK_64(0xBDCEC3C4,0xD5EF332E),
+    MK_64(0x549A7E52,0x22974487),
+    MK_64(0x67070872,0x5B749816),
+    MK_64(0xB9CD28FB,0xF0581BD1),
+    MK_64(0x0E2940B8,0x15804974)
+    };
+
+/* blkSize = 1024 bits. hashSize = 1024 bits */
+const u64b_t SKEIN1024_IV_1024[] =
+    {
+    MK_64(0xD593DA07,0x41E72355),
+    MK_64(0x15B5E511,0xAC73E00C),
+    MK_64(0x5180E5AE,0xBAF2C4F0),
+    MK_64(0x03BD41D3,0xFCBCAFAF),
+    MK_64(0x1CAEC6FD,0x1983A898),
+    MK_64(0x6E510B8B,0xCDD0589F),
+    MK_64(0x77E2BDFD,0xC6394ADA),
+    MK_64(0xC11E1DB5,0x24DCB0A3),
+    MK_64(0xD6D14AF9,0xC6329AB5),
+    MK_64(0x6A9B0BFC,0x6EB67E0D),
+    MK_64(0x9243C60D,0xCCFF1332),
+    MK_64(0x1A1F1DDE,0x743F02D4),
+    MK_64(0x0996753C,0x10ED0BB8),
+    MK_64(0x6572DD22,0xF2B4969A),
+    MK_64(0x61FD3062,0xD00A579A),
+    MK_64(0x1DE0536E,0x8682E539)
+    };
+
+#endif /* _SKEIN_IV_H_ */
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
new file mode 100644
index 000000000000..18d892553c8d
--- /dev/null
+++ b/drivers/staging/skein/include/skein_port.h
@@ -0,0 +1,124 @@
+#ifndef _SKEIN_PORT_H_
+#define _SKEIN_PORT_H_
+/*******************************************************************
+**
+** Platform-specific definitions for Skein hash function.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+** Many thanks to Brian Gladman for his portable header files.
+**
+** To port Skein to an "unsupported" platform, change the definitions
+** in this file appropriately.
+** 
+********************************************************************/
+
+#include <brg_types.h>                      /* get integer type definitions */
+
+typedef unsigned int    uint_t;             /* native unsigned integer */
+typedef uint_8t         u08b_t;             /*  8-bit unsigned integer */
+typedef uint_64t        u64b_t;             /* 64-bit unsigned integer */
+
+#ifndef RotL_64
+#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
+#endif
+
+/*
+ * Skein is "natively" little-endian (unlike SHA-xxx), for optimal
+ * performance on x86 CPUs.  The Skein code requires the following
+ * definitions for dealing with endianness:
+ *
+ *    SKEIN_NEED_SWAP:  0 for little-endian, 1 for big-endian
+ *    Skein_Put64_LSB_First
+ *    Skein_Get64_LSB_First
+ *    Skein_Swap64
+ *
+ * If SKEIN_NEED_SWAP is defined at compile time, it is used here
+ * along with the portable versions of Put64/Get64/Swap64, which 
+ * are slow in general.
+ *
+ * Otherwise, an "auto-detect" of endianness is attempted below.
+ * If the default handling doesn't work well, the user may insert
+ * platform-specific code instead (e.g., for big-endian CPUs).
+ *
+ */
+#ifndef SKEIN_NEED_SWAP /* compile-time "override" for endianness? */
+
+#include <brg_endian.h>              /* get endianness selection */
+#if   PLATFORM_BYTE_ORDER == IS_BIG_ENDIAN
+    /* here for big-endian CPUs */
+#define SKEIN_NEED_SWAP   (1)
+#elif PLATFORM_BYTE_ORDER == IS_LITTLE_ENDIAN
+    /* here for x86 and x86-64 CPUs (and other detected little-endian CPUs) */
+#define SKEIN_NEED_SWAP   (0)
+#if   PLATFORM_MUST_ALIGN == 0              /* ok to use "fast" versions? */
+#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
+#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
+#endif
+#else
+#error "Skein needs endianness setting!"
+#endif
+
+#endif /* ifndef SKEIN_NEED_SWAP */
+
+/*
+ ******************************************************************
+ *      Provide any definitions still needed.
+ ******************************************************************
+ */
+#ifndef Skein_Swap64  /* swap for big-endian, nop for little-endian */
+#if     SKEIN_NEED_SWAP
+#define Skein_Swap64(w64)                       \
+  ( (( ((u64b_t)(w64))       & 0xFF) << 56) |   \
+    (((((u64b_t)(w64)) >> 8) & 0xFF) << 48) |   \
+    (((((u64b_t)(w64)) >>16) & 0xFF) << 40) |   \
+    (((((u64b_t)(w64)) >>24) & 0xFF) << 32) |   \
+    (((((u64b_t)(w64)) >>32) & 0xFF) << 24) |   \
+    (((((u64b_t)(w64)) >>40) & 0xFF) << 16) |   \
+    (((((u64b_t)(w64)) >>48) & 0xFF) <<  8) |   \
+    (((((u64b_t)(w64)) >>56) & 0xFF)      ) )
+#else
+#define Skein_Swap64(w64)  (w64)
+#endif
+#endif  /* ifndef Skein_Swap64 */
+
+
+#ifndef Skein_Put64_LSB_First
+void    Skein_Put64_LSB_First(u08b_t *dst,const u64b_t *src,size_t bCnt)
+#ifdef  SKEIN_PORT_CODE /* instantiate the function code here? */
+    { /* this version is fully portable (big-endian or little-endian), but slow */
+    size_t n;
+
+    for (n=0;n<bCnt;n++)
+        dst[n] = (u08b_t) (src[n>>3] >> (8*(n&7)));
+    }
+#else
+    ;    /* output only the function prototype */
+#endif
+#endif   /* ifndef Skein_Put64_LSB_First */
+
+
+#ifndef Skein_Get64_LSB_First
+void    Skein_Get64_LSB_First(u64b_t *dst,const u08b_t *src,size_t wCnt)
+#ifdef  SKEIN_PORT_CODE /* instantiate the function code here? */
+    { /* this version is fully portable (big-endian or little-endian), but slow */
+    size_t n;
+
+    for (n=0;n<8*wCnt;n+=8)
+        dst[n/8] = (((u64b_t) src[n  ])      ) +
+                   (((u64b_t) src[n+1]) <<  8) +
+                   (((u64b_t) src[n+2]) << 16) +
+                   (((u64b_t) src[n+3]) << 24) +
+                   (((u64b_t) src[n+4]) << 32) +
+                   (((u64b_t) src[n+5]) << 40) +
+                   (((u64b_t) src[n+6]) << 48) +
+                   (((u64b_t) src[n+7]) << 56) ;
+    }
+#else
+    ;    /* output only the function prototype */
+#endif
+#endif   /* ifndef Skein_Get64_LSB_First */
+
+#endif   /* ifndef _SKEIN_PORT_H_ */
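Note on the portable fallbacks above: Skein_Put64_LSB_First and
Skein_Get64_LSB_First serialize 64-bit words least significant byte first,
independent of host endianness, and Skein_Swap64 reverses the byte order of
one word. A minimal standalone sketch of the same idea follows, using
<stdint.h> types instead of brg_types.h. The lower-case function names are
hypothetical stand-ins for illustration, not the patch's symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write byte_cnt bytes taken from the words in src, lowest byte of each word first. */
static void put64_lsb_first(uint8_t *dst, const uint64_t *src, size_t byte_cnt)
{
	size_t n;

	assert(dst != NULL && src != NULL);
	for (n = 0; n < byte_cnt; n++)
		dst[n] = (uint8_t)(src[n >> 3] >> (8 * (n & 7)));
}

/* Read word_cnt 64-bit words from a byte string stored lowest byte first. */
static void get64_lsb_first(uint64_t *dst, const uint8_t *src, size_t word_cnt)
{
	size_t n;

	assert(dst != NULL && src != NULL);
	for (n = 0; n < 8 * word_cnt; n += 8)
		dst[n / 8] = ((uint64_t)src[n])           |
			     ((uint64_t)src[n + 1] <<  8) |
			     ((uint64_t)src[n + 2] << 16) |
			     ((uint64_t)src[n + 3] << 24) |
			     ((uint64_t)src[n + 4] << 32) |
			     ((uint64_t)src[n + 5] << 40) |
			     ((uint64_t)src[n + 6] << 48) |
			     ((uint64_t)src[n + 7] << 56);
}

/* Reverse the byte order of one 64-bit word. */
static uint64_t swap64(uint64_t w)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (w & 0xFF);	/* move the lowest byte to the top end */
		w >>= 8;
	}
	return r;
}
```

Because these versions only shift and mask, they give the same byte layout
on big-endian and little-endian hosts, which is why the header can fall
back to them when the memcpy fast path is not usable.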
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
new file mode 100644
index 000000000000..85afd72fe987
--- /dev/null
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -0,0 +1,167 @@
+
+#ifndef THREEFISHAPI_H
+#define THREEFISHAPI_H
+
+/**
+ * @file threefishApi.h
+ * @brief A Threefish cipher API and its functions.
+ * @{
+ *
+ * This API and the functions that implement this API simplify the usage
+ * of the Threefish cipher. The design and the way to use the functions 
+ * follow the OpenSSL design but at the same time take care of some Threefish
+ * specific behaviour and possibilities.
+ *
+ * These are the low-level functions that deal with Threefish blocks only.
+ * Implementations for cipher modes such as ECB, CFB, or CBC may use these 
+ * functions.
+ * 
+@code
+    // Threefish cipher context data
+    ThreefishKey_t keyCtx;
+
+    // Initialize the context
+    threefishSetKey(&keyCtx, Threefish512, key, tweak);
+
+    // Encrypt
+    threefishEncryptBlockBytes(&keyCtx, input, cipher);
+@endcode
+ */
+
+#include <skein.h>
+#include <stdint.h>
+
+#define KeyScheduleConst 0x1BD11BDAA9FC1A22L
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+    /**
+     * Which Threefish size to use
+     */
+    typedef enum ThreefishSize {
+        Threefish256 = 256,     /*!< Threefish with 256 bit state */
+        Threefish512 = 512,     /*!< Threefish with 512 bit state */
+        Threefish1024 = 1024    /*!< Threefish with 1024 bit state */
+    } ThreefishSize_t;
+    
+    /**
+     * Context for Threefish key and tweak words.
+     * 
+     * This structure was set up with some knowledge of the internal
+     * Skein structures, in particular the ordering of header and size-dependent
+     * variables. If the Skein implementation changes this, then adapt these
+     * structures as well.
+     */
+    typedef struct ThreefishKey {
+        u64b_t stateSize;
+        u64b_t key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
+        u64b_t tweak[3];
+    } ThreefishKey_t;
+
+    /**
+     * Set Threefish key and tweak data.
+     * 
+     * This function sets the key and tweak data for the Threefish cipher of
+     * the given size. The key data must have the same length (number of bits)
+     * as the state size.
+     *
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param stateSize
+     *     Which Threefish size to use.
+     * @param keyData
+     *     Pointer to the key words (word has 64 bits).
+     * @param tweak
+     *     Pointer to the two tweak words (word has 64 bits).
+     */
+    void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize, uint64_t* keyData, uint64_t* tweak);
+    
+    /**
+     * Encrypt Threefish block (bytes).
+     *
+     * The buffer must have at least the same length (number of bits) as the
+     * state size for this key. The function uses the first @c stateSize bits
+     * of the input buffer, encrypts them and stores the result in the output
+     * buffer.
+     * 
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param in
+     *     Pointer to plaintext data buffer.
+     * @param out
+     *     Pointer to cipher buffer.
+     */
+    void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in, uint8_t* out);
+    
+    /**
+     * Encrypt Threefish block (words).
+     *
+     * The buffer must have at least the same length (number of bits) as the
+     * state size for this key. The function uses the first @c stateSize bits
+     * of the input buffer, encrypts them and stores the result in the output
+     * buffer.
+     * 
+     * The word size is set to 64 bits.
+     * 
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param in
+     *     Pointer to plaintext data buffer.
+     * @param out
+     *     Pointer to cipher buffer.
+     */
+    void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in, uint64_t* out);
+
+    /**
+     * Decrypt Threefish block (bytes).
+     *
+     * The buffer must have at least the same length (number of bits) as the
+     * state size for this key. The function uses the first @c stateSize bits
+     * of the input buffer, decrypts them and stores the result in the output
+     * buffer.
+     * 
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param in
+     *     Pointer to cipher data buffer.
+     * @param out
+     *     Pointer to plaintext buffer.
+     */
+    void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in, uint8_t* out);
+
+    /**
+     * Decrypt Threefish block (words).
+     *
+     * The buffer must have at least the same length (number of bits) as the
+     * state size for this key. The function uses the first @c stateSize bits
+     * of the input buffer, decrypts them and stores the result in the output
+     * buffer.
+     * 
+     * The word size is set to 64 bits.
+     * 
+     * @param keyCtx
+     *     Pointer to a Threefish key structure.
+     * @param in
+     *     Pointer to cipher data buffer.
+     * @param out
+     *     Pointer to plaintext buffer.
+     */
+    void threefishDecryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in, uint64_t* out);
+
+    void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+#ifdef __cplusplus
+}
+#endif
+
+/**
+ * @}
+ */
+#endif
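Note on ThreefishKey_t above: the struct reserves one key word more than
the state size and three tweak words, although callers pass only two tweak
words. In the Threefish specification the extra key word is the XOR of all
key words with the constant 0x1BD11BDAA9FC1A22 (the value of
KeyScheduleConst), and the third tweak word is t0 ^ t1. The sketch below
shows that extension step on its own. The function name and signature are
hypothetical, and it is an assumption that threefishSetKey fills the struct
this way; that function's body is not part of this header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as KeyScheduleConst in threefishApi.h. */
#define THREEFISH_PARITY_CONST 0x1BD11BDAA9FC1A22ULL

/*
 * Hypothetical helper: build the extended key (key_words + 1 words) and
 * extended tweak (3 words) as described in the Threefish specification.
 * key_out must have room for key_words + 1 entries.
 */
static void threefish_extend_key(uint64_t *key_out, uint64_t tweak_out[3],
				 const uint64_t *key_in,
				 const uint64_t tweak_in[2],
				 size_t key_words)
{
	uint64_t parity = THREEFISH_PARITY_CONST;
	size_t i;

	assert(key_words == 4 || key_words == 8 || key_words == 16);

	for (i = 0; i < key_words; i++) {
		key_out[i] = key_in[i];
		parity ^= key_in[i];	/* accumulate XOR of all key words */
	}
	key_out[key_words] = parity;	/* extra key word */

	tweak_out[0] = tweak_in[0];
	tweak_out[1] = tweak_in[1];
	tweak_out[2] = tweak_in[0] ^ tweak_in[1];	/* extra tweak word */
}
```

For Threefish-256, key_words would be 4 and the extended key 5 words long,
which fits the key[SKEIN_MAX_STATE_WORDS+1] array sized for the 1024-bit case.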
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
new file mode 100644
index 000000000000..f0b176ac1dc7
--- /dev/null
+++ b/drivers/staging/skein/skein.c
@@ -0,0 +1,742 @@
+/***********************************************************************
+**
+** Implementation of the Skein hash function.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+************************************************************************/
+
+#define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
+
+#include <string.h>       /* get the memcpy/memset functions */
+#include <skein.h> /* get the Skein API definitions   */
+#include <skein_iv.h>    /* get precomputed IVs */
+
+/*****************************************************************/
+/* External function to process blkCnt (nonzero) full block(s) of data. */
+void    Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
+
+/*****************************************************************/
+/*     256-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u08b_t  b[SKEIN_256_STATE_BYTES];
+        u64b_t  w[SKEIN_256_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {             /* use pre-computed values, where available */
+    case  256:
+        memcpy(ctx->X,SKEIN_256_IV_256,sizeof(ctx->X));
+        break;
+    case  224:
+        memcpy(ctx->X,SKEIN_256_IV_224,sizeof(ctx->X));
+        break;
+    case  160:
+        memcpy(ctx->X,SKEIN_256_IV_160,sizeof(ctx->X));
+        break;
+    case  128:
+        memcpy(ctx->X,SKEIN_256_IV_128,sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_256_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        break;
+    }
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+{
+    union
+    {
+        u08b_t  b[SKEIN_256_STATE_BYTES];
+        u64b_t  w[SKEIN_256_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_256_Update(ctx,key,keyBytes);     /* hash the key */
+        Skein_256_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+#if SKEIN_NEED_SWAP
+        {
+            uint_t i;
+            for (i=0;i<SKEIN_256_STATE_WORDS;i++)   /* convert key bytes to context words */
+                ctx->X[i] = Skein_Swap64(ctx->X[i]);
+        }
+#endif
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx,CFG_FINAL);
+
+    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(256,&ctx->h,key,keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein_256_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
+            Skein_256_Process_Block(ctx,ctx->b,1,SKEIN_256_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein_256_Process_Block(ctx,msg,n,SKEIN_256_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
+            msg        += n * SKEIN_256_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein_256_Final(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN_256_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein_256_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_256_BLOCK_BYTES)
+            n  = SKEIN_256_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*****************************************************************/
+/*     512-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u08b_t  b[SKEIN_512_STATE_BYTES];
+        u64b_t  w[SKEIN_512_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {             /* use pre-computed values, where available */
+    case  512:
+        memcpy(ctx->X,SKEIN_512_IV_512,sizeof(ctx->X));
+        break;
+    case  384:
+        memcpy(ctx->X,SKEIN_512_IV_384,sizeof(ctx->X));
+        break;
+    case  256:
+        memcpy(ctx->X,SKEIN_512_IV_256,sizeof(ctx->X));
+        break;
+    case  224:
+        memcpy(ctx->X,SKEIN_512_IV_224,sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_512_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        break;
+    }
+
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+{
+    union
+    {
+        u08b_t  b[SKEIN_512_STATE_BYTES];
+        u64b_t  w[SKEIN_512_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_512_Update(ctx,key,keyBytes);     /* hash the key */
+        Skein_512_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+#if SKEIN_NEED_SWAP
+        {
+            uint_t i;
+            for (i=0;i<SKEIN_512_STATE_WORDS;i++)   /* convert key bytes to context words */
+                ctx->X[i] = Skein_Swap64(ctx->X[i]);
+        }
+#endif
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx,CFG_FINAL);
+
+    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(512,&ctx->h,key,keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein_512_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
+            Skein_512_Process_Block(ctx,ctx->b,1,SKEIN_512_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein_512_Process_Block(ctx,msg,n,SKEIN_512_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
+            msg        += n * SKEIN_512_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein_512_Final(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN_512_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein_512_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_512_BLOCK_BYTES)
+            n  = SKEIN_512_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512,&ctx->h,n,hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*****************************************************************/
+/*    1024-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u08b_t  b[SKEIN1024_STATE_BYTES];
+        u64b_t  w[SKEIN1024_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {              /* use pre-computed values, where available */
+    case  512:
+        memcpy(ctx->X,SKEIN1024_IV_512 ,sizeof(ctx->X));
+        break;
+    case  384:
+        memcpy(ctx->X,SKEIN1024_IV_384 ,sizeof(ctx->X));
+        break;
+    case 1024:
+        memcpy(ctx->X,SKEIN1024_IV_1024,sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
+        Skein1024_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        break;
+    }
+
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+{
+    union
+    {
+        u08b_t  b[SKEIN1024_STATE_BYTES];
+        u64b_t  w[SKEIN1024_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein1024_Update(ctx,key,keyBytes);     /* hash the key */
+        Skein1024_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+#if SKEIN_NEED_SWAP
+        {
+            uint_t i;
+            for (i=0;i<SKEIN1024_STATE_WORDS;i++)   /* convert key bytes to context words */
+                ctx->X[i] = Skein_Swap64(ctx->X[i]);
+        }
+#endif
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx,CFG_FINAL);
+
+    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(1024,&ctx->h,key,keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein1024_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx,MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
+            Skein1024_Process_Block(ctx,ctx->b,1,SKEIN1024_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein1024_Process_Block(ctx,msg,n,SKEIN1024_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
+            msg        += n * SKEIN1024_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN1024_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN1024_BLOCK_BYTES)
+            n  = SKEIN1024_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024,&ctx->h,n,hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/**************** Functions to support MAC/tree hashing ***************/
+/*   (this code is identical for Optimized and Reference versions)    */
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_256_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_512_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt],0,SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+#if SKEIN_TREE_HASH
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein_256_Output(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN_256_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_256_BLOCK_BYTES)
+            n  = SKEIN_256_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein_512_Output(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN_512_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_512_BLOCK_BYTES)
+            n  = SKEIN_512_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512,&ctx->h,n,hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein1024_Output(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+{
+    size_t i,n,byteCnt;
+    u64b_t X[SKEIN1024_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
+    {
+        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        Skein_Start_New_Type(ctx,OUT_FINAL);
+        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN1024_BLOCK_BYTES)
+            n  = SKEIN1024_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024,&ctx->h,n,hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+#endif
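
A note on the three *_Update() functions above, outside the patch proper:
they share one buffering scheme, and the (msgByteCnt-1) division is the part
that is easy to misread. It deliberately leaves between 1 and BLOCK_BYTES
bytes in b[] so that *_Final() always has a block to tag as final. Below is
a minimal sketch of just that control flow, with the hashing stripped out.
The names struct buf_ctx, buf_update and blocks_done are hypothetical and do
not exist in the imported code; BLOCK_BYTES is assumed to be 32, matching
Skein-256.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 32  /* assumption: Skein-256 block size in bytes */

/* Hypothetical stand-in for Skein_256_Ctxt_t: buffering state only. */
struct buf_ctx {
    size_t bCnt;                    /* bytes currently held in b[] */
    size_t blocks_done;             /* blocks that would go to Process_Block */
    unsigned char b[BLOCK_BYTES];   /* partial block buffer */
};

/* Mirrors the control flow of Skein_256_Update(), minus the hashing. */
static void buf_update(struct buf_ctx *ctx, const unsigned char *msg,
                       size_t len)
{
    size_t n;

    if (len + ctx->bCnt > BLOCK_BYTES) {
        if (ctx->bCnt) {
            /* top up the buffered block, then "process" it */
            n = BLOCK_BYTES - ctx->bCnt;
            if (n) {
                assert(n < len);
                memcpy(&ctx->b[ctx->bCnt], msg, n);
                len -= n;
                msg += n;
            }
            ctx->blocks_done++;
            ctx->bCnt = 0;
        }
        if (len > BLOCK_BYTES) {
            /* (len - 1) keeps at least one byte back for the final block */
            n = (len - 1) / BLOCK_BYTES;
            ctx->blocks_done += n;
            len -= n * BLOCK_BYTES;
            msg += n * BLOCK_BYTES;
        }
    }

    if (len) {
        /* stash the tail; it is processed by a later update or by Final */
        assert(len + ctx->bCnt <= BLOCK_BYTES);
        memcpy(&ctx->b[ctx->bCnt], msg, len);
        ctx->bCnt += len;
    }
}
```

Feeding 64 bytes into a zeroed context therefore processes one block and
leaves a full 32 bytes buffered rather than processing two; that buffered
block is only consumed once more input arrives or Final is called.
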
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
new file mode 100755
index 000000000000..7b963758d32c
--- /dev/null
+++ b/drivers/staging/skein/skeinApi.c
@@ -0,0 +1,221 @@
+/*
+Copyright (c) 2010 Werner Dittmann
+
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+*/
+
+#define SKEIN_ERR_CHECK 1
+#include <skeinApi.h>
+#include <string.h>
+#include <stdio.h>
+
+int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size)
+{
+    Skein_Assert(ctx && size, SKEIN_FAIL);
+
+    memset(ctx ,0, sizeof(SkeinCtx_t));
+    ctx->skeinSize = size;
+
+    return SKEIN_SUCCESS;
+}
+
+int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen)
+{
+    int ret = SKEIN_FAIL;
+    size_t Xlen = 0;
+    u64b_t*  X = NULL;
+    uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+
+    Skein_Assert(ctx, SKEIN_FAIL);
+    /*
+     * The following two lines rely on the fact that the real Skein contexts are
+     * a union in our context and thus have the maximum memory available.
+     * The beauty of C :-) .
+     */
+    X = ctx->m.s256.X;
+    Xlen = ctx->skeinSize/8;
+    /*
+     * If size is the same and hash bit length is zero then reuse
+     * the saved chaining variables.
+     */
+    switch (ctx->skeinSize) {
+    case Skein256:
+        ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
+                                treeInfo, NULL, 0);
+        break;
+    case Skein512:
+        ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
+                                treeInfo, NULL, 0);
+        break;
+    case Skein1024:
+        ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
+                                treeInfo, NULL, 0);
+        break;
+    }
+
+    if (ret == SKEIN_SUCCESS) {
+        /* Save chaining variables for this combination of size and hashBitLen */
+        memcpy(ctx->XSave, X, Xlen);
+    }
+    return ret;
+}
+
+int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
+                 size_t hashBitLen)
+{
+    int ret = SKEIN_FAIL;
+    u64b_t*  X = NULL;
+    size_t Xlen = 0;
+    uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+
+    Skein_Assert(ctx, SKEIN_FAIL);
+
+    X = ctx->m.s256.X;
+    Xlen = ctx->skeinSize/8;
+
+    Skein_Assert(hashBitLen, SKEIN_BAD_HASHLEN);
+
+    switch (ctx->skeinSize) {
+    case Skein256:
+        ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
+                                treeInfo,
+                                (const u08b_t*)key, keyLen);
+
+        break;
+    case Skein512:
+        ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
+                                treeInfo,
+                                (const u08b_t*)key, keyLen);
+        break;
+    case Skein1024:
+        ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
+                                treeInfo,
+                                (const u08b_t*)key, keyLen);
+
+        break;
+    }
+    if (ret == SKEIN_SUCCESS) {
+        /* Save chaining variables for this combination of key, keyLen, hashBitLen */
+        memcpy(ctx->XSave, X, Xlen);
+    }
+    return ret;
+}
+
+void skeinReset(SkeinCtx_t* ctx)
+{
+    size_t Xlen = 0;
+    u64b_t*  X = NULL;
+
+    /*
+     * The following two lines rely on the fact that the real Skein contexts are
+     * a union in our context and thus have the maximum memory available.
+     * The beauty of C :-) .
+     */
+    X = ctx->m.s256.X;
+    Xlen = ctx->skeinSize/8;
+    /* Restore the chaining variables, reset byte counter */
+    memcpy(X, ctx->XSave, Xlen);
+
+    /* Setup context to process the message */
+    Skein_Start_New_Type(&ctx->m, MSG);
+}
+
+int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
+                size_t msgByteCnt)
+{
+    int ret = SKEIN_FAIL;
+    Skein_Assert(ctx, SKEIN_FAIL);
+
+    switch (ctx->skeinSize) {
+    case Skein256:
+        ret = Skein_256_Update(&ctx->m.s256, (const u08b_t*)msg, msgByteCnt);
+        break;
+    case Skein512:
+        ret = Skein_512_Update(&ctx->m.s512, (const u08b_t*)msg, msgByteCnt);
+        break;
+    case Skein1024:
+        ret = Skein1024_Update(&ctx->m.s1024, (const u08b_t*)msg, msgByteCnt);
+        break;
+    }
+    return ret;
+
+}
+
+int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
+                    size_t msgBitCnt)
+{
+    /*
+     * I've used the bit pad implementation from skein_test.c (see NIST CD)
+     * and modified it to use the convenience functions and added some pointer
+     * arithmetic.
+     */
+    size_t length;
+    uint8_t mask;
+    uint8_t* up;
+
+    /* only the final Update() call is allowed to do partial bytes, else assert an error */
+    Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
+
+    /* if number of bits is a multiple of bytes - that's easy */
+    if ((msgBitCnt & 0x7) == 0) {
+        return skeinUpdate(ctx, msg, msgBitCnt >> 3);
+    }
+    skeinUpdate(ctx, msg, (msgBitCnt >> 3) + 1);
+
+    /*
+     * The next line relies on the fact that the real Skein contexts
+     * are a union in our context. After the addition the pointer points to
+     * Skein's real partial block buffer.
+     * If this layout ever changes we have to adapt this as well.
+     */
+    up = (uint8_t*)ctx->m.s256.X + ctx->skeinSize / 8;
+
+    Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
+
+    /* now "pad" the final partial byte the way NIST likes */
+    length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
+    Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
+    mask = (uint8_t) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
+    up[length-1]  = (uint8_t)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
+
+    return SKEIN_SUCCESS;
+}
+
+int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash)
+{
+    int ret = SKEIN_FAIL;
+    Skein_Assert(ctx, SKEIN_FAIL);
+
+    switch (ctx->skeinSize) {
+    case Skein256:
+        ret = Skein_256_Final(&ctx->m.s256, (u08b_t*)hash);
+        break;
+    case Skein512:
+        ret = Skein_512_Final(&ctx->m.s512, (u08b_t*)hash);
+        break;
+    case Skein1024:
+        ret = Skein1024_Final(&ctx->m.s1024, (u08b_t*)hash);
+        break;
+    }
+    return ret;
+}
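
Note on the bit padding in skeinUpdateBits() above: the partial final byte keeps its top (msgBitCnt & 7) bits, gets a single 1 bit right after them, and has all lower bits cleared. The sketch below isolates that one step so it can be checked on its own. The helper name pad_partial_byte is hypothetical and not part of this patch; it only mirrors the mask arithmetic shown in the function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): apply the bit padding to the
 * last, partially filled byte of a message that is msg_bit_cnt bits long.
 * Only meaningful when msg_bit_cnt is not a multiple of 8.
 */
uint8_t pad_partial_byte(uint8_t last, size_t msg_bit_cnt)
{
    uint8_t mask;

    assert((msg_bit_cnt & 7) != 0);

    /* single bit sitting just below the valid message bits */
    mask = (uint8_t)(1u << (7 - (msg_bit_cnt & 7)));

    /* (0 - mask) keeps the pad bit and everything above it, clears the rest */
    return (uint8_t)((last & (0 - mask)) | mask);
}
```

For example, with 3 valid bits the byte 0xA5 (1010 0101) becomes 0xB0 (1011 0000): the top three bits survive, the fourth is the pad bit, and the low nibble is cleared.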
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
new file mode 100644
index 000000000000..4ad6c50360e7
--- /dev/null
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -0,0 +1,172 @@
+
+#include <string.h>
+#include <skein.h>
+#include <threefishApi.h>
+
+
+/*****************************  Skein_256 ******************************/
+void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u08b_t *blkPtr,
+                             size_t blkCnt, size_t byteCntAdd)
+{
+    ThreefishKey_t key;
+    u64b_t tweak[2];
+    int i;
+    u64b_t  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+    u64b_t words[3];
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64b_t carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish256, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN_256_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = ctx->X[0] ^ w[0];
+        ctx->X[1] = ctx->X[1] ^ w[1];
+        ctx->X[2] = ctx->X[2] ^ w[2];
+        ctx->X[3] = ctx->X[3] ^ w[3];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
+
+void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u08b_t *blkPtr,
+                             size_t blkCnt, size_t byteCntAdd)
+{
+    ThreefishKey_t key;
+    u64b_t tweak[2];
+    int i;
+    u64b_t words[3];
+    u64b_t  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64b_t carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish512, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN_512_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = ctx->X[0] ^ w[0];
+        ctx->X[1] = ctx->X[1] ^ w[1];
+        ctx->X[2] = ctx->X[2] ^ w[2];
+        ctx->X[3] = ctx->X[3] ^ w[3];
+        ctx->X[4] = ctx->X[4] ^ w[4];
+        ctx->X[5] = ctx->X[5] ^ w[5];
+        ctx->X[6] = ctx->X[6] ^ w[6];
+        ctx->X[7] = ctx->X[7] ^ w[7];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
+
+void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx, const u08b_t *blkPtr,
+                              size_t blkCnt, size_t byteCntAdd)
+{
+    ThreefishKey_t key;
+    u64b_t tweak[2];
+    int i;
+    u64b_t words[3];
+    u64b_t  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64b_t carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }        
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish1024, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN1024_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[ 0] = ctx->X[ 0] ^ w[ 0];
+        ctx->X[ 1] = ctx->X[ 1] ^ w[ 1];
+        ctx->X[ 2] = ctx->X[ 2] ^ w[ 2];
+        ctx->X[ 3] = ctx->X[ 3] ^ w[ 3];
+        ctx->X[ 4] = ctx->X[ 4] ^ w[ 4];
+        ctx->X[ 5] = ctx->X[ 5] ^ w[ 5];
+        ctx->X[ 6] = ctx->X[ 6] ^ w[ 6];
+        ctx->X[ 7] = ctx->X[ 7] ^ w[ 7];
+        ctx->X[ 8] = ctx->X[ 8] ^ w[ 8];
+        ctx->X[ 9] = ctx->X[ 9] ^ w[ 9];
+        ctx->X[10] = ctx->X[10] ^ w[10];
+        ctx->X[11] = ctx->X[11] ^ w[11];
+        ctx->X[12] = ctx->X[12] ^ w[12];
+        ctx->X[13] = ctx->X[13] ^ w[13];
+        ctx->X[14] = ctx->X[14] ^ w[14];
+        ctx->X[15] = ctx->X[15] ^ w[15];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
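
Note on the tweak update used by the three Process_Block variants above: the position counter is treated as a 96-bit value (all of T[0] plus the low 32 bits of T[1]) and is advanced in 32-bit limbs with carry propagation. The sketch below shows that idea as a standalone function. The name tweak_add_bytes is hypothetical, the byte count is narrowed to 32 bits so the intermediate sum cannot overflow, and the write-back clears the low 32 bits of T[1] before merging the new limb; that last point is an assumption about the intended behaviour, not something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): add byte_cnt to the 96-bit
 * position counter stored in t[0] (bits 0..63) and the low 32 bits of
 * t[1] (bits 64..95). The upper 32 bits of t[1] (flags) are preserved.
 */
void tweak_add_bytes(uint64_t t[2], uint32_t byte_cnt)
{
    uint64_t words[3];
    uint64_t carry = byte_cnt;
    int i;

    /* split the counter into three 32-bit limbs */
    words[0] = t[0] & 0xffffffffULL;
    words[1] = (t[0] >> 32) & 0xffffffffULL;
    words[2] = t[1] & 0xffffffffULL;

    /* ripple the addition through the limbs */
    for (i = 0; i < 3; i++) {
        carry += words[i];
        words[i] = carry & 0xffffffffULL;
        carry >>= 32;
    }

    /* reassemble; replace (rather than OR into) the low half of t[1] */
    t[0] = words[0] | (words[1] << 32);
    t[1] = (t[1] & ~0xffffffffULL) | words[2];
}
```

A carry out of T[0] should only happen after 2**64 bytes of input, so the top limb rarely matters in practice; the sketch handles it anyway so the flags in the upper half of T[1] are never disturbed.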
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
new file mode 100644
index 000000000000..86724a2443b5
--- /dev/null
+++ b/drivers/staging/skein/skein_block.c
@@ -0,0 +1,689 @@
+/***********************************************************************
+**
+** Implementation of the Skein block functions.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+** Compile-time switches:
+**
+**  SKEIN_USE_ASM  -- set bits (256/512/1024) to select which
+**                    versions use ASM code for block processing
+**                    [default: use C for all block sizes]
+**
+************************************************************************/
+
+#include <string.h>
+#include <skein.h>
+
+#ifndef SKEIN_USE_ASM
+#define SKEIN_USE_ASM   (0)                     /* default is all C code (no ASM) */
+#endif
+
+#ifndef SKEIN_LOOP
+#define SKEIN_LOOP 001                          /* default: unroll 256 and 512, but not 1024 */
+#endif
+
+#define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
+#define KW_TWK_BASE     (0)
+#define KW_KEY_BASE     (3)
+#define ks              (kw + KW_KEY_BASE)                
+#define ts              (kw + KW_TWK_BASE)
+
+#ifdef SKEIN_DEBUG
+#define DebugSaveTweak(ctx) { ctx->h.T[0] = ts[0]; ctx->h.T[1] = ts[1]; }
+#else
+#define DebugSaveTweak(ctx)
+#endif
+
+/*****************************  Skein_256 ******************************/
+#if !(SKEIN_USE_ASM & 256)
+void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+    { /* do it in C */
+    enum
+        {
+        WCNT = SKEIN_256_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
+
+#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
+#else
+#define SKEIN_UNROLL_256 (0)
+#endif
+
+#if SKEIN_UNROLL_256
+#if (RCNT % SKEIN_UNROLL_256)
+#error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+    u64b_t  X0,X1,X2,X3;                        /* local copy of context vars, for speed */
+    u64b_t  w [WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64b_t *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+#endif
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0] = ctx->X[0];     
+        ks[1] = ctx->X[1];
+        ks[2] = ctx->X[2];
+        ks[3] = ctx->X[3];
+        ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
+
+        ts[2] = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w,blkPtr,WCNT);   /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+
+        X0 = w[0] + ks[0];                      /* do the first full key injection */
+        X1 = w[1] + ks[1] + ts[0];
+        X2 = w[2] + ks[2] + ts[1];
+        X3 = w[3] + ks[3];
+
+        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);    /* show starting state values */
+
+        blkPtr += SKEIN_256_BLOCK_BYTES;
+
+        /* run the rounds */
+
+#define Round256(p0,p1,p2,p3,ROT,rNum)                              \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2; \
+
+#if SKEIN_UNROLL_256 == 0                       
+#define R256(p0,p1,p2,p3,ROT,rNum)           /* fully unrolled */   \
+    Round256(p0,p1,p2,p3,ROT,rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rNum,Xptr);
+
+#define I256(R)                                                     \
+    X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
+    X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
+    X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
+    X3   += ks[((R)+4) % 5] +     (R)+1;                            \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+#else                                       /* looping version */
+#define R256(p0,p1,p2,p3,ROT,rNum)                                  \
+    Round256(p0,p1,p2,p3,ROT,rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rNum,Xptr);
+
+#define I256(R)                                                     \
+    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
+    X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
+    X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
+    X3   += ks[r+(R)+3] +    r+(R)   ;                              \
+    ks[r + (R)+4    ]   = ks[r+(R)-1];     /* rotate key schedule */\
+    ts[r + (R)+2    ]   = ts[r+(R)-1];                              \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+
+    for (r=1;r < 2*RCNT;r+=2*SKEIN_UNROLL_256)  /* loop thru it */
+#endif  
+        {    
+#define R256_8_rounds(R)                  \
+        R256(0,1,2,3,R_256_0,8*(R) + 1);  \
+        R256(0,3,2,1,R_256_1,8*(R) + 2);  \
+        R256(0,1,2,3,R_256_2,8*(R) + 3);  \
+        R256(0,3,2,1,R_256_3,8*(R) + 4);  \
+        I256(2*(R));                      \
+        R256(0,1,2,3,R_256_4,8*(R) + 5);  \
+        R256(0,3,2,1,R_256_5,8*(R) + 6);  \
+        R256(0,1,2,3,R_256_6,8*(R) + 7);  \
+        R256(0,3,2,1,R_256_7,8*(R) + 8);  \
+        I256(2*(R)+1);
+
+        R256_8_rounds( 0);
+
+#define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
+
+  #if   R256_Unroll_R( 1)
+        R256_8_rounds( 1);
+  #endif
+  #if   R256_Unroll_R( 2)
+        R256_8_rounds( 2);
+  #endif
+  #if   R256_Unroll_R( 3)
+        R256_8_rounds( 3);
+  #endif
+  #if   R256_Unroll_R( 4)
+        R256_8_rounds( 4);
+  #endif
+  #if   R256_Unroll_R( 5)
+        R256_8_rounds( 5);
+  #endif
+  #if   R256_Unroll_R( 6)
+        R256_8_rounds( 6);
+  #endif
+  #if   R256_Unroll_R( 7)
+        R256_8_rounds( 7);
+  #endif
+  #if   R256_Unroll_R( 8)
+        R256_8_rounds( 8);
+  #endif
+  #if   R256_Unroll_R( 9)
+        R256_8_rounds( 9);
+  #endif
+  #if   R256_Unroll_R(10)
+        R256_8_rounds(10);
+  #endif
+  #if   R256_Unroll_R(11)
+        R256_8_rounds(11);
+  #endif
+  #if   R256_Unroll_R(12)
+        R256_8_rounds(12);
+  #endif
+  #if   R256_Unroll_R(13)
+        R256_8_rounds(13);
+  #endif
+  #if   R256_Unroll_R(14)
+        R256_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_256 > 14)
+#error  "need more unrolling in Skein_256_Process_Block"
+  #endif
+        }
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = X0 ^ w[0];
+        ctx->X[1] = X1 ^ w[1];
+        ctx->X[2] = X2 ^ w[2];
+        ctx->X[3] = X3 ^ w[3];
+
+        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein_256_Process_Block_CodeSize(void)
+    {
+    return ((u08b_t *) Skein_256_Process_Block_CodeSize) -
+           ((u08b_t *) Skein_256_Process_Block);
+    }
+uint_t Skein_256_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_256;
+    }
+#endif
+#endif
+
+/*****************************  Skein_512 ******************************/
+#if !(SKEIN_USE_ASM & 512)
+void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+    { /* do it in C */
+    enum
+        {
+        WCNT = SKEIN_512_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
+
+#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
+#else
+#define SKEIN_UNROLL_512 (0)
+#endif
+
+#if SKEIN_UNROLL_512
+#if (RCNT % SKEIN_UNROLL_512)
+#error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+    u64b_t  X0,X1,X2,X3,X4,X5,X6,X7;            /* local copy of vars, for speed */
+    u64b_t  w [WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64b_t *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+    Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
+#endif
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0] = ctx->X[0];
+        ks[1] = ctx->X[1];
+        ks[2] = ctx->X[2];
+        ks[3] = ctx->X[3];
+        ks[4] = ctx->X[4];
+        ks[5] = ctx->X[5];
+        ks[6] = ctx->X[6];
+        ks[7] = ctx->X[7];
+        ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
+                ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
+
+        ts[2] = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+
+        X0   = w[0] + ks[0];                    /* do the first full key injection */
+        X1   = w[1] + ks[1];
+        X2   = w[2] + ks[2];
+        X3   = w[3] + ks[3];
+        X4   = w[4] + ks[4];
+        X5   = w[5] + ks[5] + ts[0];
+        X6   = w[6] + ks[6] + ts[1];
+        X7   = w[7] + ks[7];
+
+        blkPtr += SKEIN_512_BLOCK_BYTES;
+
+        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);
+        /* run the rounds */
+#define Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                  \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2; \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4; \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6; \
+
+#if SKEIN_UNROLL_512 == 0                       
+#define R512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)      /* unrolled */  \
+    Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rNum,Xptr);
+
+#define I512(R)                                                     \
+    X0   += ks[((R)+1) % 9];   /* inject the key schedule value */  \
+    X1   += ks[((R)+2) % 9];                                        \
+    X2   += ks[((R)+3) % 9];                                        \
+    X3   += ks[((R)+4) % 9];                                        \
+    X4   += ks[((R)+5) % 9];                                        \
+    X5   += ks[((R)+6) % 9] + ts[((R)+1) % 3];                      \
+    X6   += ks[((R)+7) % 9] + ts[((R)+2) % 3];                      \
+    X7   += ks[((R)+8) % 9] +     (R)+1;                            \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+#else                                       /* looping version */
+#define R512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
+    Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rNum,Xptr);
+
+#define I512(R)                                                     \
+    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
+    X1   += ks[r+(R)+1];                                            \
+    X2   += ks[r+(R)+2];                                            \
+    X3   += ks[r+(R)+3];                                            \
+    X4   += ks[r+(R)+4];                                            \
+    X5   += ks[r+(R)+5] + ts[r+(R)+0];                              \
+    X6   += ks[r+(R)+6] + ts[r+(R)+1];                              \
+    X7   += ks[r+(R)+7] +    r+(R)   ;                              \
+    ks[r +       (R)+8] = ks[r+(R)-1];  /* rotate key schedule */   \
+    ts[r +       (R)+2] = ts[r+(R)-1];                              \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+
+    for (r=1;r < 2*RCNT;r+=2*SKEIN_UNROLL_512)   /* loop thru it */
+#endif                         /* end of looped code definitions */
+        {
+#define R512_8_rounds(R)  /* do 8 full rounds */  \
+        R512(0,1,2,3,4,5,6,7,R_512_0,8*(R)+ 1);   \
+        R512(2,1,4,7,6,5,0,3,R_512_1,8*(R)+ 2);   \
+        R512(4,1,6,3,0,5,2,7,R_512_2,8*(R)+ 3);   \
+        R512(6,1,0,7,2,5,4,3,R_512_3,8*(R)+ 4);   \
+        I512(2*(R));                              \
+        R512(0,1,2,3,4,5,6,7,R_512_4,8*(R)+ 5);   \
+        R512(2,1,4,7,6,5,0,3,R_512_5,8*(R)+ 6);   \
+        R512(4,1,6,3,0,5,2,7,R_512_6,8*(R)+ 7);   \
+        R512(6,1,0,7,2,5,4,3,R_512_7,8*(R)+ 8);   \
+        I512(2*(R)+1);        /* and key injection */
+
+        R512_8_rounds( 0);
+
+#define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
+
+  #if   R512_Unroll_R( 1)
+        R512_8_rounds( 1);
+  #endif
+  #if   R512_Unroll_R( 2)
+        R512_8_rounds( 2);
+  #endif
+  #if   R512_Unroll_R( 3)
+        R512_8_rounds( 3);
+  #endif
+  #if   R512_Unroll_R( 4)
+        R512_8_rounds( 4);
+  #endif
+  #if   R512_Unroll_R( 5)
+        R512_8_rounds( 5);
+  #endif
+  #if   R512_Unroll_R( 6)
+        R512_8_rounds( 6);
+  #endif
+  #if   R512_Unroll_R( 7)
+        R512_8_rounds( 7);
+  #endif
+  #if   R512_Unroll_R( 8)
+        R512_8_rounds( 8);
+  #endif
+  #if   R512_Unroll_R( 9)
+        R512_8_rounds( 9);
+  #endif
+  #if   R512_Unroll_R(10)
+        R512_8_rounds(10);
+  #endif
+  #if   R512_Unroll_R(11)
+        R512_8_rounds(11);
+  #endif
+  #if   R512_Unroll_R(12)
+        R512_8_rounds(12);
+  #endif
+  #if   R512_Unroll_R(13)
+        R512_8_rounds(13);
+  #endif
+  #if   R512_Unroll_R(14)
+        R512_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_512 > 14)
+#error  "need more unrolling in Skein_512_Process_Block"
+  #endif
+        }
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = X0 ^ w[0];
+        ctx->X[1] = X1 ^ w[1];
+        ctx->X[2] = X2 ^ w[2];
+        ctx->X[3] = X3 ^ w[3];
+        ctx->X[4] = X4 ^ w[4];
+        ctx->X[5] = X5 ^ w[5];
+        ctx->X[6] = X6 ^ w[6];
+        ctx->X[7] = X7 ^ w[7];
+        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein_512_Process_Block_CodeSize(void)
+    {
+    return ((u08b_t *) Skein_512_Process_Block_CodeSize) -
+           ((u08b_t *) Skein_512_Process_Block);
+    }
+uint_t Skein_512_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_512;
+    }
+#endif
+#endif
+
+/*****************************  Skein1024 ******************************/
+#if !(SKEIN_USE_ASM & 1024)
+void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+    { /* do it in C, always looping (unrolled is bigger AND slower!) */
+    enum
+        {
+        WCNT = SKEIN1024_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
+
+#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
+#else
+#define SKEIN_UNROLL_1024 (0)
+#endif
+
+#if (SKEIN_UNROLL_1024 != 0)
+#if (RCNT % SKEIN_UNROLL_1024)
+#error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+
+    u64b_t  X00,X01,X02,X03,X04,X05,X06,X07,    /* local copy of vars, for speed */
+            X08,X09,X10,X11,X12,X13,X14,X15;
+    u64b_t  w [WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64b_t *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+    Xptr[ 0] = &X00;  Xptr[ 1] = &X01;  Xptr[ 2] = &X02;  Xptr[ 3] = &X03;
+    Xptr[ 4] = &X04;  Xptr[ 5] = &X05;  Xptr[ 6] = &X06;  Xptr[ 7] = &X07;
+    Xptr[ 8] = &X08;  Xptr[ 9] = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
+    Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
+#endif
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[ 0] = ctx->X[ 0];
+        ks[ 1] = ctx->X[ 1];
+        ks[ 2] = ctx->X[ 2];
+        ks[ 3] = ctx->X[ 3];
+        ks[ 4] = ctx->X[ 4];
+        ks[ 5] = ctx->X[ 5];
+        ks[ 6] = ctx->X[ 6];
+        ks[ 7] = ctx->X[ 7];
+        ks[ 8] = ctx->X[ 8];
+        ks[ 9] = ctx->X[ 9];
+        ks[10] = ctx->X[10];
+        ks[11] = ctx->X[11];
+        ks[12] = ctx->X[12];
+        ks[13] = ctx->X[13];
+        ks[14] = ctx->X[14];
+        ks[15] = ctx->X[15];
+        ks[16] = ks[ 0] ^ ks[ 1] ^ ks[ 2] ^ ks[ 3] ^
+                 ks[ 4] ^ ks[ 5] ^ ks[ 6] ^ ks[ 7] ^
+                 ks[ 8] ^ ks[ 9] ^ ks[10] ^ ks[11] ^
+                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
+
+        ts[2]  = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+
+        X00    = w[ 0] + ks[ 0];                 /* do the first full key injection */
+        X01    = w[ 1] + ks[ 1];
+        X02    = w[ 2] + ks[ 2];
+        X03    = w[ 3] + ks[ 3];
+        X04    = w[ 4] + ks[ 4];
+        X05    = w[ 5] + ks[ 5];
+        X06    = w[ 6] + ks[ 6];
+        X07    = w[ 7] + ks[ 7];
+        X08    = w[ 8] + ks[ 8];
+        X09    = w[ 9] + ks[ 9];
+        X10    = w[10] + ks[10];
+        X11    = w[11] + ks[11];
+        X12    = w[12] + ks[12];
+        X13    = w[13] + ks[13] + ts[0];
+        X14    = w[14] + ks[14] + ts[1];
+        X15    = w[15] + ks[15];
+
+        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);
+
+#define Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rNum) \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0;   \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2;   \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4;   \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6;   \
+    X##p8 += X##p9; X##p9 = RotL_64(X##p9,ROT##_4); X##p9 ^= X##p8;   \
+    X##pA += X##pB; X##pB = RotL_64(X##pB,ROT##_5); X##pB ^= X##pA;   \
+    X##pC += X##pD; X##pD = RotL_64(X##pD,ROT##_6); X##pD ^= X##pC;   \
+    X##pE += X##pF; X##pF = RotL_64(X##pF,ROT##_7); X##pF ^= X##pE;   \
+
+#if SKEIN_UNROLL_1024 == 0                      
+#define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
+    Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rn,Xptr);
+
+#define I1024(R)                                                      \
+    X00   += ks[((R)+ 1) % 17]; /* inject the key schedule value */   \
+    X01   += ks[((R)+ 2) % 17];                                       \
+    X02   += ks[((R)+ 3) % 17];                                       \
+    X03   += ks[((R)+ 4) % 17];                                       \
+    X04   += ks[((R)+ 5) % 17];                                       \
+    X05   += ks[((R)+ 6) % 17];                                       \
+    X06   += ks[((R)+ 7) % 17];                                       \
+    X07   += ks[((R)+ 8) % 17];                                       \
+    X08   += ks[((R)+ 9) % 17];                                       \
+    X09   += ks[((R)+10) % 17];                                       \
+    X10   += ks[((R)+11) % 17];                                       \
+    X11   += ks[((R)+12) % 17];                                       \
+    X12   += ks[((R)+13) % 17];                                       \
+    X13   += ks[((R)+14) % 17] + ts[((R)+1) % 3];                     \
+    X14   += ks[((R)+15) % 17] + ts[((R)+2) % 3];                     \
+    X15   += ks[((R)+16) % 17] +     (R)+1;                           \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr); 
+#else                                       /* looping version */
+#define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
+    Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rn,Xptr);
+
+#define I1024(R)                                                      \
+    X00   += ks[r+(R)+ 0];    /* inject the key schedule value */     \
+    X01   += ks[r+(R)+ 1];                                            \
+    X02   += ks[r+(R)+ 2];                                            \
+    X03   += ks[r+(R)+ 3];                                            \
+    X04   += ks[r+(R)+ 4];                                            \
+    X05   += ks[r+(R)+ 5];                                            \
+    X06   += ks[r+(R)+ 6];                                            \
+    X07   += ks[r+(R)+ 7];                                            \
+    X08   += ks[r+(R)+ 8];                                            \
+    X09   += ks[r+(R)+ 9];                                            \
+    X10   += ks[r+(R)+10];                                            \
+    X11   += ks[r+(R)+11];                                            \
+    X12   += ks[r+(R)+12];                                            \
+    X13   += ks[r+(R)+13] + ts[r+(R)+0];                              \
+    X14   += ks[r+(R)+14] + ts[r+(R)+1];                              \
+    X15   += ks[r+(R)+15] +    r+(R)   ;                              \
+    ks[r  +       (R)+16] = ks[r+(R)-1];  /* rotate key schedule */   \
+    ts[r  +       (R)+ 2] = ts[r+(R)-1];                              \
+    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+
+    for (r=1;r <= 2*RCNT;r+=2*SKEIN_UNROLL_1024)    /* loop thru it */
+#endif  
+        {
+#define R1024_8_rounds(R)    /* do 8 full rounds */                               \
+        R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_0,8*(R) + 1); \
+        R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_1,8*(R) + 2); \
+        R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_2,8*(R) + 3); \
+        R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_3,8*(R) + 4); \
+        I1024(2*(R));                                                             \
+        R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_4,8*(R) + 5); \
+        R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_5,8*(R) + 6); \
+        R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_6,8*(R) + 7); \
+        R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_7,8*(R) + 8); \
+        I1024(2*(R)+1);
+
+        R1024_8_rounds( 0);
+
+#define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
+
+  #if   R1024_Unroll_R( 1)
+        R1024_8_rounds( 1);
+  #endif
+  #if   R1024_Unroll_R( 2)
+        R1024_8_rounds( 2);
+  #endif
+  #if   R1024_Unroll_R( 3)
+        R1024_8_rounds( 3);
+  #endif
+  #if   R1024_Unroll_R( 4)
+        R1024_8_rounds( 4);
+  #endif
+  #if   R1024_Unroll_R( 5)
+        R1024_8_rounds( 5);
+  #endif
+  #if   R1024_Unroll_R( 6)
+        R1024_8_rounds( 6);
+  #endif
+  #if   R1024_Unroll_R( 7)
+        R1024_8_rounds( 7);
+  #endif
+  #if   R1024_Unroll_R( 8)
+        R1024_8_rounds( 8);
+  #endif
+  #if   R1024_Unroll_R( 9)
+        R1024_8_rounds( 9);
+  #endif
+  #if   R1024_Unroll_R(10)
+        R1024_8_rounds(10);
+  #endif
+  #if   R1024_Unroll_R(11)
+        R1024_8_rounds(11);
+  #endif
+  #if   R1024_Unroll_R(12)
+        R1024_8_rounds(12);
+  #endif
+  #if   R1024_Unroll_R(13)
+        R1024_8_rounds(13);
+  #endif
+  #if   R1024_Unroll_R(14)
+        R1024_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_1024 > 14)
+#error  "need more unrolling in Skein_1024_Process_Block"
+  #endif
+        }
+        /* do the final "feedforward" xor, update context chaining vars */
+
+        ctx->X[ 0] = X00 ^ w[ 0];
+        ctx->X[ 1] = X01 ^ w[ 1];
+        ctx->X[ 2] = X02 ^ w[ 2];
+        ctx->X[ 3] = X03 ^ w[ 3];
+        ctx->X[ 4] = X04 ^ w[ 4];
+        ctx->X[ 5] = X05 ^ w[ 5];
+        ctx->X[ 6] = X06 ^ w[ 6];
+        ctx->X[ 7] = X07 ^ w[ 7];
+        ctx->X[ 8] = X08 ^ w[ 8];
+        ctx->X[ 9] = X09 ^ w[ 9];
+        ctx->X[10] = X10 ^ w[10];
+        ctx->X[11] = X11 ^ w[11];
+        ctx->X[12] = X12 ^ w[12];
+        ctx->X[13] = X13 ^ w[13];
+        ctx->X[14] = X14 ^ w[14];
+        ctx->X[15] = X15 ^ w[15];
+
+        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+        
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        blkPtr += SKEIN1024_BLOCK_BYTES;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein1024_Process_Block_CodeSize(void)
+    {
+    return ((u08b_t *) Skein1024_Process_Block_CodeSize) -
+           ((u08b_t *) Skein1024_Process_Block);
+    }
+uint_t Skein1024_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_1024;
+    }
+#endif
+#endif
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
new file mode 100644
index 000000000000..8b43586f46bc
--- /dev/null
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -0,0 +1,1385 @@
+#include <threefishApi.h>
+#include <stdint.h>
+#include <string.h>
+
+
+void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+        {
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3],
+      b4 = input[4], b5 = input[5],
+      b6 = input[6], b7 = input[7],
+      b8 = input[8], b9 = input[9],
+      b10 = input[10], b11 = input[11],
+      b12 = input[12], b13 = input[13],
+      b14 = input[14], b15 = input[15];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+      k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+      k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+      k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+      k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+      k16 = keyCtx->key[16];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+            b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k3; b2 += b3 + k2; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k5; b4 += b5 + k4; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k7; b6 += b7 + k6; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k9; b8 += b9 + k8; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k11; b10 += b11 + k10; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k13 + t0; b12 += b13 + k12; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k15; b14 += b15 + k14 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k2; b0 += b1 + k1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k4; b2 += b3 + k3; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k6; b4 += b5 + k5; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k8; b6 += b7 + k7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k10; b8 += b9 + k9; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k12; b10 += b11 + k11; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k14 + t1; b12 += b13 + k13; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k16 + 1; b14 += b15 + k15 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k3; b0 += b1 + k2; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k5; b2 += b3 + k4; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k7; b4 += b5 + k6; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k9; b6 += b7 + k8; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k11; b8 += b9 + k10; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k13; b10 += b11 + k12; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k15 + t2; b12 += b13 + k14; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k0 + 2; b14 += b15 + k16 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k4; b0 += b1 + k3; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k6; b2 += b3 + k5; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k8; b4 += b5 + k7; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k10; b6 += b7 + k9; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k12; b8 += b9 + k11; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k14; b10 += b11 + k13; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k16 + t0; b12 += b13 + k15; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k1 + 3; b14 += b15 + k0 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k5; b0 += b1 + k4; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k7; b2 += b3 + k6; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k9; b4 += b5 + k8; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k11; b6 += b7 + k10; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k13; b8 += b9 + k12; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k15; b10 += b11 + k14; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k0 + t1; b12 += b13 + k16; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k2 + 4; b14 += b15 + k1 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k6; b0 += b1 + k5; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k8; b2 += b3 + k7; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k10; b4 += b5 + k9; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k12; b6 += b7 + k11; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k14; b8 += b9 + k13; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k16; b10 += b11 + k15; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k1 + t2; b12 += b13 + k0; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k3 + 5; b14 += b15 + k2 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k7; b0 += b1 + k6; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k9; b2 += b3 + k8; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k11; b4 += b5 + k10; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k13; b6 += b7 + k12; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k15; b8 += b9 + k14; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k0; b10 += b11 + k16; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k2 + t0; b12 += b13 + k1; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k4 + 6; b14 += b15 + k3 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k8; b0 += b1 + k7; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k10; b2 += b3 + k9; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k12; b4 += b5 + k11; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k14; b6 += b7 + k13; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k16; b8 += b9 + k15; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k1; b10 += b11 + k0; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k3 + t1; b12 += b13 + k2; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k5 + 7; b14 += b15 + k4 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k9; b0 += b1 + k8; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k11; b2 += b3 + k10; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k13; b4 += b5 + k12; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k15; b6 += b7 + k14; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k0; b8 += b9 + k16; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k2; b10 += b11 + k1; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k4 + t2; b12 += b13 + k3; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k6 + 8; b14 += b15 + k5 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k10; b0 += b1 + k9; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k12; b2 += b3 + k11; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k14; b4 += b5 + k13; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k16; b6 += b7 + k15; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k1; b8 += b9 + k0; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k3; b10 += b11 + k2; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k5 + t0; b12 += b13 + k4; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k7 + 9; b14 += b15 + k6 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k11; b0 += b1 + k10; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k13; b2 += b3 + k12; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k15; b4 += b5 + k14; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k0; b6 += b7 + k16; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k2; b8 += b9 + k1; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k4; b10 += b11 + k3; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k6 + t1; b12 += b13 + k5; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k8 + 10; b14 += b15 + k7 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k12; b0 += b1 + k11; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k14; b2 += b3 + k13; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k16; b4 += b5 + k15; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k1; b6 += b7 + k0; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k3; b8 += b9 + k2; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k5; b10 += b11 + k4; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k7 + t2; b12 += b13 + k6; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k9 + 11; b14 += b15 + k8 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k13; b0 += b1 + k12; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k15; b2 += b3 + k14; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k0; b4 += b5 + k16; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k2; b6 += b7 + k1; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k4; b8 += b9 + k3; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k6; b10 += b11 + k5; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k8 + t0; b12 += b13 + k7; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k10 + 12; b14 += b15 + k9 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k14; b0 += b1 + k13; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k16; b2 += b3 + k15; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k1; b4 += b5 + k0; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k3; b6 += b7 + k2; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k5; b8 += b9 + k4; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k7; b10 += b11 + k6; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k9 + t1; b12 += b13 + k8; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k11 + 13; b14 += b15 + k10 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k15; b0 += b1 + k14; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k0; b2 += b3 + k16; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k2; b4 += b5 + k1; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k4; b6 += b7 + k3; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k6; b8 += b9 + k5; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k8; b10 += b11 + k7; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k10 + t2; b12 += b13 + k9; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k12 + 14; b14 += b15 + k11 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k16; b0 += b1 + k15; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k1; b2 += b3 + k0; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k3; b4 += b5 + k2; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k5; b6 += b7 + k4; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k7; b8 += b9 + k6; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k9; b10 += b11 + k8; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k11 + t0; b12 += b13 + k10; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k13 + 15; b14 += b15 + k12 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k0; b0 += b1 + k16; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k2; b2 += b3 + k1; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k4; b4 += b5 + k3; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k6; b6 += b7 + k5; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k8; b8 += b9 + k7; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k10; b10 += b11 + k9; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k12 + t1; b12 += b13 + k11; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k14 + 16; b14 += b15 + k13 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k1; b0 += b1 + k0; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k3; b2 += b3 + k2; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k5; b4 += b5 + k4; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k7; b6 += b7 + k6; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k9; b8 += b9 + k8; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k11; b10 += b11 + k10; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k13 + t2; b12 += b13 + k12; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k15 + 17; b14 += b15 + k14 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+            b1 += k2; b0 += b1 + k1; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+            b3 += k4; b2 += b3 + k3; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+            b5 += k6; b4 += b5 + k5; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+            b7 += k8; b6 += b7 + k7; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+            b9 += k10; b8 += b9 + k9; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+            b11 += k12; b10 += b11 + k11; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+            b13 += k14 + t0; b12 += b13 + k13; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+            b15 += k16 + 18; b14 += b15 + k15 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+            b1 += k3; b0 += b1 + k2; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+            b3 += k5; b2 += b3 + k4; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+            b5 += k7; b4 += b5 + k6; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+            b7 += k9; b6 += b7 + k8; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+            b9 += k11; b8 += b9 + k10; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+            b11 += k13; b10 += b11 + k12; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+            b13 += k15 + t1; b12 += b13 + k14; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+            b15 += k0 + 19; b14 += b15 + k16 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+            output[0] = b0 + k3;
+            output[1] = b1 + k4;
+            output[2] = b2 + k5;
+            output[3] = b3 + k6;
+            output[4] = b4 + k7;
+            output[5] = b5 + k8;
+            output[6] = b6 + k9;
+            output[7] = b7 + k10;
+            output[8] = b8 + k11;
+            output[9] = b9 + k12;
+            output[10] = b10 + k13;
+            output[11] = b11 + k14;
+            output[12] = b12 + k15;
+            output[13] = b13 + k16 + t2;
+            output[14] = b14 + k0 + t0;
+            output[15] = b15 + k1 + 20;
+        }
+
+void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+{
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3],
+      b4 = input[4], b5 = input[5],
+      b6 = input[6], b7 = input[7],
+      b8 = input[8], b9 = input[9],
+      b10 = input[10], b11 = input[11],
+      b12 = input[12], b13 = input[13],
+      b14 = input[14], b15 = input[15];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+      k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+      k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+      k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+      k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+      k16 = keyCtx->key[16];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+    uint64_t tmp;
+
+            b0 -= k3;
+            b1 -= k4;
+            b2 -= k5;
+            b3 -= k6;
+            b4 -= k7;
+            b5 -= k8;
+            b6 -= k9;
+            b7 -= k10;
+            b8 -= k11;
+            b9 -= k12;
+            b10 -= k13;
+            b11 -= k14;
+            b12 -= k15;
+            b13 -= k16 + t2;
+            b14 -= k0 + t0;
+            b15 -= k1 + 20;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k16 + t2; b15 -= k0 + 19;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k14; b13 -= k15 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k12; b11 -= k13;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k10; b9 -= k11;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k8; b7 -= k9;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k6; b5 -= k7;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k4; b3 -= k5;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k2; b1 -= k3;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k15 + t1; b15 -= k16 + 18;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k13; b13 -= k14 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k11; b11 -= k12;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k9; b9 -= k10;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k7; b7 -= k8;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k5; b5 -= k6;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k3; b3 -= k4;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k1; b1 -= k2;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k14 + t0; b15 -= k15 + 17;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k12; b13 -= k13 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k10; b11 -= k11;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k8; b9 -= k9;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k6; b7 -= k7;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k4; b5 -= k5;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k2; b3 -= k3;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k0; b1 -= k1;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k13 + t2; b15 -= k14 + 16;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k11; b13 -= k12 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k9; b11 -= k10;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k7; b9 -= k8;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k5; b7 -= k6;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k3; b5 -= k4;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k1; b3 -= k2;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k16; b1 -= k0;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k12 + t1; b15 -= k13 + 15;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k10; b13 -= k11 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k8; b11 -= k9;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k6; b9 -= k7;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k4; b7 -= k5;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k2; b5 -= k3;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k0; b3 -= k1;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k15; b1 -= k16;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k11 + t0; b15 -= k12 + 14;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k9; b13 -= k10 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k7; b11 -= k8;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k5; b9 -= k6;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k3; b7 -= k4;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k1; b5 -= k2;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k16; b3 -= k0;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k14; b1 -= k15;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k10 + t2; b15 -= k11 + 13;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k8; b13 -= k9 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k6; b11 -= k7;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k4; b9 -= k5;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k2; b7 -= k3;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k0; b5 -= k1;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k15; b3 -= k16;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k13; b1 -= k14;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k9 + t1; b15 -= k10 + 12;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k7; b13 -= k8 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k5; b11 -= k6;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k3; b9 -= k4;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k1; b7 -= k2;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k16; b5 -= k0;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k14; b3 -= k15;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k12; b1 -= k13;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k8 + t0; b15 -= k9 + 11;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k6; b13 -= k7 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k4; b11 -= k5;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k2; b9 -= k3;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k0; b7 -= k1;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k15; b5 -= k16;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k13; b3 -= k14;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k11; b1 -= k12;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k7 + t2; b15 -= k8 + 10;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k5; b13 -= k6 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k3; b11 -= k4;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k1; b9 -= k2;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k16; b7 -= k0;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k14; b5 -= k15;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k12; b3 -= k13;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k10; b1 -= k11;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k6 + t1; b15 -= k7 + 9;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k4; b13 -= k5 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k2; b11 -= k3;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k0; b9 -= k1;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k15; b7 -= k16;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k13; b5 -= k14;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k11; b3 -= k12;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k9; b1 -= k10;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k5 + t0; b15 -= k6 + 8;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k3; b13 -= k4 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k1; b11 -= k2;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k16; b9 -= k0;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k14; b7 -= k15;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k12; b5 -= k13;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k10; b3 -= k11;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k8; b1 -= k9;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k4 + t2; b15 -= k5 + 7;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k2; b13 -= k3 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k0; b11 -= k1;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k15; b9 -= k16;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k13; b7 -= k14;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k11; b5 -= k12;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k9; b3 -= k10;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k7; b1 -= k8;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k3 + t1; b15 -= k4 + 6;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k1; b13 -= k2 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k16; b11 -= k0;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k14; b9 -= k15;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k12; b7 -= k13;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k10; b5 -= k11;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k8; b3 -= k9;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k6; b1 -= k7;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k2 + t0; b15 -= k3 + 5;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k0; b13 -= k1 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k15; b11 -= k16;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k13; b9 -= k14;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k11; b7 -= k12;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k9; b5 -= k10;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k7; b3 -= k8;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k5; b1 -= k6;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k1 + t2; b15 -= k2 + 4;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k16; b13 -= k0 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k14; b11 -= k15;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k12; b9 -= k13;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k10; b7 -= k11;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k8; b5 -= k9;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k6; b3 -= k7;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k4; b1 -= k5;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k0 + t1; b15 -= k1 + 3;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k15; b13 -= k16 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k13; b11 -= k14;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k11; b9 -= k12;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k9; b7 -= k10;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k7; b5 -= k8;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k5; b3 -= k6;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k3; b1 -= k4;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k16 + t0; b15 -= k0 + 2;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k14; b13 -= k15 + t2;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k12; b11 -= k13;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k10; b9 -= k11;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k8; b7 -= k9;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k6; b5 -= k7;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k4; b3 -= k5;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k2; b1 -= k3;
+            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k15 + t2; b15 -= k16 + 1;
+            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k13; b13 -= k14 + t1;
+            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k11; b11 -= k12;
+            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k9; b9 -= k10;
+            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k7; b7 -= k8;
+            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k5; b5 -= k6;
+            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k3; b3 -= k4;
+            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k1; b1 -= k2;
+            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k14 + t1; b15 -= k15;
+            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k12; b13 -= k13 + t0;
+            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k10; b11 -= k11;
+            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k8; b9 -= k9;
+            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k6; b7 -= k7;
+            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k4; b5 -= k5;
+            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k2; b3 -= k3;
+            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k0; b1 -= k1;
+
+            output[15] = b15;
+            output[14] = b14;
+            output[13] = b13;
+            output[12] = b12;
+            output[11] = b11;
+            output[10] = b10;
+            output[9] = b9;
+            output[8] = b8;
+            output[7] = b7;
+            output[6] = b6;
+            output[5] = b5;
+            output[4] = b4;
+            output[3] = b3;
+            output[2] = b2;
+            output[1] = b1;
+            output[0] = b0;
+}
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
new file mode 100644
index 000000000000..db2b81978c91
--- /dev/null
+++ b/drivers/staging/skein/threefish256Block.c
@@ -0,0 +1,349 @@
+#include <threefishApi.h>
+#include <stdint.h>
+#include <string.h>
+
+
+void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+  {
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+    b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k3; b2 += b3 + k2 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k4 + 1; b2 += b3 + k3 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k0 + 2; b2 += b3 + k4 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k4 + t0; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k1 + 3; b2 += b3 + k0 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k0 + t1; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k2 + 4; b2 += b3 + k1 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k1 + t2; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k3 + 5; b2 += b3 + k2 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k2 + t0; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k4 + 6; b2 += b3 + k3 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k3 + t1; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k0 + 7; b2 += b3 + k4 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k4 + t2; b0 += b1 + k3; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k1 + 8; b2 += b3 + k0 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k0 + t0; b0 += b1 + k4; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k2 + 9; b2 += b3 + k1 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k1 + t1; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k3 + 10; b2 += b3 + k2 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k2 + t2; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k4 + 11; b2 += b3 + k3 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k3 + t0; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k0 + 12; b2 += b3 + k4 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k4 + t1; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k1 + 13; b2 += b3 + k0 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k0 + t2; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k2 + 14; b2 += b3 + k1 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k3 + 15; b2 += b3 + k2 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+    b3 += k4 + 16; b2 += b3 + k3 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+    b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+    b3 += k0 + 17; b2 += b3 + k4 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+    output[0] = b0 + k3;	/* final subkey injection (counter 18), after round 72 */
+    output[1] = b1 + k4 + t0;
+    output[2] = b2 + k0 + t1;
+    output[3] = b3 + k1 + 18;
+  }
+
+void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+  {
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+    uint64_t tmp;
+
+    b0 -= k3;	/* undo the final subkey injection (counter 18) first */
+    b1 -= k4 + t0;
+    b2 -= k0 + t1;
+    b3 -= k1 + 18;
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t0; b3 -= k0 + 17;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t2; b3 -= k4 + 16;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t1; b3 -= k3 + 15;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t0; b3 -= k2 + 14;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t2; b3 -= k1 + 13;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t1; b3 -= k0 + 12;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t0; b3 -= k4 + 11;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t2; b3 -= k3 + 10;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k4; b1 -= k0 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k1 + t1; b3 -= k2 + 9;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k3; b1 -= k4 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k0 + t0; b3 -= k1 + 8;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t2; b3 -= k0 + 7;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t1; b3 -= k4 + 6;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t0; b3 -= k3 + 5;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t2; b3 -= k2 + 4;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t1; b3 -= k1 + 3;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t2;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t0; b3 -= k0 + 2;
+
+    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t1;
+    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t2; b3 -= k4 + 1;
+    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t0;
+    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t1; b3 -= k3;
+
+    output[0] = b0;
+    output[1] = b1;
+    output[2] = b2;
+    output[3] = b3;
+  }
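
Reviewer's sketch, not part of the patch: the unrolled Threefish-256 rounds
above follow a regular pattern (a subkey every fourth round, two MIX steps per
round, then a swap of words 1 and 3), so a compact loop version can serve as a
cross-check of the generated code. The names tf256_ctx, tf256_setup,
tf256_subkey, tf256_encrypt, tf256_decrypt and tf256_selftest are hypothetical
and do not exist in the series. The sketch assumes the key context carries the
parity word in key[4] and t0 ^ t1 in tweak[2], as the Skein specification
describes; the setup of ThreefishKey_t is not shown in this part of the patch.

```c
#include <stdint.h>
#include <string.h>

#define TF256_ROUNDS  72
#define TF_KS_PARITY  0x1BD11BDAA9FC1A22ULL /* Skein 1.3 key-schedule constant */

/* Hypothetical context: 4 key words + parity word, 2 tweak words + their XOR. */
struct tf256_ctx {
	uint64_t key[5];
	uint64_t tweak[3];
};

/* Rotation constants for round d mod 8, matching the unrolled code above. */
static const unsigned int tf256_rot[8][2] = {
	{14, 16}, {52, 57}, {23, 40}, { 5, 37},
	{25, 33}, {46, 12}, {58, 22}, {32, 32},
};

static uint64_t rotl64(uint64_t x, unsigned int n) { return (x << n) | (x >> (64 - n)); }
static uint64_t rotr64(uint64_t x, unsigned int n) { return (x >> n) | (x << (64 - n)); }

static void tf256_setup(struct tf256_ctx *c, const uint64_t key[4],
			const uint64_t tweak[2])
{
	int i;

	c->key[4] = TF_KS_PARITY;
	for (i = 0; i < 4; i++) {
		c->key[i] = key[i];
		c->key[4] ^= key[i];
	}
	c->tweak[0] = tweak[0];
	c->tweak[1] = tweak[1];
	c->tweak[2] = tweak[0] ^ tweak[1];
}

/* Subkey s: the "k? + t? + s" terms folded into every fourth unrolled round. */
static void tf256_subkey(const struct tf256_ctx *c, unsigned int s, uint64_t sk[4])
{
	sk[0] = c->key[s % 5];
	sk[1] = c->key[(s + 1) % 5] + c->tweak[s % 3];
	sk[2] = c->key[(s + 2) % 5] + c->tweak[(s + 1) % 3];
	sk[3] = c->key[(s + 3) % 5] + s;
}

static void tf256_encrypt(const struct tf256_ctx *c, const uint64_t in[4],
			  uint64_t out[4])
{
	uint64_t b[4], sk[4], t;
	unsigned int d, i;

	memcpy(b, in, sizeof(b));
	for (d = 0; d < TF256_ROUNDS; d++) {
		if (d % 4 == 0) {		/* inject subkey d/4 */
			tf256_subkey(c, d / 4, sk);
			for (i = 0; i < 4; i++)
				b[i] += sk[i];
		}
		b[0] += b[1]; b[1] = rotl64(b[1], tf256_rot[d % 8][0]) ^ b[0];
		b[2] += b[3]; b[3] = rotl64(b[3], tf256_rot[d % 8][1]) ^ b[2];
		t = b[1]; b[1] = b[3]; b[3] = t;	/* word permutation */
	}
	tf256_subkey(c, TF256_ROUNDS / 4, sk);	/* final subkey, s = 18 */
	for (i = 0; i < 4; i++)
		out[i] = b[i] + sk[i];
}

static void tf256_decrypt(const struct tf256_ctx *c, const uint64_t in[4],
			  uint64_t out[4])
{
	uint64_t b[4], sk[4], t;
	unsigned int d, i;

	tf256_subkey(c, TF256_ROUNDS / 4, sk);
	for (i = 0; i < 4; i++)
		b[i] = in[i] - sk[i];
	for (d = TF256_ROUNDS; d-- > 0; ) {
		t = b[1]; b[1] = b[3]; b[3] = t;	/* undo permutation */
		b[1] = rotr64(b[1] ^ b[0], tf256_rot[d % 8][0]); b[0] -= b[1];
		b[3] = rotr64(b[3] ^ b[2], tf256_rot[d % 8][1]); b[2] -= b[3];
		if (d % 4 == 0) {		/* remove subkey d/4 */
			tf256_subkey(c, d / 4, sk);
			for (i = 0; i < 4; i++)
				b[i] -= sk[i];
		}
	}
	memcpy(out, b, sizeof(b));
}

/* Round-trip check with arbitrary values; returns 0 on success. */
static int tf256_selftest(void)
{
	const uint64_t key[4] = { 1, 2, 3, 4 }, tweak[2] = { 5, 6 };
	const uint64_t pt[4] = { 0x0123456789abcdefULL, 0, ~0ULL, 42 };
	uint64_t ct[4], rt[4];
	struct tf256_ctx c;

	tf256_setup(&c, key, tweak);
	tf256_encrypt(&c, pt, ct);
	tf256_decrypt(&c, ct, rt);
	return memcmp(pt, rt, sizeof(pt)) == 0 ? 0 : -1;
}
```

A round trip only shows that the two loops invert each other. To validate the
unrolled functions themselves, one would compare their output against this
loop version (or against published test vectors) for the same key and tweak.
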
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
new file mode 100644
index 000000000000..4fe708fea066
--- /dev/null
+++ b/drivers/staging/skein/threefish512Block.c
@@ -0,0 +1,643 @@
+#include <threefishApi.h>
+#include <stdint.h>
+#include <string.h>
+
+
+void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+    {
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3],
+      b4 = input[4], b5 = input[5],
+      b6 = input[6], b7 = input[7];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+      k8 = keyCtx->key[8];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+        b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k3; b2 += b3 + k2; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k7; b6 += b7 + k6 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k2; b0 += b1 + k1; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k4; b2 += b3 + k3; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k8 + 1; b6 += b7 + k7 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k3; b0 += b1 + k2; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k5; b2 += b3 + k4; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k0 + 2; b6 += b7 + k8 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k4; b0 += b1 + k3; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k6; b2 += b3 + k5; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k1 + 3; b6 += b7 + k0 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k5; b0 += b1 + k4; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k7; b2 += b3 + k6; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k2 + 4; b6 += b7 + k1 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k6; b0 += b1 + k5; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k8; b2 += b3 + k7; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k3 + 5; b6 += b7 + k2 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k7; b0 += b1 + k6; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k0; b2 += b3 + k8; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k4 + 6; b6 += b7 + k3 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k8; b0 += b1 + k7; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k1; b2 += b3 + k0; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k5 + 7; b6 += b7 + k4 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k0; b0 += b1 + k8; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k2; b2 += b3 + k1; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k6 + 8; b6 += b7 + k5 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k1; b0 += b1 + k0; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k3; b2 += b3 + k2; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k7 + 9; b6 += b7 + k6 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k2; b0 += b1 + k1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k4; b2 += b3 + k3; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k8 + 10; b6 += b7 + k7 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k3; b0 += b1 + k2; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k5; b2 += b3 + k4; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k0 + 11; b6 += b7 + k8 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k4; b0 += b1 + k3; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k6; b2 += b3 + k5; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k1 + 12; b6 += b7 + k0 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k5; b0 += b1 + k4; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k7; b2 += b3 + k6; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k2 + 13; b6 += b7 + k1 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k6; b0 += b1 + k5; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k8; b2 += b3 + k7; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k3 + 14; b6 += b7 + k2 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k7; b0 += b1 + k6; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k0; b2 += b3 + k8; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k4 + 15; b6 += b7 + k3 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+        b1 += k8; b0 += b1 + k7; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+        b3 += k1; b2 += b3 + k0; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+        b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+        b7 += k5 + 16; b6 += b7 + k4 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+        b1 += k0; b0 += b1 + k8; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+        b3 += k2; b2 += b3 + k1; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+        b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+        b7 += k6 + 17; b6 += b7 + k5 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+        output[0] = b0 + k0;
+        output[1] = b1 + k1;
+        output[2] = b2 + k2;
+        output[3] = b3 + k3;
+        output[4] = b4 + k4;
+        output[5] = b5 + k5 + t0;
+        output[6] = b6 + k6 + t1;
+        output[7] = b7 + k7 + 18; /* 18 = index of the final subkey (72 rounds / 4) */
+    }
+
+void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+    {
+
+    uint64_t b0 = input[0], b1 = input[1],
+      b2 = input[2], b3 = input[3],
+      b4 = input[4], b5 = input[5],
+      b6 = input[6], b7 = input[7];
+    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+      k8 = keyCtx->key[8];
+    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+      t2 = keyCtx->tweak[2];
+
+      uint64_t tmp;
+
+        b0 -= k0;
+        b1 -= k1;
+        b2 -= k2;
+        b3 -= k3;
+        b4 -= k4;
+        b5 -= k5 + t0;
+        b6 -= k6 + t1;
+        b7 -= k7 + 18; /* undo the final subkey (index 18) before unwinding the rounds */
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k5 + t0; b7 -= k6 + 17;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k3; b5 -= k4 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k1; b3 -= k2;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k8; b1 -= k0;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k4 + t2; b7 -= k5 + 16;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k2; b5 -= k3 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k0; b3 -= k1;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k7; b1 -= k8;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k3 + t1; b7 -= k4 + 15;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k1; b5 -= k2 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k8; b3 -= k0;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k6; b1 -= k7;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k2 + t0; b7 -= k3 + 14;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k0; b5 -= k1 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k7; b3 -= k8;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k5; b1 -= k6;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k1 + t2; b7 -= k2 + 13;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k8; b5 -= k0 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k6; b3 -= k7;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k4; b1 -= k5;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k0 + t1; b7 -= k1 + 12;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k7; b5 -= k8 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k5; b3 -= k6;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k3; b1 -= k4;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k8 + t0; b7 -= k0 + 11;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k6; b5 -= k7 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k4; b3 -= k5;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k2; b1 -= k3;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k7 + t2; b7 -= k8 + 10;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k5; b5 -= k6 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k3; b3 -= k4;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k1; b1 -= k2;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k6 + t1; b7 -= k7 + 9;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k4; b5 -= k5 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k2; b3 -= k3;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k0; b1 -= k1;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k5 + t0; b7 -= k6 + 8;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k3; b5 -= k4 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k1; b3 -= k2;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k8; b1 -= k0;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k4 + t2; b7 -= k5 + 7;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k2; b5 -= k3 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k0; b3 -= k1;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k7; b1 -= k8;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k3 + t1; b7 -= k4 + 6;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k1; b5 -= k2 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k8; b3 -= k0;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k6; b1 -= k7;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k2 + t0; b7 -= k3 + 5;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k0; b5 -= k1 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k7; b3 -= k8;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k5; b1 -= k6;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k1 + t2; b7 -= k2 + 4;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k8; b5 -= k0 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k6; b3 -= k7;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k4; b1 -= k5;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k0 + t1; b7 -= k1 + 3;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k7; b5 -= k8 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k5; b3 -= k6;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k3; b1 -= k4;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k8 + t0; b7 -= k0 + 2;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k6; b5 -= k7 + t2;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k4; b3 -= k5;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k2; b1 -= k3;
+        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k7 + t2; b7 -= k8 + 1;
+        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k5; b5 -= k6 + t1;
+        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k3; b3 -= k4;
+        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k1; b1 -= k2;
+        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k6 + t1; b7 -= k7;
+        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k4; b5 -= k5 + t0;
+        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k2; b3 -= k3;
+        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k0; b1 -= k1;
+
+    output[0] = b0;
+    output[1] = b1;
+    output[2] = b2;
+    output[3] = b3;
+
+        output[7] = b7;
+        output[6] = b6;
+        output[5] = b5;
+        output[4] = b4;
+}
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
new file mode 100644
index 000000000000..5afa0338aef4
--- /dev/null
+++ b/drivers/staging/skein/threefishApi.c
@@ -0,0 +1,79 @@
+
+
+#include <threefishApi.h>
+#include <stdlib.h>
+#include <string.h>
+
+void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
+                     uint64_t* keyData, uint64_t* tweak)
+{
+    int keyWords = stateSize / 64;
+    int i;
+    uint64_t parity = KeyScheduleConst;
+
+    keyCtx->tweak[0] = tweak[0];
+    keyCtx->tweak[1] = tweak[1];
+    keyCtx->tweak[2] = tweak[0] ^ tweak[1];
+
+    for (i = 0; i < keyWords; i++) {
+        keyCtx->key[i] = keyData[i];
+        parity ^= keyData[i];
+    }
+    keyCtx->key[i] = parity;
+    keyCtx->stateSize = stateSize;
+}
+
+void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
+                                uint8_t* out)
+{
+    u64b_t plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+    u64b_t cipher[SKEIN_MAX_STATE_WORDS];
+    
+    Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
+    threefishEncryptBlockWords(keyCtx, plain, cipher);
+    Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
+}
+
+void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
+                                uint64_t* out)
+{
+    switch (keyCtx->stateSize) {
+        case Threefish256:
+            threefishEncrypt256(keyCtx, in, out);
+            break;
+        case Threefish512:
+            threefishEncrypt512(keyCtx, in, out);
+            break;
+        case Threefish1024:
+            threefishEncrypt1024(keyCtx, in, out);
+            break;
+    }
+}
+
+void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
+                                uint8_t* out)
+{
+    u64b_t plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+    u64b_t cipher[SKEIN_MAX_STATE_WORDS];
+    
+    Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
+    threefishDecryptBlockWords(keyCtx, cipher, plain);
+    Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
+}
+
+void threefishDecryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
+                                uint64_t* out)
+{
+    switch (keyCtx->stateSize) {
+        case Threefish256:
+            threefishDecrypt256(keyCtx, in, out);
+            break;
+        case Threefish512:
+            threefishDecrypt512(keyCtx, in, out);
+            break;
+        case Threefish1024:
+            threefishDecrypt1024(keyCtx, in, out);
+            break;
+    }
+}
+
-- 
1.9.1
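
Each line of the decrypt rounds in the block code above undoes one Threefish
MIX step: xor the pair, rotate right, then subtract.  A minimal sketch of that
relationship, assuming the encrypt side uses the usual add / rotate-left / xor
MIX (the encrypt rounds are not shown in this part of the patch).  The helper
names rotl64, rotr64, mix, unmix and mix_roundtrip_ok are illustrative only and
do not exist in the imported code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rotations are only well defined here for 0 < r < 64, which covers
 * every rotation constant used in the rounds above.
 */
static uint64_t rotl64(uint64_t x, unsigned int r)
{
	assert(r > 0 && r < 64);
	return (x << r) | (x >> (64 - r));
}

static uint64_t rotr64(uint64_t x, unsigned int r)
{
	assert(r > 0 && r < 64);
	return (x >> r) | (x << (64 - r));
}

/* Assumed forward MIX: x += y; y = rotl(y, r) ^ x. */
static void mix(uint64_t *x, uint64_t *y, unsigned int r)
{
	*x += *y;
	*y = rotl64(*y, r) ^ *x;
}

/* Inverse, same shape as "tmp = b7 ^ b0; b7 = ror(tmp, 35); b0 -= b7;" */
static void unmix(uint64_t *x, uint64_t *y, unsigned int r)
{
	uint64_t tmp = *y ^ *x;

	*y = rotr64(tmp, r);
	*x -= *y;
}

/* Returns 1 if unmix() restores what mix() scrambled. */
static int mix_roundtrip_ok(uint64_t x, uint64_t y, unsigned int r)
{
	uint64_t a = x, b = y;

	mix(&a, &b, r);
	unmix(&a, &b, r);
	return a == x && b == y;
}
```

Folding the open-coded rotations into a helper of this kind (in the kernel,
ror64() from linux/bitops.h would be the likely candidate) is the sort of
cleanup that should leave the generated object code unchanged.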

^ permalink raw reply related	[flat|nested] 51+ messages in thread
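
threefishSetKey() in the patch above extends the key by one parity word and the
tweak by one xor word; these are the k8 and t2 that show up in the 512-bit
decrypt rounds.  A stand-alone sketch of just that computation for the 512-bit
case, using the KeyScheduleConst value from threefishApi.h.  The struct and
function names (tf512_sched, tf512_set_key) are made up for illustration and
are not part of the imported code:

```c
#include <assert.h>
#include <stdint.h>

#define TF512_WORDS		8
#define KEY_SCHEDULE_CONST	0x1BD11BDAA9FC1A22ULL

struct tf512_sched {
	uint64_t key[TF512_WORDS + 1];	/* k0..k7 plus parity word k8 */
	uint64_t tweak[3];		/* t0, t1 and t2 = t0 ^ t1 */
};

static void tf512_set_key(struct tf512_sched *s, const uint64_t *key,
			  const uint64_t *tweak)
{
	uint64_t parity = KEY_SCHEDULE_CONST;
	int i;

	assert(s && key && tweak);

	s->tweak[0] = tweak[0];
	s->tweak[1] = tweak[1];
	s->tweak[2] = tweak[0] ^ tweak[1];

	/* The extra key word is the xor of the constant and all key words. */
	for (i = 0; i < TF512_WORDS; i++) {
		s->key[i] = key[i];
		parity ^= key[i];
	}
	s->key[TF512_WORDS] = parity;
}
```

With an all-zero key the parity word is simply the constant itself, which makes
for an easy sanity check when refactoring.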

* [PATCH V2 02/21] staging: crypto: skein: allow building statically
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
  2014-03-24  1:48   ` [PATCH V2 01/21] staging: crypto: skein: import code from Skein3Fish.git Jason Cooper
@ 2014-03-24  1:48   ` Jason Cooper
  2014-03-24  2:32     ` [PATCH V3 " Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 03/21] staging: crypto: skein: remove brg_*.h includes Jason Cooper
                     ` (18 subsequent siblings)
  20 siblings, 1 reply; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:48 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

These are the minimum changes required to get the code to build
statically in the kernel.  It's necessary to do this first so that we
can empirically determine that future cleanup patches aren't changing
the generated object code.

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
Changes since RFC:

 - rebasing onto staging-next caused conflicts in Kconfig and Makefile; fixed.


 drivers/staging/Kconfig                      |  2 +
 drivers/staging/Makefile                     |  1 +
 drivers/staging/skein/CMakeLists.txt         | 27 -------------
 drivers/staging/skein/Kconfig                | 32 ++++++++++++++++
 drivers/staging/skein/Makefile               | 13 +++++++
 drivers/staging/skein/include/brg_types.h    | 57 ----------------------------
 drivers/staging/skein/include/skein.h        | 10 -----
 drivers/staging/skein/include/skeinApi.h     |  2 +-
 drivers/staging/skein/include/skein_port.h   | 16 +-------
 drivers/staging/skein/include/threefishApi.h |  2 +-
 drivers/staging/skein/skein.c                |  2 +-
 drivers/staging/skein/skeinApi.c             |  4 +-
 drivers/staging/skein/skeinBlockNo3F.c       |  2 +-
 drivers/staging/skein/skein_block.c          |  2 +-
 drivers/staging/skein/threefish1024Block.c   |  3 +-
 drivers/staging/skein/threefish256Block.c    |  3 +-
 drivers/staging/skein/threefish512Block.c    |  3 +-
 drivers/staging/skein/threefishApi.c         |  3 +-
 18 files changed, 59 insertions(+), 125 deletions(-)
 delete mode 100755 drivers/staging/skein/CMakeLists.txt
 create mode 100644 drivers/staging/skein/Kconfig
 create mode 100644 drivers/staging/skein/Makefile

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 47cf17543008..b78f669b7ed8 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -144,6 +144,8 @@ source "drivers/staging/gs_fpgaboot/Kconfig"
 
 source "drivers/staging/nokia_h4p/Kconfig"
 
+source "drivers/staging/skein/Kconfig"
+
 source "drivers/staging/unisys/Kconfig"
 
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index d12f6189db46..eec54a9f53e8 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -64,4 +64,5 @@ obj-$(CONFIG_DGAP)			+= dgap/
 obj-$(CONFIG_MTD_SPINAND_MT29F)	+= mt29f_spinand/
 obj-$(CONFIG_GS_FPGABOOT)	+= gs_fpgaboot/
 obj-$(CONFIG_BT_NOKIA_H4P)	+= nokia_h4p/
+obj-$(CONFIG_CRYPTO_SKEIN) += skein/
 obj-$(CONFIG_UNISYSSPAR)	+= unisys/
diff --git a/drivers/staging/skein/CMakeLists.txt b/drivers/staging/skein/CMakeLists.txt
deleted file mode 100755
index 604aaa394cb1..000000000000
--- a/drivers/staging/skein/CMakeLists.txt
+++ /dev/null
@@ -1,27 +0,0 @@
-cmake_minimum_required (VERSION 2.6)
-
-include_directories (${CMAKE_CURRENT_SOURCE_DIR}/include)
-
-# set(skeinBlock_src skein_block.c)
-set(skeinBlock_src skeinBlockNo3F.c)
-
-set(skein_src 
-    ${skeinBlock_src}
-    skein.c
-    skeinApi.c
-    )
-
-set(threefish_src
-    threefishApi.c
-    threefish256Block.c
-    threefish512Block.c
-    threefish1024Block.c
-    )
-set(s3f_src ${skein_src} ${threefish_src})
-
-add_library(skein3fish SHARED ${s3f_src})
-set_target_properties(skein3fish PROPERTIES VERSION ${VERSION} SOVERSION ${SOVERSION})
-target_link_libraries(skein3fish ${LIBS})
-
-install(TARGETS skein3fish DESTINATION ${LIBDIRNAME})
-
diff --git a/drivers/staging/skein/Kconfig b/drivers/staging/skein/Kconfig
new file mode 100644
index 000000000000..8f5a72a90ced
--- /dev/null
+++ b/drivers/staging/skein/Kconfig
@@ -0,0 +1,32 @@
+config CRYPTO_SKEIN
+	bool "Skein digest algorithm"
+	depends on (X86 || UML_X86) && 64BIT
+	select CRYPTO_THREEFISH
+	select CRYPTO_HASH
+	help
+	  Skein secure hash algorithm is one of 5 finalists from the NIST SHA3
+	  competition.
+
+	  Skein is optimized for modern, 64bit processors and is highly
+	  customizable.  See:
+
+	  http://www.skein-hash.info/sites/default/files/skein1.3.pdf
+
+	  for more information.  This module depends on the threefish block
+	  cipher module.
+
+config CRYPTO_THREEFISH
+	bool "Threefish tweakable block cipher"
+	depends on (X86 || UML_X86) && 64BIT
+	select CRYPTO_ALGAPI
+	help
+	  Threefish cipher algorithm is the tweakable block cipher underneath
+	  the Skein family of secure hash algorithms.  Skein is one of 5
+	  finalists from the NIST SHA3 competition.
+
+	  Skein is optimized for modern, 64bit processors and is highly
+	  customizable.  See:
+
+	  http://www.skein-hash.info/sites/default/files/skein1.3.pdf
+
+	  for more information.
diff --git a/drivers/staging/skein/Makefile b/drivers/staging/skein/Makefile
new file mode 100644
index 000000000000..2bb386e1e58c
--- /dev/null
+++ b/drivers/staging/skein/Makefile
@@ -0,0 +1,13 @@
+#
+# Makefile for the skein secure hash algorithm
+#
+subdir-ccflags-y := -I$(src)/include/
+
+obj-$(CONFIG_CRYPTO_SKEIN) +=   skein.o \
+				skeinApi.o \
+				skein_block.o
+
+obj-$(CONFIG_CRYPTO_THREEFISH) += threefish1024Block.o \
+				  threefish256Block.o \
+				  threefish512Block.o \
+				  threefishApi.o
diff --git a/drivers/staging/skein/include/brg_types.h b/drivers/staging/skein/include/brg_types.h
index 6db737d71b9e..3d9fe0df5238 100644
--- a/drivers/staging/skein/include/brg_types.h
+++ b/drivers/staging/skein/include/brg_types.h
@@ -46,83 +46,26 @@
 extern "C" {
 #endif
 
-#include <limits.h>
-
 #ifndef BRG_UI8
 #  define BRG_UI8
-#  if UCHAR_MAX == 255u
      typedef unsigned char uint_8t;
-#  else
-#    error Please define uint_8t as an 8-bit unsigned integer type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI16
 #  define BRG_UI16
-#  if USHRT_MAX == 65535u
      typedef unsigned short uint_16t;
-#  else
-#    error Please define uint_16t as a 16-bit unsigned short type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI32
 #  define BRG_UI32
-#  if UINT_MAX == 4294967295u
 #    define li_32(h) 0x##h##u
      typedef unsigned int uint_32t;
-#  elif ULONG_MAX == 4294967295u
-#    define li_32(h) 0x##h##ul
-     typedef unsigned long uint_32t;
-#  elif defined( _CRAY )
-#    error This code needs 32-bit data types, which Cray machines do not provide
-#  else
-#    error Please define uint_32t as a 32-bit unsigned integer type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI64
-#  if defined( __BORLANDC__ ) && !defined( __MSDOS__ )
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ui64
-     typedef unsigned __int64 uint_64t;
-#  elif defined( _MSC_VER ) && ( _MSC_VER < 1300 )    /* 1300 == VC++ 7.0 */
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ui64
-     typedef unsigned __int64 uint_64t;
-#  elif defined( __sun ) && defined(ULONG_MAX) && ULONG_MAX == 0xfffffffful
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ull
-     typedef unsigned long long uint_64t;
-#  elif defined( UINT_MAX ) && UINT_MAX > 4294967295u
-#    if UINT_MAX == 18446744073709551615u
-#      define BRG_UI64
-#      define li_64(h) 0x##h##u
-       typedef unsigned int uint_64t;
-#    endif
-#  elif defined( ULONG_MAX ) && ULONG_MAX > 4294967295u
-#    if ULONG_MAX == 18446744073709551615ul
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ul
-       typedef unsigned long uint_64t;
-#    endif
-#  elif defined( ULLONG_MAX ) && ULLONG_MAX > 4294967295u
-#    if ULLONG_MAX == 18446744073709551615ull
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ull
-       typedef unsigned long long uint_64t;
-#    endif
-#  elif defined( ULONG_LONG_MAX ) && ULONG_LONG_MAX > 4294967295u
-#    if ULONG_LONG_MAX == 18446744073709551615ull
 #      define BRG_UI64
 #      define li_64(h) 0x##h##ull
        typedef unsigned long long uint_64t;
-#    endif
-#  elif defined(__GNUC__)  /* DLW: avoid mingw problem with -ansi */
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ull
-       typedef unsigned long long uint_64t;
-#  endif
 #endif
 
 #if defined( NEED_UINT_64T ) && !defined( BRG_UI64 )
diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index cb613fa09d9e..315cdcd14413 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -261,18 +261,8 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define Skein_Show_Key(bits,ctx,key,keyBytes)
 #endif
 
-#ifndef SKEIN_ERR_CHECK        /* run-time checks (e.g., bad params, uninitialized context)? */
 #define Skein_Assert(x,retCode)/* default: ignore all Asserts, for performance */
 #define Skein_assert(x)
-#elif   defined(SKEIN_ASSERT)
-#include <assert.h>     
-#define Skein_Assert(x,retCode) assert(x) 
-#define Skein_assert(x)         assert(x) 
-#else
-#include <assert.h>     
-#define Skein_Assert(x,retCode) { if (!(x)) return retCode; } /*  caller  error */
-#define Skein_assert(x)         assert(x)                     /* internal error */
-#endif
 
 /*****************************************************************
 ** Skein block function constants (shared across Ref and Opt code)
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 19c3225460fc..734d27b79f01 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -78,8 +78,8 @@ OTHER DEALINGS IN THE SOFTWARE.
  * 
  */
 
+#include <linux/types.h>
 #include <skein.h>
-#include <stdint.h>
 
 #ifdef __cplusplus
 extern "C"
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
index 18d892553c8d..1c68070358ce 100644
--- a/drivers/staging/skein/include/skein_port.h
+++ b/drivers/staging/skein/include/skein_port.h
@@ -44,24 +44,10 @@ typedef uint_64t        u64b_t;             /* 64-bit unsigned integer */
  * platform-specific code instead (e.g., for big-endian CPUs).
  *
  */
-#ifndef SKEIN_NEED_SWAP /* compile-time "override" for endianness? */
-
-#include <brg_endian.h>              /* get endianness selection */
-#if   PLATFORM_BYTE_ORDER == IS_BIG_ENDIAN
-    /* here for big-endian CPUs */
-#define SKEIN_NEED_SWAP   (1)
-#elif PLATFORM_BYTE_ORDER == IS_LITTLE_ENDIAN
-    /* here for x86 and x86-64 CPUs (and other detected little-endian CPUs) */
 #define SKEIN_NEED_SWAP   (0)
-#if   PLATFORM_MUST_ALIGN == 0              /* ok to use "fast" versions? */
+/* below two prototype assume we are handed aligned data */
 #define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
 #define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
-#endif
-#else
-#error "Skein needs endianness setting!"
-#endif
-
-#endif /* ifndef SKEIN_NEED_SWAP */
 
 /*
  ******************************************************************
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 85afd72fe987..dae270cf71d3 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -28,8 +28,8 @@
 @endcode
  */
 
+#include <linux/types.h>
 #include <skein.h>
-#include <stdint.h>
 
 #define KeyScheduleConst 0x1BD11BDAA9FC1A22L
 
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index f0b176ac1dc7..3fae6fdf7c75 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -10,7 +10,7 @@
 
 #define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
 
-#include <string.h>       /* get the memcpy/memset functions */
+#include <linux/string.h>       /* get the memcpy/memset functions */
 #include <skein.h> /* get the Skein API definitions   */
 #include <skein_iv.h>    /* get precomputed IVs */
 
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 7b963758d32c..579b92efbf65 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -24,10 +24,8 @@ OTHER DEALINGS IN THE SOFTWARE.
 
 */
 
-#define SKEIN_ERR_CHECK 1
+#include <linux/string.h>
 #include <skeinApi.h>
-#include <string.h>
-#include <stdio.h>
 
 int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size)
 {
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 4ad6c50360e7..6a19ceb17d0f 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -1,5 +1,5 @@
 
-#include <string.h>
+#include <linux/string.h>
 #include <skein.h>
 #include <threefishApi.h>
 
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 86724a2443b5..b5be41af6d17 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -14,7 +14,7 @@
 **
 ************************************************************************/
 
-#include <string.h>
+#include <linux/string.h>
 #include <skein.h>
 
 #ifndef SKEIN_USE_ASM
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 8b43586f46bc..58a8c26a1f6f 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index db2b81978c91..a7e06f905186 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 4fe708fea066..3cbfcd9af5c9 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 5afa0338aef4..968d3d21fe61 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -1,8 +1,7 @@
 
 
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdlib.h>
-#include <string.h>
 
 void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
                      uint64_t* keyData, uint64_t* tweak)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread
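
The skein_port.h hunk in the patch above hard-codes SKEIN_NEED_SWAP to 0 and
keeps the memcpy() versions of Skein_Put64_LSB_First / Skein_Get64_LSB_First.
That shortcut is only correct on a little-endian host, which the Kconfig entry
guarantees by restricting the build to 64-bit x86.  For comparison, a
byte-order-independent sketch; get64_lsb_first and put64_lsb_first are
hypothetical names that take their arguments in the same order as the macros:

```c
#include <stddef.h>
#include <stdint.h>

/* Load wcnt 64-bit words from a little-endian byte stream. */
static void get64_lsb_first(uint64_t *dst, const uint8_t *src, size_t wcnt)
{
	size_t i, j;

	for (i = 0; i < wcnt; i++) {
		uint64_t w = 0;

		for (j = 0; j < 8; j++)
			w |= (uint64_t)src[8 * i + j] << (8 * j);
		dst[i] = w;
	}
}

/* Store bcnt bytes, least significant byte of each word first. */
static void put64_lsb_first(uint8_t *dst, const uint64_t *src, size_t bcnt)
{
	size_t n;

	for (n = 0; n < bcnt; n++)
		dst[n] = (uint8_t)(src[n / 8] >> (8 * (n % 8)));
}
```

Like memcpy(), neither loop cares about the alignment of the byte buffer.
In-kernel code would more likely reach for get_unaligned_le64() and
put_unaligned_le64() than open-code this.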

* [PATCH V2 03/21] staging: crypto: skein: remove brg_*.h includes
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
  2014-03-24  1:48   ` [PATCH V2 01/21] staging: crypto: skein: import code from Skein3Fish.git Jason Cooper
  2014-03-24  1:48   ` [PATCH V2 02/21] staging: crypto: skein: allow building statically Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 04/21] staging: crypto: skein: remove skein_port.h Jason Cooper
                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/brg_endian.h | 148 -----------------------------
 drivers/staging/skein/include/brg_types.h  | 131 -------------------------
 drivers/staging/skein/include/skein_port.h |   6 +-
 3 files changed, 2 insertions(+), 283 deletions(-)
 delete mode 100644 drivers/staging/skein/include/brg_endian.h
 delete mode 100644 drivers/staging/skein/include/brg_types.h

diff --git a/drivers/staging/skein/include/brg_endian.h b/drivers/staging/skein/include/brg_endian.h
deleted file mode 100644
index c03c7c5d1eb4..000000000000
--- a/drivers/staging/skein/include/brg_endian.h
+++ /dev/null
@@ -1,148 +0,0 @@
-/*
- ---------------------------------------------------------------------------
- Copyright (c) 2003, Dr Brian Gladman, Worcester, UK.   All rights reserved.
-
- LICENSE TERMS
-
- The free distribution and use of this software in both source and binary
- form is allowed (with or without changes) provided that:
-
-   1. distributions of this source code include the above copyright
-      notice, this list of conditions and the following disclaimer;
-
-   2. distributions in binary form include the above copyright
-      notice, this list of conditions and the following disclaimer
-      in the documentation and/or other associated materials;
-
-   3. the copyright holder's name is not used to endorse products
-      built using this software without specific written permission.
-
- ALTERNATIVELY, provided that this notice is retained in full, this product
- may be distributed under the terms of the GNU General Public License (GPL),
- in which case the provisions of the GPL apply INSTEAD OF those given above.
-
- DISCLAIMER
-
- This software is provided 'as is' with no explicit or implied warranties
- in respect of its properties, including, but not limited to, correctness
- and/or fitness for purpose.
- ---------------------------------------------------------------------------
- Issue 20/10/2006
-*/
-
-#ifndef BRG_ENDIAN_H
-#define BRG_ENDIAN_H
-
-#define IS_BIG_ENDIAN      4321 /* byte 0 is most significant (mc68k) */
-#define IS_LITTLE_ENDIAN   1234 /* byte 0 is least significant (i386) */
-
-/* Include files where endian defines and byteswap functions may reside */
-#if defined( __FreeBSD__ ) || defined( __OpenBSD__ ) || defined( __NetBSD__ )
-#  include <sys/endian.h>
-#elif defined( BSD ) && ( BSD >= 199103 ) || defined( __APPLE__ ) || \
-      defined( __CYGWIN32__ ) || defined( __DJGPP__ ) || defined( __osf__ )
-#  include <machine/endian.h>
-#elif defined( __linux__ ) || defined( __GNUC__ ) || defined( __GNU_LIBRARY__ )
-#  if !defined( __MINGW32__ ) && !defined(AVR)
-#    include <endian.h>
-#    if !defined( __BEOS__ )
-#      include <byteswap.h>
-#    endif
-#  endif
-#endif
-
-/* Now attempt to set the define for platform byte order using any  */
-/* of the four forms SYMBOL, _SYMBOL, __SYMBOL & __SYMBOL__, which  */
-/* seem to encompass most endian symbol definitions                 */
-
-#if defined( BIG_ENDIAN ) && defined( LITTLE_ENDIAN )
-#  if defined( BYTE_ORDER ) && BYTE_ORDER == BIG_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#  elif defined( BYTE_ORDER ) && BYTE_ORDER == LITTLE_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#  endif
-#elif defined( BIG_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#elif defined( LITTLE_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-
-#if defined( _BIG_ENDIAN ) && defined( _LITTLE_ENDIAN )
-#  if defined( _BYTE_ORDER ) && _BYTE_ORDER == _BIG_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#  elif defined( _BYTE_ORDER ) && _BYTE_ORDER == _LITTLE_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#  endif
-#elif defined( _BIG_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#elif defined( _LITTLE_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-
-#if defined( __BIG_ENDIAN ) && defined( __LITTLE_ENDIAN )
-#  if defined( __BYTE_ORDER ) && __BYTE_ORDER == __BIG_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#  elif defined( __BYTE_ORDER ) && __BYTE_ORDER == __LITTLE_ENDIAN
-#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#  endif
-#elif defined( __BIG_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#elif defined( __LITTLE_ENDIAN )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-
-#if defined( __BIG_ENDIAN__ ) && defined( __LITTLE_ENDIAN__ )
-#  if defined( __BYTE_ORDER__ ) && __BYTE_ORDER__ == __BIG_ENDIAN__
-#    define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#  elif defined( __BYTE_ORDER__ ) && __BYTE_ORDER__ == __LITTLE_ENDIAN__
-#    define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#  endif
-#elif defined( __BIG_ENDIAN__ )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#elif defined( __LITTLE_ENDIAN__ )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-
-/*  if the platform byte order could not be determined, then try to */
-/*  set this define using common machine defines                    */
-#if !defined(PLATFORM_BYTE_ORDER)
-
-#if   defined( __alpha__ ) || defined( __alpha ) || defined( i386 )       || \
-      defined( __i386__ )  || defined( _M_I86 )  || defined( _M_IX86 )    || \
-      defined( __OS2__ )   || defined( sun386 )  || defined( __TURBOC__ ) || \
-      defined( vax )       || defined( vms )     || defined( VMS )        || \
-      defined( __VMS )     || defined( _M_X64 )  || defined( AVR )
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-
-#elif defined( AMIGA )   || defined( applec )    || defined( __AS400__ )  || \
-      defined( _CRAY )   || defined( __hppa )    || defined( __hp9000 )   || \
-      defined( ibm370 )  || defined( mc68000 )   || defined( m68k )       || \
-      defined( __MRC__ ) || defined( __MVS__ )   || defined( __MWERKS__ ) || \
-      defined( sparc )   || defined( __sparc)    || defined( SYMANTEC_C ) || \
-      defined( __VOS__ ) || defined( __TIGCC__ ) || defined( __TANDEM )   || \
-      defined( THINK_C ) || defined( __VMCMS__ )
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-
-#elif 0     /* **** EDIT HERE IF NECESSARY **** */
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#elif 0     /* **** EDIT HERE IF NECESSARY **** */
-#  define PLATFORM_BYTE_ORDER IS_BIG_ENDIAN
-#else
-#  error Please edit lines 126 or 128 in brg_endian.h to set the platform byte order
-#endif
-#endif
-
-/* special handler for IA64, which may be either endianness (?)  */
-/* here we assume little-endian, but this may need to be changed */
-#if defined(__ia64) || defined(__ia64__) || defined(_M_IA64)
-#  define PLATFORM_MUST_ALIGN (1)
-#ifndef PLATFORM_BYTE_ORDER
-#  define PLATFORM_BYTE_ORDER IS_LITTLE_ENDIAN
-#endif
-#endif
-
-#ifndef   PLATFORM_MUST_ALIGN
-#  define PLATFORM_MUST_ALIGN (0)
-#endif
-
-#endif  /* ifndef BRG_ENDIAN_H */
diff --git a/drivers/staging/skein/include/brg_types.h b/drivers/staging/skein/include/brg_types.h
deleted file mode 100644
index 3d9fe0df5238..000000000000
--- a/drivers/staging/skein/include/brg_types.h
+++ /dev/null
@@ -1,131 +0,0 @@
-/*
- ---------------------------------------------------------------------------
- Copyright (c) 1998-2006, Brian Gladman, Worcester, UK. All rights reserved.
-
- LICENSE TERMS
-
- The free distribution and use of this software in both source and binary
- form is allowed (with or without changes) provided that:
-
-   1. distributions of this source code include the above copyright
-      notice, this list of conditions and the following disclaimer;
-
-   2. distributions in binary form include the above copyright
-      notice, this list of conditions and the following disclaimer
-      in the documentation and/or other associated materials;
-
-   3. the copyright holder's name is not used to endorse products
-      built using this software without specific written permission.
-
- ALTERNATIVELY, provided that this notice is retained in full, this product
- may be distributed under the terms of the GNU General Public License (GPL),
- in which case the provisions of the GPL apply INSTEAD OF those given above.
-
- DISCLAIMER
-
- This software is provided 'as is' with no explicit or implied warranties
- in respect of its properties, including, but not limited to, correctness
- and/or fitness for purpose.
- ---------------------------------------------------------------------------
- Issue 09/09/2006
-
- The unsigned integer types defined here are of the form uint_<nn>t where
- <nn> is the length of the type; for example, the unsigned 32-bit type is
- 'uint_32t'.  These are NOT the same as the 'C99 integer types' that are
- defined in the inttypes.h and stdint.h headers since attempts to use these
- types have shown that support for them is still highly variable.  However,
- since the latter are of the form uint<nn>_t, a regular expression search
- and replace (in VC++ search on 'uint_{:z}t' and replace with 'uint\1_t')
- can be used to convert the types used here to the C99 standard types.
-*/
-
-#ifndef BRG_TYPES_H
-#define BRG_TYPES_H
-
-#if defined(__cplusplus)
-extern "C" {
-#endif
-
-#ifndef BRG_UI8
-#  define BRG_UI8
-     typedef unsigned char uint_8t;
-#endif
-
-#ifndef BRG_UI16
-#  define BRG_UI16
-     typedef unsigned short uint_16t;
-#endif
-
-#ifndef BRG_UI32
-#  define BRG_UI32
-#    define li_32(h) 0x##h##u
-     typedef unsigned int uint_32t;
-#endif
-
-#ifndef BRG_UI64
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ull
-       typedef unsigned long long uint_64t;
-#endif
-
-#if defined( NEED_UINT_64T ) && !defined( BRG_UI64 )
-#  error Please define uint_64t as an unsigned 64 bit type in brg_types.h
-#endif
-
-#ifndef RETURN_VALUES
-#  define RETURN_VALUES
-#  if defined( DLL_EXPORT )
-#    if defined( _MSC_VER ) || defined ( __INTEL_COMPILER )
-#      define VOID_RETURN    __declspec( dllexport ) void __stdcall
-#      define INT_RETURN     __declspec( dllexport ) int  __stdcall
-#    elif defined( __GNUC__ )
-#      define VOID_RETURN    __declspec( __dllexport__ ) void
-#      define INT_RETURN     __declspec( __dllexport__ ) int
-#    else
-#      error Use of the DLL is only available on the Microsoft, Intel and GCC compilers
-#    endif
-#  elif defined( DLL_IMPORT )
-#    if defined( _MSC_VER ) || defined ( __INTEL_COMPILER )
-#      define VOID_RETURN    __declspec( dllimport ) void __stdcall
-#      define INT_RETURN     __declspec( dllimport ) int  __stdcall
-#    elif defined( __GNUC__ )
-#      define VOID_RETURN    __declspec( __dllimport__ ) void
-#      define INT_RETURN     __declspec( __dllimport__ ) int
-#    else
-#      error Use of the DLL is only available on the Microsoft, Intel and GCC compilers
-#    endif
-#  elif defined( __WATCOMC__ )
-#    define VOID_RETURN  void __cdecl
-#    define INT_RETURN   int  __cdecl
-#  else
-#    define VOID_RETURN  void
-#    define INT_RETURN   int
-#  endif
-#endif
-
-/*  These defines are used to declare buffers in a way that allows
-    faster operations on longer variables to be used.  In all these
-    defines 'size' must be a power of 2 and >= 8
-
-    dec_unit_type(size,x)       declares a variable 'x' of length 
-                                'size' bits
-
-    dec_bufr_type(size,bsize,x) declares a buffer 'x' of length 'bsize' 
-                                bytes defined as an array of variables
-                                each of 'size' bits (bsize must be a 
-                                multiple of size / 8)
-
-    ptr_cast(x,size)            casts a pointer to a pointer to a 
-                                varaiable of length 'size' bits
-*/
-
-#define ui_type(size)               uint_##size##t
-#define dec_unit_type(size,x)       typedef ui_type(size) x
-#define dec_bufr_type(size,bsize,x) typedef ui_type(size) x[bsize / (size >> 3)]
-#define ptr_cast(x,size)            ((ui_type(size)*)(x))
-
-#if defined(__cplusplus)
-}
-#endif
-
-#endif
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
index 1c68070358ce..e78c976dccc5 100644
--- a/drivers/staging/skein/include/skein_port.h
+++ b/drivers/staging/skein/include/skein_port.h
@@ -15,11 +15,9 @@
 ** 
 ********************************************************************/
 
-#include <brg_types.h>                      /* get integer type definitions */
-
 typedef unsigned int    uint_t;             /* native unsigned integer */
-typedef uint_8t         u08b_t;             /*  8-bit unsigned integer */
-typedef uint_64t        u64b_t;             /* 64-bit unsigned integer */
+typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
+typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
 
 #ifndef RotL_64
 #define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
-- 
1.9.1

* [PATCH V2 04/21] staging: crypto: skein: remove skein_port.h
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (2 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 03/21] staging: crypto: skein: remove brg_*.h includes Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 05/21] staging: crypto: skein: remove __cplusplus and an unneeded stddef.h Jason Cooper
                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h      |  14 +++-
 drivers/staging/skein/include/skein_port.h | 108 -----------------------------
 drivers/staging/skein/skein.c              |  21 ------
 3 files changed, 13 insertions(+), 130 deletions(-)
 delete mode 100644 drivers/staging/skein/include/skein_port.h
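
A note on what this removal implies: SKEIN_NEED_SWAP was already
hard-coded to 0, so after this patch Skein_Put64_LSB_First and
Skein_Get64_LSB_First are plain memcpy() calls and Skein_Swap64 is a
no-op. That is only correct on a little-endian host. The user-space
sketch below restates the portable byte-by-byte versions that the
deleted header carried under SKEIN_PORT_CODE, to show what a
big-endian-safe variant would look like. The function names
get64_lsb_first() and put64_lsb_first() are made up for this
illustration. In the kernel one would presumably reach for existing
helpers such as le64_to_cpu() or get_unaligned_le64() instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Load wcnt 64-bit words from a byte stream, least significant byte
 * first.  Works on any host byte order (illustrative name, not in the
 * series).
 */
static void get64_lsb_first(uint64_t *dst, const uint8_t *src, size_t wcnt)
{
	size_t i, j;

	for (i = 0; i < wcnt; i++) {
		uint64_t w = 0;

		for (j = 0; j < 8; j++)
			w |= (uint64_t)src[8 * i + j] << (8 * j);
		dst[i] = w;
	}
}

/*
 * Store bcnt bytes taken from an array of 64-bit words, least
 * significant byte of each word first.
 */
static void put64_lsb_first(uint8_t *dst, const uint64_t *src, size_t bcnt)
{
	size_t n;

	for (n = 0; n < bcnt; n++)
		dst[n] = (uint8_t)(src[n >> 3] >> (8 * (n & 7)));
}
```

On a little-endian machine both functions produce the same bytes as the
memcpy() macros kept in skein.h, which is why the series can drop them
as long as only little-endian targets are considered.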

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 315cdcd14413..211aca1b1036 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -33,7 +33,19 @@ extern "C"
 #endif
 
 #include <stddef.h>                          /* get size_t definition */
-#include <skein_port.h>               /* get platform-specific definitions */
+
+typedef unsigned int    uint_t;             /* native unsigned integer */
+typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
+typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
+
+#ifndef RotL_64
+#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
+#endif
+
+/* below two prototype assume we are handed aligned data */
+#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
+#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
+#define Skein_Swap64(w64)  (w64)
 
 enum
     {
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
deleted file mode 100644
index e78c976dccc5..000000000000
--- a/drivers/staging/skein/include/skein_port.h
+++ /dev/null
@@ -1,108 +0,0 @@
-#ifndef _SKEIN_PORT_H_
-#define _SKEIN_PORT_H_
-/*******************************************************************
-**
-** Platform-specific definitions for Skein hash function.
-**
-** Source code author: Doug Whiting, 2008.
-**
-** This algorithm and source code is released to the public domain.
-**
-** Many thanks to Brian Gladman for his portable header files.
-**
-** To port Skein to an "unsupported" platform, change the definitions
-** in this file appropriately.
-** 
-********************************************************************/
-
-typedef unsigned int    uint_t;             /* native unsigned integer */
-typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
-typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
-
-#ifndef RotL_64
-#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
-#endif
-
-/*
- * Skein is "natively" little-endian (unlike SHA-xxx), for optimal
- * performance on x86 CPUs.  The Skein code requires the following
- * definitions for dealing with endianness:
- *
- *    SKEIN_NEED_SWAP:  0 for little-endian, 1 for big-endian
- *    Skein_Put64_LSB_First
- *    Skein_Get64_LSB_First
- *    Skein_Swap64
- *
- * If SKEIN_NEED_SWAP is defined at compile time, it is used here
- * along with the portable versions of Put64/Get64/Swap64, which 
- * are slow in general.
- *
- * Otherwise, an "auto-detect" of endianness is attempted below.
- * If the default handling doesn't work well, the user may insert
- * platform-specific code instead (e.g., for big-endian CPUs).
- *
- */
-#define SKEIN_NEED_SWAP   (0)
-/* below two prototype assume we are handed aligned data */
-#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
-#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
-
-/*
- ******************************************************************
- *      Provide any definitions still needed.
- ******************************************************************
- */
-#ifndef Skein_Swap64  /* swap for big-endian, nop for little-endian */
-#if     SKEIN_NEED_SWAP
-#define Skein_Swap64(w64)                       \
-  ( (( ((u64b_t)(w64))       & 0xFF) << 56) |   \
-    (((((u64b_t)(w64)) >> 8) & 0xFF) << 48) |   \
-    (((((u64b_t)(w64)) >>16) & 0xFF) << 40) |   \
-    (((((u64b_t)(w64)) >>24) & 0xFF) << 32) |   \
-    (((((u64b_t)(w64)) >>32) & 0xFF) << 24) |   \
-    (((((u64b_t)(w64)) >>40) & 0xFF) << 16) |   \
-    (((((u64b_t)(w64)) >>48) & 0xFF) <<  8) |   \
-    (((((u64b_t)(w64)) >>56) & 0xFF)      ) )
-#else
-#define Skein_Swap64(w64)  (w64)
-#endif
-#endif  /* ifndef Skein_Swap64 */
-
-
-#ifndef Skein_Put64_LSB_First
-void    Skein_Put64_LSB_First(u08b_t *dst,const u64b_t *src,size_t bCnt)
-#ifdef  SKEIN_PORT_CODE /* instantiate the function code here? */
-    { /* this version is fully portable (big-endian or little-endian), but slow */
-    size_t n;
-
-    for (n=0;n<bCnt;n++)
-        dst[n] = (u08b_t) (src[n>>3] >> (8*(n&7)));
-    }
-#else
-    ;    /* output only the function prototype */
-#endif
-#endif   /* ifndef Skein_Put64_LSB_First */
-
-
-#ifndef Skein_Get64_LSB_First
-void    Skein_Get64_LSB_First(u64b_t *dst,const u08b_t *src,size_t wCnt)
-#ifdef  SKEIN_PORT_CODE /* instantiate the function code here? */
-    { /* this version is fully portable (big-endian or little-endian), but slow */
-    size_t n;
-
-    for (n=0;n<8*wCnt;n+=8)
-        dst[n/8] = (((u64b_t) src[n  ])      ) +
-                   (((u64b_t) src[n+1]) <<  8) +
-                   (((u64b_t) src[n+2]) << 16) +
-                   (((u64b_t) src[n+3]) << 24) +
-                   (((u64b_t) src[n+4]) << 32) +
-                   (((u64b_t) src[n+5]) << 40) +
-                   (((u64b_t) src[n+6]) << 48) +
-                   (((u64b_t) src[n+7]) << 56) ;
-    }
-#else
-    ;    /* output only the function prototype */
-#endif
-#endif   /* ifndef Skein_Get64_LSB_First */
-
-#endif   /* ifndef _SKEIN_PORT_H_ */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 3fae6fdf7c75..44468b6701ab 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -102,13 +102,6 @@ int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
         Skein_256_Update(ctx,key,keyBytes);     /* hash the key */
         Skein_256_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
         memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
-#if SKEIN_NEED_SWAP
-        {
-            uint_t i;
-            for (i=0;i<SKEIN_256_STATE_WORDS;i++)   /* convert key bytes to context words */
-                ctx->X[i] = Skein_Swap64(ctx->X[i]);
-        }
-#endif
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
@@ -297,13 +290,6 @@ int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
         Skein_512_Update(ctx,key,keyBytes);     /* hash the key */
         Skein_512_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
         memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
-#if SKEIN_NEED_SWAP
-        {
-            uint_t i;
-            for (i=0;i<SKEIN_512_STATE_WORDS;i++)   /* convert key bytes to context words */
-                ctx->X[i] = Skein_Swap64(ctx->X[i]);
-        }
-#endif
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
@@ -489,13 +475,6 @@ int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
         Skein1024_Update(ctx,key,keyBytes);     /* hash the key */
         Skein1024_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
         memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
-#if SKEIN_NEED_SWAP
-        {
-            uint_t i;
-            for (i=0;i<SKEIN1024_STATE_WORDS;i++)   /* convert key bytes to context words */
-                ctx->X[i] = Skein_Swap64(ctx->X[i]);
-        }
-#endif
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-- 
1.9.1

* [PATCH V2 05/21] staging: crypto: skein: remove __cplusplus and an unneeded stddef.h
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (3 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 04/21] staging: crypto: skein: remove skein_port.h Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 06/21] staging: crypto: skein: remove unneeded typedefs Jason Cooper
                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h        | 11 -----------
 drivers/staging/skein/include/skeinApi.h     |  9 ---------
 drivers/staging/skein/include/threefishApi.h |  9 ---------
 3 files changed, 29 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 211aca1b1036..b1e55b08d150 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -27,13 +27,6 @@
 **                                1: return SKEIN_FAIL to flag errors
 **
 ***************************************************************************/
-#ifdef __cplusplus
-extern "C"
-{
-#endif
-
-#include <stddef.h>                          /* get size_t definition */
-
 typedef unsigned int    uint_t;             /* native unsigned integer */
 typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
 typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
@@ -322,8 +315,4 @@ enum
 #define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS    ) + 5) % 10) + 5))
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif  /* ifndef _SKEIN_H_ */
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 734d27b79f01..f55c67e81f2b 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -81,11 +81,6 @@ OTHER DEALINGS IN THE SOFTWARE.
 #include <linux/types.h>
 #include <skein.h>
 
-#ifdef __cplusplus
-extern "C"
-{
-#endif
-
     /**
      * Which Skein size to use
      */
@@ -229,10 +224,6 @@ extern "C"
      */
     int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash);
 
-#ifdef __cplusplus
-}
-#endif
-
 /**
  * @}
  */
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index dae270cf71d3..aaecfe822142 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -33,11 +33,6 @@
 
 #define KeyScheduleConst 0x1BD11BDAA9FC1A22L
 
-#ifdef __cplusplus
-extern "C"
-{
-#endif
-
     /**
      * Which Threefish size to use
      */
@@ -157,10 +152,6 @@ extern "C"
     void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
     void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
     void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-#ifdef __cplusplus
-}
-#endif
-
 /**
  * @}
  */
-- 
1.9.1

* [PATCH V2 06/21] staging: crypto: skein: remove unneeded typedefs
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (4 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 05/21] staging: crypto: skein: remove __cplusplus and an unneeded stddef.h Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 07/21] staging: crypto: skein: remove all typedef {struct,enum} Jason Cooper
                     ` (14 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h        | 73 ++++++++++-----------
 drivers/staging/skein/include/skeinApi.h     |  4 +-
 drivers/staging/skein/include/skein_iv.h     | 26 ++++----
 drivers/staging/skein/include/threefishApi.h |  6 +-
 drivers/staging/skein/skein.c                | 96 ++++++++++++++--------------
 drivers/staging/skein/skeinApi.c             | 24 +++----
 drivers/staging/skein/skeinBlockNo3F.c       | 30 ++++-----
 drivers/staging/skein/skein_block.c          | 54 ++++++++--------
 drivers/staging/skein/threefishApi.c         |  8 +--
 9 files changed, 159 insertions(+), 162 deletions(-)
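
For anyone checking that the u64b_t -> u64 and u08b_t -> u8 renames are
purely cosmetic, the user-space sketch below rebuilds two of the macros
from skein.h on top of stand-in typedefs. The typedefs here are
assumptions for the sake of a standalone build (in the kernel, u8 and
u64 come from <linux/types.h>), and ks_parity() and rotl64_checked() are
hypothetical wrappers that are not part of the series.

```c
#include <stdint.h>

/* Stand-ins for the kernel's u8/u64 so the sketch builds in user space. */
typedef uint8_t  u8;
typedef uint64_t u64;

/* Macros as they read in skein.h after this patch. */
#define RotL_64(x, N)            (((x) << (N)) | ((x) >> (64 - (N))))
#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64)(hi32)) << 32))
#define SKEIN_KS_PARITY          SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)

/* Hypothetical wrapper: exposes the key schedule parity constant. */
static u64 ks_parity(void)
{
	return SKEIN_KS_PARITY;
}

/*
 * Hypothetical wrapper: RotL_64 with N == 0 would shift by 64, which is
 * undefined behaviour in C, so reduce the count and special-case zero.
 */
static u64 rotl64_checked(u64 x, unsigned int n)
{
	n &= 63;
	return n ? RotL_64(x, n) : x;
}
```

SKEIN_KS_PARITY evaluates to the same value as KeyScheduleConst in
threefishApi.h, so the two definitions could probably be unified later.
The kernel also ships rol64() in <linux/bitops.h>, which may be a
candidate to replace RotL_64 in a follow-up.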

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index b1e55b08d150..12c5c8d612b0 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -27,9 +27,6 @@
 **                                1: return SKEIN_FAIL to flag errors
 **
 ***************************************************************************/
-typedef unsigned int    uint_t;             /* native unsigned integer */
-typedef uint8_t         u08b_t;             /*  8-bit unsigned integer */
-typedef uint64_t        u64b_t;             /* 64-bit unsigned integer */
 
 #ifndef RotL_64
 #define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
@@ -70,28 +67,28 @@ typedef struct
     {
     size_t  hashBitLen;                      /* size of hash result, in bits */
     size_t  bCnt;                            /* current byte count in buffer b[] */
-    u64b_t  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+    u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
     } Skein_Ctxt_Hdr_t;
 
 typedef struct                               /*  256-bit Skein hash context structure */
     {
     Skein_Ctxt_Hdr_t h;                      /* common header context variables */
-    u64b_t  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
-    u08b_t  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
     } Skein_256_Ctxt_t;
 
 typedef struct                               /*  512-bit Skein hash context structure */
     {
     Skein_Ctxt_Hdr_t h;                      /* common header context variables */
-    u64b_t  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
-    u08b_t  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
     } Skein_512_Ctxt_t;
 
 typedef struct                               /* 1024-bit Skein hash context structure */
     {
     Skein_Ctxt_Hdr_t h;                      /* common header context variables */
-    u64b_t  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
-    u08b_t  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
     } Skein1024_Ctxt_t;
 
 /*   Skein APIs for (incremental) "straight hashing" */
@@ -99,13 +96,13 @@ int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
 int  Skein_512_Init  (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
 int  Skein1024_Init  (Skein1024_Ctxt_t *ctx, size_t hashBitLen);
 
-int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
-int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
-int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt);
+int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
 
-int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u8 * hashVal);
+int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u8 * hashVal);
 
 /*
 **   Skein APIs for "extended" initialization: MAC keys, tree hashing.
@@ -121,26 +118,26 @@ int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 **              to precompute the MAC IV, then a copy of the context saved and
 **              reused for each new MAC computation.
 **/
-int  Skein_256_InitExt(Skein_256_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
-int  Skein_512_InitExt(Skein_512_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
-int  Skein1024_InitExt(Skein1024_Ctxt_t *ctx, size_t hashBitLen, u64b_t treeInfo, const u08b_t *key, size_t keyBytes);
+int  Skein_256_InitExt(Skein_256_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_512_InitExt(Skein_512_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein1024_InitExt(Skein1024_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
 
 /*
 **   Skein APIs for MAC and tree hash:
 **      Final_Pad:  pad, do final block, but no OUTPUT type
 **      Output:     do just the output stage
 */
-int  Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 * hashVal);
+int  Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 * hashVal);
 
 #ifndef SKEIN_TREE_HASH
 #define SKEIN_TREE_HASH (1)
 #endif
 #if  SKEIN_TREE_HASH
-int  Skein_256_Output   (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein_512_Output   (Skein_512_Ctxt_t *ctx, u08b_t * hashVal);
-int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
+int  Skein_256_Output   (Skein_256_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_512_Output   (Skein_512_Ctxt_t *ctx, u8 * hashVal);
+int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u8 * hashVal);
 #endif
 
 /*****************************************************************
@@ -161,13 +158,13 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
                                 
 /* tweak word T[1]: flag bit definition(s) */
-#define SKEIN_T1_FLAG_FIRST     (((u64b_t)  1 ) << SKEIN_T1_POS_FIRST)
-#define SKEIN_T1_FLAG_FINAL     (((u64b_t)  1 ) << SKEIN_T1_POS_FINAL)
-#define SKEIN_T1_FLAG_BIT_PAD   (((u64b_t)  1 ) << SKEIN_T1_POS_BIT_PAD)
+#define SKEIN_T1_FLAG_FIRST     (((u64)  1 ) << SKEIN_T1_POS_FIRST)
+#define SKEIN_T1_FLAG_FINAL     (((u64)  1 ) << SKEIN_T1_POS_FINAL)
+#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1 ) << SKEIN_T1_POS_BIT_PAD)
                                 
 /* tweak word T[1]: tree level bit field mask */
-#define SKEIN_T1_TREE_LVL_MASK  (((u64b_t)0x7F) << SKEIN_T1_POS_TREE_LVL)
-#define SKEIN_T1_TREE_LEVEL(n)  (((u64b_t) (n)) << SKEIN_T1_POS_TREE_LVL)
+#define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
+#define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
 
 /* tweak word T[1]: block type field */
 #define SKEIN_BLK_TYPE_KEY      ( 0)                    /* key, for MAC and KDF */
@@ -180,7 +177,7 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
 #define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
 
-#define SKEIN_T1_BLK_TYPE(T)   (((u64b_t) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
+#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
 #define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
 #define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
 #define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
@@ -200,7 +197,7 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
 #endif
 
-#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64b_t) (hi32)) << 32))
+#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64) (hi32)) << 32))
 #define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION,SKEIN_ID_STRING_LE)
 #define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)
 
@@ -211,14 +208,14 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
 #define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
 
-#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
-#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
-#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
+#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
+#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
+#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
 
 #define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
-    ( (((u64b_t)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
-      (((u64b_t)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
-      (((u64b_t)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )
+    ( (((u64)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+      (((u64)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+      (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )
 
 #define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0,0,0) /* use as treeInfo in InitExt() call for sequential processing */
 
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index f55c67e81f2b..fb4a7c8e7f7a 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -99,8 +99,8 @@ OTHER DEALINGS IN THE SOFTWARE.
      * structures as well.
      */
     typedef struct SkeinCtx {
-        u64b_t skeinSize;
-        u64b_t  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
+        u64 skeinSize;
+        u64  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
         union {
             Skein_Ctxt_Hdr_t h;
             Skein_256_Ctxt_t s256;
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index 555ea619500b..94ac2f7cde76 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -20,7 +20,7 @@
 #define MK_64 SKEIN_MK_64
 
 /* blkSize =  256 bits. hashSize =  128 bits */
-const u64b_t SKEIN_256_IV_128[] =
+const u64 SKEIN_256_IV_128[] =
     {
     MK_64(0xE1111906,0x964D7260),
     MK_64(0x883DAAA7,0x7C8D811C),
@@ -29,7 +29,7 @@ const u64b_t SKEIN_256_IV_128[] =
     };
 
 /* blkSize =  256 bits. hashSize =  160 bits */
-const u64b_t SKEIN_256_IV_160[] =
+const u64 SKEIN_256_IV_160[] =
     {
     MK_64(0x14202314,0x72825E98),
     MK_64(0x2AC4E9A2,0x5A77E590),
@@ -38,7 +38,7 @@ const u64b_t SKEIN_256_IV_160[] =
     };
 
 /* blkSize =  256 bits. hashSize =  224 bits */
-const u64b_t SKEIN_256_IV_224[] =
+const u64 SKEIN_256_IV_224[] =
     {
     MK_64(0xC6098A8C,0x9AE5EA0B),
     MK_64(0x876D5686,0x08C5191C),
@@ -47,7 +47,7 @@ const u64b_t SKEIN_256_IV_224[] =
     };
 
 /* blkSize =  256 bits. hashSize =  256 bits */
-const u64b_t SKEIN_256_IV_256[] =
+const u64 SKEIN_256_IV_256[] =
     {
     MK_64(0xFC9DA860,0xD048B449),
     MK_64(0x2FCA6647,0x9FA7D833),
@@ -56,7 +56,7 @@ const u64b_t SKEIN_256_IV_256[] =
     };
 
 /* blkSize =  512 bits. hashSize =  128 bits */
-const u64b_t SKEIN_512_IV_128[] =
+const u64 SKEIN_512_IV_128[] =
     {
     MK_64(0xA8BC7BF3,0x6FBF9F52),
     MK_64(0x1E9872CE,0xBD1AF0AA),
@@ -69,7 +69,7 @@ const u64b_t SKEIN_512_IV_128[] =
     };
 
 /* blkSize =  512 bits. hashSize =  160 bits */
-const u64b_t SKEIN_512_IV_160[] =
+const u64 SKEIN_512_IV_160[] =
     {
     MK_64(0x28B81A2A,0xE013BD91),
     MK_64(0xC2F11668,0xB5BDF78F),
@@ -82,7 +82,7 @@ const u64b_t SKEIN_512_IV_160[] =
     };
 
 /* blkSize =  512 bits. hashSize =  224 bits */
-const u64b_t SKEIN_512_IV_224[] =
+const u64 SKEIN_512_IV_224[] =
     {
     MK_64(0xCCD06162,0x48677224),
     MK_64(0xCBA65CF3,0xA92339EF),
@@ -95,7 +95,7 @@ const u64b_t SKEIN_512_IV_224[] =
     };
 
 /* blkSize =  512 bits. hashSize =  256 bits */
-const u64b_t SKEIN_512_IV_256[] =
+const u64 SKEIN_512_IV_256[] =
     {
     MK_64(0xCCD044A1,0x2FDB3E13),
     MK_64(0xE8359030,0x1A79A9EB),
@@ -108,7 +108,7 @@ const u64b_t SKEIN_512_IV_256[] =
     };
 
 /* blkSize =  512 bits. hashSize =  384 bits */
-const u64b_t SKEIN_512_IV_384[] =
+const u64 SKEIN_512_IV_384[] =
     {
     MK_64(0xA3F6C6BF,0x3A75EF5F),
     MK_64(0xB0FEF9CC,0xFD84FAA4),
@@ -121,7 +121,7 @@ const u64b_t SKEIN_512_IV_384[] =
     };
 
 /* blkSize =  512 bits. hashSize =  512 bits */
-const u64b_t SKEIN_512_IV_512[] =
+const u64 SKEIN_512_IV_512[] =
     {
     MK_64(0x4903ADFF,0x749C51CE),
     MK_64(0x0D95DE39,0x9746DF03),
@@ -134,7 +134,7 @@ const u64b_t SKEIN_512_IV_512[] =
     };
 
 /* blkSize = 1024 bits. hashSize =  384 bits */
-const u64b_t SKEIN1024_IV_384[] =
+const u64 SKEIN1024_IV_384[] =
     {
     MK_64(0x5102B6B8,0xC1894A35),
     MK_64(0xFEEBC9E3,0xFE8AF11A),
@@ -155,7 +155,7 @@ const u64b_t SKEIN1024_IV_384[] =
     };
 
 /* blkSize = 1024 bits. hashSize =  512 bits */
-const u64b_t SKEIN1024_IV_512[] =
+const u64 SKEIN1024_IV_512[] =
     {
     MK_64(0xCAEC0E5D,0x7C1B1B18),
     MK_64(0xA01B0E04,0x5F03E802),
@@ -176,7 +176,7 @@ const u64b_t SKEIN1024_IV_512[] =
     };
 
 /* blkSize = 1024 bits. hashSize = 1024 bits */
-const u64b_t SKEIN1024_IV_1024[] =
+const u64 SKEIN1024_IV_1024[] =
     {
     MK_64(0xD593DA07,0x41E72355),
     MK_64(0x15B5E511,0xAC73E00C),
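The type rename above is only behaviour-preserving if u64 is exactly 64 bits wide, because MK_64 (an alias for SKEIN_MK_64) shifts the high half left by 32. A minimal userspace sketch of how one IV word and a treeInfo value are assembled, assuming uint64_t as a stand-in for the kernel's u64; the helper names first_iv_word_1024() and tree_info() are hypothetical and not part of this patch:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;   /* assumption: userspace stand-in for the kernel's u64 */

static_assert(sizeof(u64) == 8, "u64 must be exactly 64 bits wide");

/* Copied from the skein.h and skein_iv.h hunks in this patch. */
#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64) (hi32)) << 32))
#define MK_64 SKEIN_MK_64

/*
 * NODE_SIZE and MAX_LEVEL positions appear as context in the patch.
 * LEAF_SIZE position 0 is not visible in the hunk and is assumed here.
 */
#define SKEIN_CFG_TREE_LEAF_SIZE_POS  ( 0)
#define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)

#define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                \
    ( (((u64)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
      (((u64)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
      (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )

/* Hypothetical helper: first word of SKEIN1024_IV_1024 as listed above. */
static u64 first_iv_word_1024(void)
{
    return MK_64(0xD593DA07,0x41E72355);
}

/* Hypothetical helper: pack the three tree parameters into one word. */
static u64 tree_info(u64 leaf, u64 node, u64 max_lvl)
{
    return SKEIN_CFG_TREE_INFO(leaf, node, max_lvl);
}
```

With all three tree parameters set to 0, tree_info() yields 0, which matches SKEIN_CFG_TREE_INFO_SEQUENTIAL as defined in skein.h.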
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index aaecfe822142..0123a575b606 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -51,9 +51,9 @@
      * structures as well.
      */
     typedef struct ThreefishKey {
-        u64b_t stateSize;
-        u64b_t key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
-        u64b_t tweak[3];
+        u64 stateSize;
+        u64 key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
+        u64 tweak[3];
     } ThreefishKey_t;
 
     /**
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 44468b6701ab..b225642efa4a 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -16,9 +16,9 @@
 
 /*****************************************************************/
 /* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
@@ -30,8 +30,8 @@ int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
 {
     union
     {
-        u08b_t  b[SKEIN_256_STATE_BYTES];
-        u64b_t  w[SKEIN_256_STATE_WORDS];
+        u8  b[SKEIN_256_STATE_BYTES];
+        u64  w[SKEIN_256_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -76,12 +76,12 @@ int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
-        u08b_t  b[SKEIN_256_STATE_BYTES];
-        u64b_t  w[SKEIN_256_STATE_WORDS];
+        u8  b[SKEIN_256_STATE_BYTES];
+        u64  w[SKEIN_256_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -126,7 +126,7 @@ int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -174,10 +174,10 @@ int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein_256_Final(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_256_Final(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN_256_STATE_WORDS];
+    u64 X[SKEIN_256_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
@@ -194,9 +194,9 @@ int Skein_256_Final(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_256_BLOCK_BYTES)
             n  = SKEIN_256_BLOCK_BYTES;
@@ -217,8 +217,8 @@ int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
 {
     union
     {
-        u08b_t  b[SKEIN_512_STATE_BYTES];
-        u64b_t  w[SKEIN_512_STATE_WORDS];
+        u8  b[SKEIN_512_STATE_BYTES];
+        u64  w[SKEIN_512_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -264,12 +264,12 @@ int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
-        u08b_t  b[SKEIN_512_STATE_BYTES];
-        u64b_t  w[SKEIN_512_STATE_WORDS];
+        u8  b[SKEIN_512_STATE_BYTES];
+        u64  w[SKEIN_512_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -314,7 +314,7 @@ int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -362,10 +362,10 @@ int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein_512_Final(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_512_Final(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN_512_STATE_WORDS];
+    u64 X[SKEIN_512_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
@@ -382,9 +382,9 @@ int Skein_512_Final(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_512_BLOCK_BYTES)
             n  = SKEIN_512_BLOCK_BYTES;
@@ -405,8 +405,8 @@ int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
 {
     union
     {
-        u08b_t  b[SKEIN1024_STATE_BYTES];
-        u64b_t  w[SKEIN1024_STATE_WORDS];
+        u8  b[SKEIN1024_STATE_BYTES];
+        u64  w[SKEIN1024_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -449,12 +449,12 @@ int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, const u08b_t *key, size_t keyBytes)
+int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
-        u08b_t  b[SKEIN1024_STATE_BYTES];
-        u64b_t  w[SKEIN1024_STATE_WORDS];
+        u8  b[SKEIN1024_STATE_BYTES];
+        u64  w[SKEIN1024_STATE_WORDS];
     } cfg;                              /* config block */
 
     Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
@@ -499,7 +499,7 @@ int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64b_t treeInfo, c
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt)
+int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -547,10 +547,10 @@ int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u08b_t *msg, size_t msgByteCnt
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein1024_Final(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN1024_STATE_WORDS];
+    u64 X[SKEIN1024_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
@@ -567,9 +567,9 @@ int Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN1024_BLOCK_BYTES)
             n  = SKEIN1024_BLOCK_BYTES;
@@ -585,7 +585,7 @@ int Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -601,7 +601,7 @@ int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -617,7 +617,7 @@ int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -634,10 +634,10 @@ int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
 #if SKEIN_TREE_HASH
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein_256_Output(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_256_Output(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN_256_STATE_WORDS];
+    u64 X[SKEIN_256_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
@@ -648,9 +648,9 @@ int Skein_256_Output(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_256_BLOCK_BYTES)
             n  = SKEIN_256_BLOCK_BYTES;
@@ -663,10 +663,10 @@ int Skein_256_Output(Skein_256_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein_512_Output(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein_512_Output(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN_512_STATE_WORDS];
+    u64 X[SKEIN_512_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
@@ -677,9 +677,9 @@ int Skein_512_Output(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_512_BLOCK_BYTES)
             n  = SKEIN_512_BLOCK_BYTES;
@@ -692,10 +692,10 @@ int Skein_512_Output(Skein_512_Ctxt_t *ctx, u08b_t *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein1024_Output(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
+int Skein1024_Output(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
-    u64b_t X[SKEIN1024_STATE_WORDS];
+    u64 X[SKEIN1024_STATE_WORDS];
     Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
@@ -706,9 +706,9 @@ int Skein1024_Output(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
     memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
     for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
     {
-        ((u64b_t *)ctx->b)[0]= Skein_Swap64((u64b_t) i); /* build the counter block */
+        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
         Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t)); /* run "counter mode" */
+        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN1024_BLOCK_BYTES)
             n  = SKEIN1024_BLOCK_BYTES;
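For readers following the "counter mode" output loops touched above: each iteration processes one counter block and then emits at most one block's worth of hash bytes. The sketch below reproduces only that loop arithmetic, not the hashing. The helper names are hypothetical, and the block sizes used in any call (for example 32 bytes for Skein-256) are assumptions of the caller:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of output bytes produced by iteration i,
 * following "n = byteCnt - i*BLOCK_BYTES; if (n >= BLOCK_BYTES) n = BLOCK_BYTES;".
 * Returns 0 once i is past the last iteration the loop would run.
 */
static size_t output_chunk_len(size_t byte_cnt, size_t block_bytes, size_t i)
{
    size_t n;

    assert(block_bytes != 0);           /* a zero block size makes no sense */
    if (i * block_bytes >= byte_cnt)
        return 0;                       /* loop condition already false */
    n = byte_cnt - i * block_bytes;     /* number of output bytes left to go */
    if (n >= block_bytes)
        n = block_bytes;
    return n;
}

/*
 * Hypothetical helper: how many times the loop body runs, using the same
 * condition as the patch: for (i = 0; i * BLOCK_BYTES < byteCnt; i++).
 */
static size_t output_block_count(size_t byte_cnt, size_t block_bytes)
{
    size_t i;

    assert(block_bytes != 0);
    for (i = 0; i * block_bytes < byte_cnt; i++)
        ;
    return i;
}
```

For a 100-byte output with 32-byte blocks, the loop runs four times and the last iteration copies only the remaining 4 bytes.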
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 579b92efbf65..ef021086bc61 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -41,7 +41,7 @@ int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
     size_t Xlen = 0;
-    u64b_t*  X = NULL;
+    u64*  X = NULL;
     uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
     Skein_Assert(ctx, SKEIN_FAIL);
@@ -82,7 +82,7 @@ int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
                  size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
-    u64b_t*  X = NULL;
+    u64*  X = NULL;
     size_t Xlen = 0;
     uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
@@ -97,18 +97,18 @@ int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
     case Skein256:
         ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
                                 treeInfo,
-                                (const u08b_t*)key, keyLen);
+                                (const u8*)key, keyLen);
 
         break;
     case Skein512:
         ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
                                 treeInfo,
-                                (const u08b_t*)key, keyLen);
+                                (const u8*)key, keyLen);
         break;
     case Skein1024:
         ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
                                 treeInfo,
-                                (const u08b_t*)key, keyLen);
+                                (const u8*)key, keyLen);
 
         break;
     }
@@ -122,7 +122,7 @@ int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
 void skeinReset(SkeinCtx_t* ctx)
 {
     size_t Xlen = 0;
-    u64b_t*  X = NULL;
+    u64*  X = NULL;
 
     /*
      * The following two lines rely of the fact that the real Skein contexts are
@@ -146,13 +146,13 @@ int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
 
     switch (ctx->skeinSize) {
     case Skein256:
-        ret = Skein_256_Update(&ctx->m.s256, (const u08b_t*)msg, msgByteCnt);
+        ret = Skein_256_Update(&ctx->m.s256, (const u8*)msg, msgByteCnt);
         break;
     case Skein512:
-        ret = Skein_512_Update(&ctx->m.s512, (const u08b_t*)msg, msgByteCnt);
+        ret = Skein_512_Update(&ctx->m.s512, (const u8*)msg, msgByteCnt);
         break;
     case Skein1024:
-        ret = Skein1024_Update(&ctx->m.s1024, (const u08b_t*)msg, msgByteCnt);
+        ret = Skein1024_Update(&ctx->m.s1024, (const u8*)msg, msgByteCnt);
         break;
     }
     return ret;
@@ -206,13 +206,13 @@ int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash)
 
     switch (ctx->skeinSize) {
     case Skein256:
-        ret = Skein_256_Final(&ctx->m.s256, (u08b_t*)hash);
+        ret = Skein_256_Final(&ctx->m.s256, (u8*)hash);
         break;
     case Skein512:
-        ret = Skein_512_Final(&ctx->m.s512, (u08b_t*)hash);
+        ret = Skein_512_Final(&ctx->m.s512, (u8*)hash);
         break;
     case Skein1024:
-        ret = Skein1024_Final(&ctx->m.s1024, (u08b_t*)hash);
+        ret = Skein1024_Final(&ctx->m.s1024, (u8*)hash);
         break;
     }
     return ret;
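A side observation on the casts kept in skeinApi.c: once u08b_t becomes u8, the (const u8*) and (u8*) casts on uint8_t pointers look redundant, assuming u8 and uint8_t name the same underlying type, as they do in the kernel's type headers. A minimal userspace sketch of that assumption follows; sum_bytes() is a hypothetical stand-in for Skein_256_Update() and only adds up the bytes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;   /* assumption: mirrors the kernel, where both names alias one type */

/* Compile-time check that a u8 pointer and a uint8_t pointer are compatible. */
static_assert(_Generic((u8 *)0, uint8_t *: 1, default: 0),
              "u8 and uint8_t must be the same type");

/* Hypothetical stand-in for Skein_256_Update(): takes the low-level u8 type. */
static unsigned int sum_bytes(const u8 *msg, size_t msgByteCnt)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < msgByteCnt; i++)
        sum += msg[i];
    return sum;
}

/* Hypothetical wrapper shaped like skeinUpdate(): takes uint8_t, passes it on. */
static unsigned int sum_bytes_api(const uint8_t *msg, size_t msgByteCnt)
{
    return sum_bytes(msg, msgByteCnt);   /* no cast needed */
}
```

Dropping the casts would be a separate cleanup; this patch is limited to the type rename.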
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 6a19ceb17d0f..56c56b8ebd7e 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -5,21 +5,21 @@
 
 
 /*****************************  Skein_256 ******************************/
-void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u08b_t *blkPtr,
+void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u8 *blkPtr,
                              size_t blkCnt, size_t byteCntAdd)
 {
     ThreefishKey_t key;
-    u64b_t tweak[2];
+    u64 tweak[2];
     int i;
-    u64b_t  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
-    u64b_t words[3];
+    u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+    u64 words[3];
 
     Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
     tweak[0] = ctx->h.T[0];
     tweak[1] = ctx->h.T[1];
 
     do  {
-        u64b_t carry = byteCntAdd;
+        u64 carry = byteCntAdd;
 
         words[0] = tweak[0] & 0xffffffffL;
         words[1] = ((tweak[0] >> 32) & 0xffffffffL);
@@ -55,21 +55,21 @@ void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u08b_t *blkPtr,
     ctx->h.T[1] = tweak[1];
 }
 
-void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u08b_t *blkPtr,
+void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u8 *blkPtr,
                              size_t blkCnt, size_t byteCntAdd)
 {
     ThreefishKey_t key;
-    u64b_t tweak[2];
+    u64 tweak[2];
     int i;
-    u64b_t words[3];
-    u64b_t  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+    u64 words[3];
+    u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
 
     Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
     tweak[0] = ctx->h.T[0];
     tweak[1] = ctx->h.T[1];
 
     do  {
-        u64b_t carry = byteCntAdd;
+        u64 carry = byteCntAdd;
 
         words[0] = tweak[0] & 0xffffffffL;
         words[1] = ((tweak[0] >> 32) & 0xffffffffL);
@@ -109,21 +109,21 @@ void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u08b_t *blkPtr,
     ctx->h.T[1] = tweak[1];
 }
 
-void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx, const u08b_t *blkPtr,
+void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx, const u8 *blkPtr,
                               size_t blkCnt, size_t byteCntAdd)
 {
     ThreefishKey_t key;
-    u64b_t tweak[2];
+    u64 tweak[2];
     int i;
-    u64b_t words[3];
-    u64b_t  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+    u64 words[3];
+    u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
 
     Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
     tweak[0] = ctx->h.T[0];
     tweak[1] = ctx->h.T[1];
 
     do  {
-        u64b_t carry = byteCntAdd;
+        u64 carry = byteCntAdd;
 
         words[0] = tweak[0] & 0xffffffffL;
         words[1] = ((tweak[0] >> 32) & 0xffffffffL);
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index b5be41af6d17..98e884292044 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -39,7 +39,7 @@
 
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C */
     enum
         {
@@ -59,14 +59,14 @@ void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
 #endif
     size_t  r;
-    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64b_t  X0,X1,X2,X3;                        /* local copy of context vars, for speed */
-    u64b_t  w [WCNT];                           /* local copy of input block */
+    u64  X0,X1,X2,X3;                        /* local copy of context vars, for speed */
+    u64  w [WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64b_t *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+    const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
     Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
 #endif
     Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
@@ -212,10 +212,10 @@ void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein_256_Process_Block_CodeSize(void)
     {
-    return ((u08b_t *) Skein_256_Process_Block_CodeSize) -
-           ((u08b_t *) Skein_256_Process_Block);
+    return ((u8 *) Skein_256_Process_Block_CodeSize) -
+           ((u8 *) Skein_256_Process_Block);
     }
-uint_t Skein_256_Unroll_Cnt(void)
+unsigned int Skein_256_Unroll_Cnt(void)
     {
     return SKEIN_UNROLL_256;
     }
@@ -224,7 +224,7 @@ uint_t Skein_256_Unroll_Cnt(void)
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C */
     enum
         {
@@ -244,14 +244,14 @@ void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
 #endif
     size_t  r;
-    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64b_t  X0,X1,X2,X3,X4,X5,X6,X7;            /* local copy of vars, for speed */
-    u64b_t  w [WCNT];                           /* local copy of input block */
+    u64  X0,X1,X2,X3,X4,X5,X6,X7;            /* local copy of vars, for speed */
+    u64  w [WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64b_t *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+    const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
     Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
     Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
 #endif
@@ -420,10 +420,10 @@ void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein_512_Process_Block_CodeSize(void)
     {
-    return ((u08b_t *) Skein_512_Process_Block_CodeSize) -
-           ((u08b_t *) Skein_512_Process_Block);
+    return ((u8 *) Skein_512_Process_Block_CodeSize) -
+           ((u8 *) Skein_512_Process_Block);
     }
-uint_t Skein_512_Unroll_Cnt(void)
+unsigned int Skein_512_Unroll_Cnt(void)
     {
     return SKEIN_UNROLL_512;
     }
@@ -432,7 +432,7 @@ uint_t Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C, always looping (unrolled is bigger AND slower!) */
     enum
         {
@@ -452,16 +452,16 @@ void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
 #endif
     size_t  r;
-    u64b_t  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64b_t  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
 
-    u64b_t  X00,X01,X02,X03,X04,X05,X06,X07,    /* local copy of vars, for speed */
+    u64  X00,X01,X02,X03,X04,X05,X06,X07,    /* local copy of vars, for speed */
             X08,X09,X10,X11,X12,X13,X14,X15;
-    u64b_t  w [WCNT];                           /* local copy of input block */
+    u64  w [WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64b_t *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+    const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
     Xptr[ 0] = &X00;  Xptr[ 1] = &X01;  Xptr[ 2] = &X02;  Xptr[ 3] = &X03;
     Xptr[ 4] = &X04;  Xptr[ 5] = &X05;  Xptr[ 6] = &X06;  Xptr[ 7] = &X07;
     Xptr[ 8] = &X08;  Xptr[ 9] = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
@@ -678,10 +678,10 @@ void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,size_t b
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein1024_Process_Block_CodeSize(void)
     {
-    return ((u08b_t *) Skein1024_Process_Block_CodeSize) -
-           ((u08b_t *) Skein1024_Process_Block);
+    return ((u8 *) Skein1024_Process_Block_CodeSize) -
+           ((u8 *) Skein1024_Process_Block);
     }
-uint_t Skein1024_Unroll_Cnt(void)
+unsigned int Skein1024_Unroll_Cnt(void)
     {
     return SKEIN_UNROLL_1024;
     }
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 968d3d21fe61..ed19ee9e3425 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -25,8 +25,8 @@ void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
 void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
                                 uint8_t* out)
 {
-    u64b_t plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
-    u64b_t cipher[SKEIN_MAX_STATE_WORDS];
+    u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+    u64 cipher[SKEIN_MAX_STATE_WORDS];
     
     Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
     threefishEncryptBlockWords(keyCtx, plain, cipher);
@@ -52,8 +52,8 @@ void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
 void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
                                 uint8_t* out)
 {
-    u64b_t plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
-    u64b_t cipher[SKEIN_MAX_STATE_WORDS];
+    u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+    u64 cipher[SKEIN_MAX_STATE_WORDS];
     
     Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
     threefishDecryptBlockWords(keyCtx, cipher, plain);
-- 
1.9.1


* [PATCH V2 07/21] staging: crypto: skein: remove all typedef {struct,enum}
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (5 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 06/21] staging: crypto: skein: remove unneeded typedefs Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 08/21] staging: crypto: skein: use u8, u64 vice uint*_t Jason Cooper
                     ` (13 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
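For readers skimming the diff, the conversion applied throughout this patch can be sketched roughly as follows. This is only an illustration of the pattern, not code from the series: every demo_* / Demo_* identifier is hypothetical, and userspace <stdint.h> types stand in for the kernel's u64. The sketch shows an anonymous typedef'd struct next to its tagged replacement, a function taking the new spellings, and a check that the two forms have the same size and member offsets, which is consistent with a rename of this kind not changing object code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not the identifiers from the patch. */

/* Before: anonymous struct reachable only through a typedef. */
typedef struct {
	size_t   hash_bit_len;
	size_t   b_cnt;
	uint64_t T[2];
} Demo_Ctxt_Hdr_t;

/* After: tagged struct, spelled out in full at each use. */
struct demo_ctx_hdr {
	size_t   hash_bit_len;
	size_t   b_cnt;
	uint64_t T[2];
};

/* Enums get the same treatment: a tag instead of a typedef name. */
enum demo_size {
	DEMO_256 = 256,
	DEMO_512 = 512,
};

struct demo_ctx {
	uint64_t size;
	struct demo_ctx_hdr h;
};

/* Returns 0 on success, -1 on bad arguments. */
int demo_ctx_prepare(struct demo_ctx *ctx, enum demo_size size)
{
	if (!ctx || !size)
		return -1;
	memset(ctx, 0, sizeof(struct demo_ctx));
	ctx->size = size;
	return 0;
}

/* Returns 1 when both spellings have identical size and member offsets. */
int demo_layouts_match(void)
{
	return sizeof(Demo_Ctxt_Hdr_t) == sizeof(struct demo_ctx_hdr) &&
	       offsetof(Demo_Ctxt_Hdr_t, b_cnt) ==
	       offsetof(struct demo_ctx_hdr, b_cnt) &&
	       offsetof(Demo_Ctxt_Hdr_t, T) == offsetof(struct demo_ctx_hdr, T);
}
```

A caller would declare "struct demo_ctx ctx;" and pass "&ctx" together with an "enum demo_size" value to demo_ctx_prepare(), mirroring how the prototypes below change from the *_t names to explicit struct and enum tags.
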
 drivers/staging/skein/include/skein.h        | 58 ++++++++++++++--------------
 drivers/staging/skein/include/skeinApi.h     | 32 +++++++--------
 drivers/staging/skein/include/threefishApi.h | 32 +++++++--------
 drivers/staging/skein/skein.c                | 42 ++++++++++----------
 drivers/staging/skein/skeinApi.c             | 16 ++++----
 drivers/staging/skein/skeinBlockNo3F.c       | 12 +++---
 drivers/staging/skein/skein_block.c          |  6 +--
 drivers/staging/skein/threefish1024Block.c   |  4 +-
 drivers/staging/skein/threefish256Block.c    |  4 +-
 drivers/staging/skein/threefish512Block.c    |  4 +-
 drivers/staging/skein/threefishApi.c         | 10 ++---
 11 files changed, 110 insertions(+), 110 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 12c5c8d612b0..77b712e73253 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -63,46 +63,46 @@ enum
 #define  SKEIN_512_BLOCK_BYTES ( 8*SKEIN_512_STATE_WORDS)
 #define  SKEIN1024_BLOCK_BYTES ( 8*SKEIN1024_STATE_WORDS)
 
-typedef struct
+struct skein_ctx_hdr
     {
     size_t  hashBitLen;                      /* size of hash result, in bits */
     size_t  bCnt;                            /* current byte count in buffer b[] */
     u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
-    } Skein_Ctxt_Hdr_t;
+    };
 
-typedef struct                               /*  256-bit Skein hash context structure */
+struct skein_256_ctx                               /*  256-bit Skein hash context structure */
     {
-    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    struct skein_ctx_hdr h;                      /* common header context variables */
     u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
     u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    } Skein_256_Ctxt_t;
+    };
 
-typedef struct                               /*  512-bit Skein hash context structure */
+struct skein_512_ctx                             /*  512-bit Skein hash context structure */
     {
-    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    struct skein_ctx_hdr h;                      /* common header context variables */
     u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
     u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    } Skein_512_Ctxt_t;
+    };
 
-typedef struct                               /* 1024-bit Skein hash context structure */
+struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
     {
-    Skein_Ctxt_Hdr_t h;                      /* common header context variables */
+    struct skein_ctx_hdr h;                      /* common header context variables */
     u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
     u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    } Skein1024_Ctxt_t;
+    };
 
 /*   Skein APIs for (incremental) "straight hashing" */
-int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
-int  Skein_512_Init  (Skein_512_Ctxt_t *ctx, size_t hashBitLen);
-int  Skein1024_Init  (Skein1024_Ctxt_t *ctx, size_t hashBitLen);
+int  Skein_256_Init  (struct skein_256_ctx *ctx, size_t hashBitLen);
+int  Skein_512_Init  (struct skein_512_ctx *ctx, size_t hashBitLen);
+int  Skein1024_Init  (struct skein1024_ctx *ctx, size_t hashBitLen);
 
-int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 
-int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u8 * hashVal);
-int  Skein_512_Final (Skein_512_Ctxt_t *ctx, u8 * hashVal);
-int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_256_Final (struct skein_256_ctx *ctx, u8 * hashVal);
+int  Skein_512_Final (struct skein_512_ctx *ctx, u8 * hashVal);
+int  Skein1024_Final (struct skein1024_ctx *ctx, u8 * hashVal);
 
 /*
 **   Skein APIs for "extended" initialization: MAC keys, tree hashing.
@@ -118,26 +118,26 @@ int  Skein1024_Final (Skein1024_Ctxt_t *ctx, u8 * hashVal);
 **              to precompute the MAC IV, then a copy of the context saved and
 **              reused for each new MAC computation.
 **/
-int  Skein_256_InitExt(Skein_256_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein_512_InitExt(Skein_512_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein1024_InitExt(Skein1024_Ctxt_t *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
 
 /*
 **   Skein APIs for MAC and tree hash:
 **      Final_Pad:  pad, do final block, but no OUTPUT type
 **      Output:     do just the output stage
 */
-int  Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 * hashVal);
-int  Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 * hashVal);
-int  Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 * hashVal);
+int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 * hashVal);
+int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 * hashVal);
 
 #ifndef SKEIN_TREE_HASH
 #define SKEIN_TREE_HASH (1)
 #endif
 #if  SKEIN_TREE_HASH
-int  Skein_256_Output   (Skein_256_Ctxt_t *ctx, u8 * hashVal);
-int  Skein_512_Output   (Skein_512_Ctxt_t *ctx, u8 * hashVal);
-int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u8 * hashVal);
+int  Skein_256_Output   (struct skein_256_ctx *ctx, u8 * hashVal);
+int  Skein_512_Output   (struct skein_512_ctx *ctx, u8 * hashVal);
+int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 * hashVal);
 #endif
 
 /*****************************************************************
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index fb4a7c8e7f7a..548c639431de 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -47,7 +47,7 @@ OTHER DEALINGS IN THE SOFTWARE.
  * #include <skeinApi.h>
  * 
  * ...
- * SkeinCtx_t ctx;             // a Skein hash or MAC context
+ * struct skein_ctx ctx;             // a Skein hash or MAC context
  * 
  * // prepare context, here for a Skein with a state size of 512 bits.
  * skeinCtxPrepare(&ctx, Skein512);
@@ -84,11 +84,11 @@ OTHER DEALINGS IN THE SOFTWARE.
     /**
      * Which Skein size to use
      */
-    typedef enum SkeinSize {
+    enum skein_size {
         Skein256 = 256,     /*!< Skein with 256 bit state */
         Skein512 = 512,     /*!< Skein with 512 bit state */
         Skein1024 = 1024    /*!< Skein with 1024 bit state */
-    } SkeinSize_t;
+    };
 
     /**
      * Context for Skein.
@@ -98,16 +98,16 @@ OTHER DEALINGS IN THE SOFTWARE.
      * variables. If Skein implementation changes this, then adapt these
      * structures as well.
      */
-    typedef struct SkeinCtx {
+    struct skein_ctx {
         u64 skeinSize;
         u64  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
         union {
-            Skein_Ctxt_Hdr_t h;
-            Skein_256_Ctxt_t s256;
-            Skein_512_Ctxt_t s512;
-            Skein1024_Ctxt_t s1024;
+            struct skein_ctx_hdr h;
+            struct skein_256_ctx s256;
+            struct skein_512_ctx s512;
+            struct skein1024_ctx s1024;
         } m;
-    } SkeinCtx_t;
+    };
 
     /**
      * Prepare a Skein context.
@@ -123,7 +123,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size);
+    int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size);
 
     /**
      * Initialize a Skein context.
@@ -139,7 +139,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     SKEIN_SUCESS of SKEIN_FAIL
      * @see skeinReset
      */
-    int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen);
+    int skeinInit(struct skein_ctx* ctx, size_t hashBitLen);
 
     /**
      * Resets a Skein context for further use.
@@ -151,7 +151,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @param ctx
      *     Pointer to a pre-initialized Skein MAC context
      */
-    void skeinReset(SkeinCtx_t* ctx);
+    void skeinReset(struct skein_ctx* ctx);
     
     /**
      * Initializes a Skein context for MAC usage.
@@ -173,7 +173,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
+    int skeinMacInit(struct skein_ctx* ctx, const uint8_t *key, size_t keyLen,
                      size_t hashBitLen);
 
     /**
@@ -188,7 +188,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     Success or error code.
      */
-    int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
+    int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
                     size_t msgByteCnt);
 
     /**
@@ -204,7 +204,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @param msgBitCnt
      *     Length of the message in @b bits.
      */
-    int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
+    int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
                         size_t msgBitCnt);
 
     /**
@@ -222,7 +222,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     Success or error code.
      * @see skeinReset
      */
-    int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash);
+    int skeinFinal(struct skein_ctx* ctx, uint8_t* hash);
 
 /**
  * @}
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 0123a575b606..4c1cd81f30c4 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -18,7 +18,7 @@
  * 
 @code
     // Threefish cipher context data
-    ThreefishKey_t keyCtx;
+    struct threefish_key keyCtx;
 
     // Initialize the context
     threefishSetKey(&keyCtx, Threefish512, key, tweak);
@@ -36,11 +36,11 @@
     /**
      * Which Threefish size to use
      */
-    typedef enum ThreefishSize {
+    enum threefish_size {
         Threefish256 = 256,     /*!< Skein with 256 bit state */
         Threefish512 = 512,     /*!< Skein with 512 bit state */
         Threefish1024 = 1024    /*!< Skein with 1024 bit state */
-    } ThreefishSize_t;
+    };
     
     /**
      * Context for Threefish key and tweak words.
@@ -50,11 +50,11 @@
      * variables. If Skein implementation changes this, the adapt these
      * structures as well.
      */
-    typedef struct ThreefishKey {
+    struct threefish_key {
         u64 stateSize;
         u64 key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
         u64 tweak[3];
-    } ThreefishKey_t;
+    };
 
     /**
      * Set Threefish key and tweak data.
@@ -72,7 +72,7 @@
      * @param tweak
      *     Pointer to the two tweak words (word has 64 bits).
      */
-    void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize, uint64_t* keyData, uint64_t* tweak);
+    void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize, uint64_t* keyData, uint64_t* tweak);
     
     /**
      * Encrypt Threefisch block (bytes).
@@ -89,7 +89,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in, uint8_t* out);
+    void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in, uint8_t* out);
     
     /**
      * Encrypt Threefisch block (words).
@@ -108,7 +108,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in, uint64_t* out);
+    void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in, uint64_t* out);
 
     /**
      * Decrypt Threefisch block (bytes).
@@ -125,7 +125,7 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in, uint8_t* out);
+    void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in, uint8_t* out);
 
     /**
      * Decrypt Threefisch block (words).
@@ -144,14 +144,14 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in, uint64_t* out);
+    void threefishDecryptBlockWords(struct threefish_key* keyCtx, uint64_t* in, uint64_t* out);
 
-    void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index b225642efa4a..2bed7c163316 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -16,9 +16,9 @@
 
 /*****************************************************************/
 /* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
@@ -26,7 +26,7 @@ void    Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t bl
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a straight hashing operation  */
-int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
+int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 {
     union
     {
@@ -76,7 +76,7 @@ int Skein_256_Init(Skein_256_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein_256_InitExt(struct skein_256_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -126,7 +126,7 @@ int Skein_256_InitExt(Skein_256_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, cons
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -174,7 +174,7 @@ int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein_256_Final(Skein_256_Ctxt_t *ctx, u8 *hashVal)
+int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN_256_STATE_WORDS];
@@ -213,7 +213,7 @@ int Skein_256_Final(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a straight hashing operation  */
-int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
+int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 {
     union
     {
@@ -264,7 +264,7 @@ int Skein_512_Init(Skein_512_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein_512_InitExt(struct skein_512_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -314,7 +314,7 @@ int Skein_512_InitExt(Skein_512_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, cons
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -362,7 +362,7 @@ int Skein_512_Update(Skein_512_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein_512_Final(Skein_512_Ctxt_t *ctx, u8 *hashVal)
+int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN_512_STATE_WORDS];
@@ -401,7 +401,7 @@ int Skein_512_Final(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a straight hashing operation  */
-int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
+int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 {
     union
     {
@@ -449,7 +449,7 @@ int Skein1024_Init(Skein1024_Ctxt_t *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein1024_InitExt(struct skein1024_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -499,7 +499,7 @@ int Skein1024_InitExt(Skein1024_Ctxt_t *ctx,size_t hashBitLen,u64 treeInfo, cons
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
     size_t n;
 
@@ -547,7 +547,7 @@ int Skein1024_Update(Skein1024_Ctxt_t *ctx, const u8 *msg, size_t msgByteCnt)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
-int Skein1024_Final(Skein1024_Ctxt_t *ctx, u8 *hashVal)
+int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN1024_STATE_WORDS];
@@ -585,7 +585,7 @@ int Skein1024_Final(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 *hashVal)
+int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -601,7 +601,7 @@ int Skein_256_Final_Pad(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 *hashVal)
+int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -617,7 +617,7 @@ int Skein_512_Final_Pad(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 *hashVal)
+int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 {
     Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
 
@@ -634,7 +634,7 @@ int Skein1024_Final_Pad(Skein1024_Ctxt_t *ctx, u8 *hashVal)
 #if SKEIN_TREE_HASH
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein_256_Output(Skein_256_Ctxt_t *ctx, u8 *hashVal)
+int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN_256_STATE_WORDS];
@@ -663,7 +663,7 @@ int Skein_256_Output(Skein_256_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein_512_Output(Skein_512_Ctxt_t *ctx, u8 *hashVal)
+int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN_512_STATE_WORDS];
@@ -692,7 +692,7 @@ int Skein_512_Output(Skein_512_Ctxt_t *ctx, u8 *hashVal)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
-int Skein1024_Output(Skein1024_Ctxt_t *ctx, u8 *hashVal)
+int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 {
     size_t i,n,byteCnt;
     u64 X[SKEIN1024_STATE_WORDS];
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index ef021086bc61..ce5c5ae575e7 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -27,17 +27,17 @@ OTHER DEALINGS IN THE SOFTWARE.
 #include <linux/string.h>
 #include <skeinApi.h>
 
-int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size)
+int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size)
 {
     Skein_Assert(ctx && size, SKEIN_FAIL);
 
-    memset(ctx ,0, sizeof(SkeinCtx_t));
+    memset(ctx ,0, sizeof(struct skein_ctx));
     ctx->skeinSize = size;
 
     return SKEIN_SUCCESS;
 }
 
-int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen)
+int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
     size_t Xlen = 0;
@@ -78,7 +78,7 @@ int skeinInit(SkeinCtx_t* ctx, size_t hashBitLen)
     return ret;
 }
 
-int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
+int skeinMacInit(struct skein_ctx* ctx, const uint8_t *key, size_t keyLen,
                  size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
@@ -119,7 +119,7 @@ int skeinMacInit(SkeinCtx_t* ctx, const uint8_t *key, size_t keyLen,
     return ret;
 }
 
-void skeinReset(SkeinCtx_t* ctx)
+void skeinReset(struct skein_ctx* ctx)
 {
     size_t Xlen = 0;
     u64*  X = NULL;
@@ -138,7 +138,7 @@ void skeinReset(SkeinCtx_t* ctx)
     Skein_Start_New_Type(&ctx->m, MSG);
 }
 
-int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
+int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
                 size_t msgByteCnt)
 {
     int ret = SKEIN_FAIL;
@@ -159,7 +159,7 @@ int skeinUpdate(SkeinCtx_t *ctx, const uint8_t *msg,
 
 }
 
-int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
+int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
                     size_t msgBitCnt)
 {
     /*
@@ -199,7 +199,7 @@ int skeinUpdateBits(SkeinCtx_t *ctx, const uint8_t *msg,
     return SKEIN_SUCCESS;
 }
 
-int skeinFinal(SkeinCtx_t* ctx, uint8_t* hash)
+int skeinFinal(struct skein_ctx* ctx, uint8_t* hash)
 {
     int ret = SKEIN_FAIL;
     Skein_Assert(ctx, SKEIN_FAIL);
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 56c56b8ebd7e..02e68dbab0d4 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -5,10 +5,10 @@
 
 
 /*****************************  Skein_256 ******************************/
-void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u8 *blkPtr,
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
                              size_t blkCnt, size_t byteCntAdd)
 {
-    ThreefishKey_t key;
+    struct threefish_key key;
     u64 tweak[2];
     int i;
     u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
@@ -55,10 +55,10 @@ void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx, const u8 *blkPtr,
     ctx->h.T[1] = tweak[1];
 }
 
-void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u8 *blkPtr,
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
                              size_t blkCnt, size_t byteCntAdd)
 {
-    ThreefishKey_t key;
+    struct threefish_key key;
     u64 tweak[2];
     int i;
     u64 words[3];
@@ -109,10 +109,10 @@ void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx, const u8 *blkPtr,
     ctx->h.T[1] = tweak[1];
 }
 
-void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx, const u8 *blkPtr,
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
                               size_t blkCnt, size_t byteCntAdd)
 {
-    ThreefishKey_t key;
+    struct threefish_key key;
     u64 tweak[2];
     int i;
     u64 words[3];
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 98e884292044..179bde121380 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -39,7 +39,7 @@
 
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(Skein_256_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C */
     enum
         {
@@ -224,7 +224,7 @@ unsigned int Skein_256_Unroll_Cnt(void)
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(Skein_512_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C */
     enum
         {
@@ -432,7 +432,7 @@ unsigned int Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
     { /* do it in C, always looping (unrolled is bigger AND slower!) */
     enum
         {
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 58a8c26a1f6f..738ec523406b 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
         {
 
     uint64_t b0 = input[0], b1 = input[1],
@@ -684,7 +684,7 @@ void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* out
             output[15] = b15 + k1 + 20;
         }
 
-void threefishDecrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
 {
 
     uint64_t b0 = input[0], b1 = input[1],
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index a7e06f905186..b81cb3a65b04 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
   {
 
     uint64_t b0 = input[0], b1 = input[1],
@@ -172,7 +172,7 @@ void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* outp
     output[3] = b3 + k1 + 18;
   }
 
-void threefishDecrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
   {
     uint64_t b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3];
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 3cbfcd9af5c9..7eed6aeb3742 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
     {
 
     uint64_t b0 = input[0], b1 = input[1],
@@ -316,7 +316,7 @@ void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* outp
         output[7] = b7 + k7 + 18;
     }
 
-void threefishDecrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
     {
 
     uint64_t b0 = input[0], b1 = input[1],
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index ed19ee9e3425..5cd3eb9bd9f2 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -3,7 +3,7 @@
 #include <linux/string.h>
 #include <threefishApi.h>
 
-void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
+void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize,
                      uint64_t* keyData, uint64_t* tweak)
 {
     int keyWords = stateSize / 64;
@@ -22,7 +22,7 @@ void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
     keyCtx->stateSize = stateSize;
 }
 
-void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
+void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
                                 uint8_t* out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
@@ -33,7 +33,7 @@ void threefishEncryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
     Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
 }
 
-void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
+void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
                                 uint64_t* out)
 {
     switch (keyCtx->stateSize) {
@@ -49,7 +49,7 @@ void threefishEncryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
     }
 }
 
-void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
+void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
                                 uint8_t* out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
@@ -60,7 +60,7 @@ void threefishDecryptBlockBytes(ThreefishKey_t* keyCtx, uint8_t* in,
     Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
 }
 
-void threefishDecryptBlockWords(ThreefishKey_t* keyCtx, uint64_t* in,
+void threefishDecryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
                                 uint64_t* out)
 {
     switch (keyCtx->stateSize) {
-- 
1.9.1


* [PATCH V2 08/21] staging: crypto: skein: use u8, u64 vice uint*_t
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (6 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 07/21] staging: crypto: skein: remove all typedef {struct,enum} Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 09/21] staging: crypto: skein: fixup pointer whitespace Jason Cooper
                     ` (12 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skeinApi.h     |  8 ++++----
 drivers/staging/skein/include/threefishApi.h | 22 +++++++++++-----------
 drivers/staging/skein/skeinApi.c             | 22 +++++++++++-----------
 drivers/staging/skein/threefish1024Block.c   | 18 +++++++++---------
 drivers/staging/skein/threefish256Block.c    | 18 +++++++++---------
 drivers/staging/skein/threefish512Block.c    | 18 +++++++++---------
 drivers/staging/skein/threefishApi.c         | 20 ++++++++++----------
 7 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 548c639431de..4ad294f7945d 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -173,7 +173,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinMacInit(struct skein_ctx* ctx, const uint8_t *key, size_t keyLen,
+    int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
                      size_t hashBitLen);
 
     /**
@@ -188,7 +188,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     Success or error code.
      */
-    int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
+    int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
                     size_t msgByteCnt);
 
     /**
@@ -204,7 +204,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @param msgBitCnt
      *     Length of the message in @b bits.
      */
-    int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
+    int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
                         size_t msgBitCnt);
 
     /**
@@ -222,7 +222,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     Success or error code.
      * @see skeinReset
      */
-    int skeinFinal(struct skein_ctx* ctx, uint8_t* hash);
+    int skeinFinal(struct skein_ctx* ctx, u8* hash);
 
 /**
  * @}
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 4c1cd81f30c4..194e313b6b62 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -72,7 +72,7 @@
      * @param tweak
      *     Pointer to the two tweak words (word has 64 bits).
      */
-    void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize, uint64_t* keyData, uint64_t* tweak);
+    void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize, u64* keyData, u64* tweak);
     
     /**
      * Encrypt Threefisch block (bytes).
@@ -89,7 +89,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in, uint8_t* out);
+    void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in, u8* out);
     
     /**
      * Encrypt Threefisch block (words).
@@ -108,7 +108,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in, uint64_t* out);
+    void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in, u64* out);
 
     /**
      * Decrypt Threefisch block (bytes).
@@ -125,7 +125,7 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in, uint8_t* out);
+    void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in, u8* out);
 
     /**
      * Decrypt Threefisch block (words).
@@ -144,14 +144,14 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockWords(struct threefish_key* keyCtx, uint64_t* in, uint64_t* out);
+    void threefishDecryptBlockWords(struct threefish_key* keyCtx, u64* in, u64* out);
 
-    void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
-    void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output);
+    void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishDecrypt256(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishDecrypt512(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishDecrypt1024(struct threefish_key* keyCtx, u64* input, u64* output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index ce5c5ae575e7..6bd2da0eaa5f 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -42,7 +42,7 @@ int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
     int ret = SKEIN_FAIL;
     size_t Xlen = 0;
     u64*  X = NULL;
-    uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+    u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
     Skein_Assert(ctx, SKEIN_FAIL);
     /*
@@ -78,13 +78,13 @@ int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
     return ret;
 }
 
-int skeinMacInit(struct skein_ctx* ctx, const uint8_t *key, size_t keyLen,
+int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
                  size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
     u64*  X = NULL;
     size_t Xlen = 0;
-    uint64_t treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+    u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
     Skein_Assert(ctx, SKEIN_FAIL);
 
@@ -138,7 +138,7 @@ void skeinReset(struct skein_ctx* ctx)
     Skein_Start_New_Type(&ctx->m, MSG);
 }
 
-int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
+int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
                 size_t msgByteCnt)
 {
     int ret = SKEIN_FAIL;
@@ -159,7 +159,7 @@ int skeinUpdate(struct skein_ctx *ctx, const uint8_t *msg,
 
 }
 
-int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
+int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
                     size_t msgBitCnt)
 {
     /*
@@ -168,8 +168,8 @@ int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
      * arithmetic.
      */
     size_t length;
-    uint8_t mask;
-    uint8_t* up;
+    u8 mask;
+    u8* up;
 
     /* only the final Update() call is allowed do partial bytes, else assert an error */
     Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
@@ -186,20 +186,20 @@ int skeinUpdateBits(struct skein_ctx *ctx, const uint8_t *msg,
      * Skein's real partial block buffer.
      * If this layout ever changes we have to adapt this as well.
      */
-    up = (uint8_t*)ctx->m.s256.X + ctx->skeinSize / 8;
+    up = (u8*)ctx->m.s256.X + ctx->skeinSize / 8;
 
     Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
 
     /* now "pad" the final partial byte the way NIST likes */
     length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
     Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
-    mask = (uint8_t) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
-    up[length-1]  = (uint8_t)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
+    mask = (u8) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
+    up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
 
     return SKEIN_SUCCESS;
 }
 
-int skeinFinal(struct skein_ctx* ctx, uint8_t* hash)
+int skeinFinal(struct skein_ctx* ctx, u8* hash)
 {
     int ret = SKEIN_FAIL;
     Skein_Assert(ctx, SKEIN_FAIL);
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 738ec523406b..9e821fcdb067 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -2,10 +2,10 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
         {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3],
       b4 = input[4], b5 = input[5],
       b6 = input[6], b7 = input[7],
@@ -13,7 +13,7 @@ void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
       b10 = input[10], b11 = input[11],
       b12 = input[12], b13 = input[13],
       b14 = input[14], b15 = input[15];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4], k5 = keyCtx->key[5],
       k6 = keyCtx->key[6], k7 = keyCtx->key[7],
@@ -22,7 +22,7 @@ void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
       k12 = keyCtx->key[12], k13 = keyCtx->key[13],
       k14 = keyCtx->key[14], k15 = keyCtx->key[15],
       k16 = keyCtx->key[16];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
             b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
@@ -684,10 +684,10 @@ void threefishEncrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
             output[15] = b15 + k1 + 20;
         }
 
-void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
 {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3],
       b4 = input[4], b5 = input[5],
       b6 = input[6], b7 = input[7],
@@ -695,7 +695,7 @@ void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
       b10 = input[10], b11 = input[11],
       b12 = input[12], b13 = input[13],
       b14 = input[14], b15 = input[15];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4], k5 = keyCtx->key[5],
       k6 = keyCtx->key[6], k7 = keyCtx->key[7],
@@ -704,9 +704,9 @@ void threefishDecrypt1024(struct threefish_key* keyCtx, uint64_t* input, uint64_
       k12 = keyCtx->key[12], k13 = keyCtx->key[13],
       k14 = keyCtx->key[14], k15 = keyCtx->key[15],
       k16 = keyCtx->key[16];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
-    uint64_t tmp;
+    u64 tmp;
 
             b0 -= k3;
             b1 -= k4;
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index b81cb3a65b04..68ac4c50f01e 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -2,15 +2,15 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
   {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
     b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
@@ -172,17 +172,17 @@ void threefishEncrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t
     output[3] = b3 + k1 + 18;
   }
 
-void threefishDecrypt256(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
   {
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
-    uint64_t tmp;
+    u64 tmp;
 
     b0 -= k3;
     b1 -= k4 + t0;
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 7eed6aeb3742..e94bb93722df 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -2,19 +2,19 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
     {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3],
       b4 = input[4], b5 = input[5],
       b6 = input[6], b7 = input[7];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4], k5 = keyCtx->key[5],
       k6 = keyCtx->key[6], k7 = keyCtx->key[7],
       k8 = keyCtx->key[8];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
         b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
@@ -316,22 +316,22 @@ void threefishEncrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t
         output[7] = b7 + k7 + 18;
     }
 
-void threefishDecrypt512(struct threefish_key* keyCtx, uint64_t* input, uint64_t* output)
+void threefishDecrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
     {
 
-    uint64_t b0 = input[0], b1 = input[1],
+    u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3],
       b4 = input[4], b5 = input[5],
       b6 = input[6], b7 = input[7];
-    uint64_t k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
       k2 = keyCtx->key[2], k3 = keyCtx->key[3],
       k4 = keyCtx->key[4], k5 = keyCtx->key[5],
       k6 = keyCtx->key[6], k7 = keyCtx->key[7],
       k8 = keyCtx->key[8];
-    uint64_t t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
       t2 = keyCtx->tweak[2];
 
-      uint64_t tmp;
+      u64 tmp;
 
         b0 -= k0;
         b1 -= k1;
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 5cd3eb9bd9f2..37f96215159d 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -4,11 +4,11 @@
 #include <threefishApi.h>
 
 void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize,
-                     uint64_t* keyData, uint64_t* tweak)
+                     u64* keyData, u64* tweak)
 {
     int keyWords = stateSize / 64;
     int i;
-    uint64_t parity = KeyScheduleConst;
+    u64 parity = KeyScheduleConst;
 
     keyCtx->tweak[0] = tweak[0];
     keyCtx->tweak[1] = tweak[1];
@@ -22,8 +22,8 @@ void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize
     keyCtx->stateSize = stateSize;
 }
 
-void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
-                                uint8_t* out)
+void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in,
+                                u8* out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
     u64 cipher[SKEIN_MAX_STATE_WORDS];
@@ -33,8 +33,8 @@ void threefishEncryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
     Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
 }
 
-void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
-                                uint64_t* out)
+void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in,
+                                u64* out)
 {
     switch (keyCtx->stateSize) {
         case Threefish256:
@@ -49,8 +49,8 @@ void threefishEncryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
     }
 }
 
-void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
-                                uint8_t* out)
+void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in,
+                                u8* out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
     u64 cipher[SKEIN_MAX_STATE_WORDS];
@@ -60,8 +60,8 @@ void threefishDecryptBlockBytes(struct threefish_key* keyCtx, uint8_t* in,
     Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
 }
 
-void threefishDecryptBlockWords(struct threefish_key* keyCtx, uint64_t* in,
-                                uint64_t* out)
+void threefishDecryptBlockWords(struct threefish_key* keyCtx, u64* in,
+                                u64* out)
 {
     switch (keyCtx->stateSize) {
         case Threefish256:
-- 
1.9.1


* [PATCH V2 09/21] staging: crypto: skein: fixup pointer whitespace
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (7 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 08/21] staging: crypto: skein: use u8, u64 vice uint*_t Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 10/21] staging: crypto: skein: cleanup whitespace around operators/punc Jason Cooper
                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h        | 18 +++++++++---------
 drivers/staging/skein/include/skeinApi.h     | 10 +++++-----
 drivers/staging/skein/include/threefishApi.h | 22 +++++++++++-----------
 drivers/staging/skein/skeinApi.c             | 18 +++++++++---------
 drivers/staging/skein/threefish1024Block.c   |  4 ++--
 drivers/staging/skein/threefish256Block.c    |  4 ++--
 drivers/staging/skein/threefish512Block.c    |  4 ++--
 drivers/staging/skein/threefishApi.c         | 20 ++++++++++----------
 8 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 77b712e73253..b7cd6c0cef2f 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -100,9 +100,9 @@ int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCn
 int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 
-int  Skein_256_Final (struct skein_256_ctx *ctx, u8 * hashVal);
-int  Skein_512_Final (struct skein_512_ctx *ctx, u8 * hashVal);
-int  Skein1024_Final (struct skein1024_ctx *ctx, u8 * hashVal);
+int  Skein_256_Final (struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final (struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final (struct skein1024_ctx *ctx, u8 *hashVal);
 
 /*
 **   Skein APIs for "extended" initialization: MAC keys, tree hashing.
@@ -127,17 +127,17 @@ int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInf
 **      Final_Pad:  pad, do final block, but no OUTPUT type
 **      Output:     do just the output stage
 */
-int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 * hashVal);
-int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 * hashVal);
-int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 * hashVal);
+int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal);
 
 #ifndef SKEIN_TREE_HASH
 #define SKEIN_TREE_HASH (1)
 #endif
 #if  SKEIN_TREE_HASH
-int  Skein_256_Output   (struct skein_256_ctx *ctx, u8 * hashVal);
-int  Skein_512_Output   (struct skein_512_ctx *ctx, u8 * hashVal);
-int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 * hashVal);
+int  Skein_256_Output   (struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Output   (struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
 #endif
 
 /*****************************************************************
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 4ad294f7945d..2c52797918cf 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -123,7 +123,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size);
+    int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size);
 
     /**
      * Initialize a Skein context.
@@ -139,7 +139,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     SKEIN_SUCESS of SKEIN_FAIL
      * @see skeinReset
      */
-    int skeinInit(struct skein_ctx* ctx, size_t hashBitLen);
+    int skeinInit(struct skein_ctx *ctx, size_t hashBitLen);
 
     /**
      * Resets a Skein context for further use.
@@ -151,7 +151,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @param ctx
      *     Pointer to a pre-initialized Skein MAC context
      */
-    void skeinReset(struct skein_ctx* ctx);
+    void skeinReset(struct skein_ctx *ctx);
     
     /**
      * Initializes a Skein context for MAC usage.
@@ -173,7 +173,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      * @return
      *     SKEIN_SUCESS of SKEIN_FAIL
      */
-    int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
+    int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
                      size_t hashBitLen);
 
     /**
@@ -222,7 +222,7 @@ OTHER DEALINGS IN THE SOFTWARE.
      *     Success or error code.
      * @see skeinReset
      */
-    int skeinFinal(struct skein_ctx* ctx, u8* hash);
+    int skeinFinal(struct skein_ctx *ctx, u8 *hash);
 
 /**
  * @}
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 194e313b6b62..1f9e6e14f50b 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -72,7 +72,7 @@
      * @param tweak
      *     Pointer to the two tweak words (word has 64 bits).
      */
-    void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize, u64* keyData, u64* tweak);
+    void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize, u64 *keyData, u64 *tweak);
     
     /**
      * Encrypt Threefisch block (bytes).
@@ -89,7 +89,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in, u8* out);
+    void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
     
     /**
      * Encrypt Threefisch block (words).
@@ -108,7 +108,7 @@
      * @param out
      *     Pointer to cipher buffer.
      */
-    void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in, u64* out);
+    void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
 
     /**
      * Decrypt Threefisch block (bytes).
@@ -125,7 +125,7 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in, u8* out);
+    void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
     /**
      * Decrypt Threefisch block (words).
@@ -144,14 +144,14 @@
      * @param out
      *     Pointer to plaintext buffer.
      */
-    void threefishDecryptBlockWords(struct threefish_key* keyCtx, u64* in, u64* out);
+    void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
 
-    void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishDecrypt256(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishDecrypt512(struct threefish_key* keyCtx, u64* input, u64* output);
-    void threefishDecrypt1024(struct threefish_key* keyCtx, u64* input, u64* output);
+    void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
+    void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 6bd2da0eaa5f..df92806c4ec4 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -27,7 +27,7 @@ OTHER DEALINGS IN THE SOFTWARE.
 #include <linux/string.h>
 #include <skeinApi.h>
 
-int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size)
+int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size)
 {
     Skein_Assert(ctx && size, SKEIN_FAIL);
 
@@ -37,11 +37,11 @@ int skeinCtxPrepare(struct skein_ctx* ctx, enum skein_size size)
     return SKEIN_SUCCESS;
 }
 
-int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
+int skeinInit(struct skein_ctx *ctx, size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
     size_t Xlen = 0;
-    u64*  X = NULL;
+    u64 *X = NULL;
     u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
     Skein_Assert(ctx, SKEIN_FAIL);
@@ -78,11 +78,11 @@ int skeinInit(struct skein_ctx* ctx, size_t hashBitLen)
     return ret;
 }
 
-int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
+int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
                  size_t hashBitLen)
 {
     int ret = SKEIN_FAIL;
-    u64*  X = NULL;
+    u64 *X = NULL;
     size_t Xlen = 0;
     u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
@@ -119,10 +119,10 @@ int skeinMacInit(struct skein_ctx* ctx, const u8 *key, size_t keyLen,
     return ret;
 }
 
-void skeinReset(struct skein_ctx* ctx)
+void skeinReset(struct skein_ctx *ctx)
 {
     size_t Xlen = 0;
-    u64*  X = NULL;
+    u64 *X = NULL;
 
     /*
      * The following two lines rely of the fact that the real Skein contexts are
@@ -169,7 +169,7 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
      */
     size_t length;
     u8 mask;
-    u8* up;
+    u8 *up;
 
     /* only the final Update() call is allowed do partial bytes, else assert an error */
     Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
@@ -199,7 +199,7 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
     return SKEIN_SUCCESS;
 }
 
-int skeinFinal(struct skein_ctx* ctx, u8* hash)
+int skeinFinal(struct skein_ctx *ctx, u8 *hash)
 {
     int ret = SKEIN_FAIL;
     Skein_Assert(ctx, SKEIN_FAIL);
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 9e821fcdb067..e3be37ea8024 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
         {
 
     u64 b0 = input[0], b1 = input[1],
@@ -684,7 +684,7 @@ void threefishEncrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
             output[15] = b15 + k1 + 20;
         }
 
-void threefishDecrypt1024(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
 {
 
     u64 b0 = input[0], b1 = input[1],
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index 68ac4c50f01e..09ea5099bc76 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
   {
 
     u64 b0 = input[0], b1 = input[1],
@@ -172,7 +172,7 @@ void threefishEncrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
     output[3] = b3 + k1 + 18;
   }
 
-void threefishDecrypt256(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
   {
     u64 b0 = input[0], b1 = input[1],
       b2 = input[2], b3 = input[3];
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index e94bb93722df..5262f5a8f21b 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -2,7 +2,7 @@
 #include <threefishApi.h>
 
 
-void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
     {
 
     u64 b0 = input[0], b1 = input[1],
@@ -316,7 +316,7 @@ void threefishEncrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
         output[7] = b7 + k7 + 18;
     }
 
-void threefishDecrypt512(struct threefish_key* keyCtx, u64* input, u64* output)
+void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
     {
 
     u64 b0 = input[0], b1 = input[1],
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 37f96215159d..53f46f6cb9ca 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -3,8 +3,8 @@
 #include <linux/string.h>
 #include <threefishApi.h>
 
-void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize,
-                     u64* keyData, u64* tweak)
+void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize,
+                     u64 *keyData, u64 *tweak)
 {
     int keyWords = stateSize / 64;
     int i;
@@ -22,8 +22,8 @@ void threefishSetKey(struct threefish_key* keyCtx, enum threefish_size stateSize
     keyCtx->stateSize = stateSize;
 }
 
-void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in,
-                                u8* out)
+void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
+                                u8 *out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
     u64 cipher[SKEIN_MAX_STATE_WORDS];
@@ -33,8 +33,8 @@ void threefishEncryptBlockBytes(struct threefish_key* keyCtx, u8* in,
     Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
 }
 
-void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in,
-                                u64* out)
+void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in,
+                                u64 *out)
 {
     switch (keyCtx->stateSize) {
         case Threefish256:
@@ -49,8 +49,8 @@ void threefishEncryptBlockWords(struct threefish_key* keyCtx, u64* in,
     }
 }
 
-void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in,
-                                u8* out)
+void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
+                                u8 *out)
 {
     u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
     u64 cipher[SKEIN_MAX_STATE_WORDS];
@@ -60,8 +60,8 @@ void threefishDecryptBlockBytes(struct threefish_key* keyCtx, u8* in,
     Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
 }
 
-void threefishDecryptBlockWords(struct threefish_key* keyCtx, u64* in,
-                                u64* out)
+void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in,
+                                u64 *out)
 {
     switch (keyCtx->stateSize) {
         case Threefish256:
-- 
1.9.1


* [PATCH V2 10/21] staging: crypto: skein: cleanup whitespace around operators/punc.
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (8 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 09/21] staging: crypto: skein: fixup pointer whitespace Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 11/21] staging: crypto: skein: dos2unix, remove executable perms Jason Cooper
                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h    | 168 +++++-----
 drivers/staging/skein/include/skein_iv.h | 224 +++++++-------
 drivers/staging/skein/skein.c            | 352 ++++++++++-----------
 drivers/staging/skein/skeinApi.c         |  22 +-
 drivers/staging/skein/skeinBlockNo3F.c   |  20 +-
 drivers/staging/skein/skein_block.c      | 513 +++++++++++++++----------------
 6 files changed, 648 insertions(+), 651 deletions(-)
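
Sanity-check sketch (not part of the patch): the changes below only touch
whitespace, so the macros should expand to the same values as before. One way
to check that in userspace is a small test like the following. It assumes
uint64_t can stand in for the kernel's u64. rotl64_demo() and ks_parity_demo()
are hypothetical helpers made up for this illustration; the three macros are
copied from the '+' side of the skein.h hunks.

```c
#include <assert.h>   /* for the assert() checks suggested in the note below */
#include <stdint.h>

typedef uint64_t u64; /* assumption: userspace stand-in for the kernel's u64 */

/* Copied from the '+' lines of the skein.h hunks in this patch. */
#define RotL_64(x, N)    (((x) << (N)) | ((x) >> (64-(N))))
#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)

/* Hypothetical helper: rotate x left by n bits, valid for n in 1..63. */
u64 rotl64_demo(u64 x, unsigned int n)
{
	return RotL_64(x, n);
}

/* Hypothetical helper: the key schedule parity constant as one 64-bit word. */
u64 ks_parity_demo(void)
{
	return SKEIN_KS_PARITY;
}
```

For example, assert(rotl64_demo(1, 1) == 2) and
assert(ks_parity_demo() == 0x1BD11BDAA9FC1A22ULL) should both hold. Note that
RotL_64 is only well defined for N in 1..63, since N == 0 would shift right by
64 bits.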

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index b7cd6c0cef2f..fef29ad64c93 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -29,12 +29,12 @@
 ***************************************************************************/
 
 #ifndef RotL_64
-#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
+#define RotL_64(x, N)    (((x) << (N)) | ((x) >> (64-(N))))
 #endif
 
 /* below two prototype assume we are handed aligned data */
-#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
-#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
+#define Skein_Put64_LSB_First(dst08, src64, bCnt) memcpy(dst08, src64, bCnt)
+#define Skein_Get64_LSB_First(dst64, src08, wCnt) memcpy(dst64, src08, 8*(wCnt))
 #define Skein_Swap64(w64)  (w64)
 
 enum
@@ -44,24 +44,24 @@ enum
     SKEIN_BAD_HASHLEN     =      2
     };
 
-#define  SKEIN_MODIFIER_WORDS  ( 2)          /* number of modifier (tweak) words */
+#define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
 
-#define  SKEIN_256_STATE_WORDS ( 4)
-#define  SKEIN_512_STATE_WORDS ( 8)
+#define  SKEIN_256_STATE_WORDS  (4)
+#define  SKEIN_512_STATE_WORDS  (8)
 #define  SKEIN1024_STATE_WORDS (16)
 #define  SKEIN_MAX_STATE_WORDS (16)
 
-#define  SKEIN_256_STATE_BYTES ( 8*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_STATE_BYTES ( 8*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_STATE_BYTES ( 8*SKEIN1024_STATE_WORDS)
+#define  SKEIN_256_STATE_BYTES  (8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BYTES  (8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BYTES  (8*SKEIN1024_STATE_WORDS)
 
 #define  SKEIN_256_STATE_BITS  (64*SKEIN_256_STATE_WORDS)
 #define  SKEIN_512_STATE_BITS  (64*SKEIN_512_STATE_WORDS)
 #define  SKEIN1024_STATE_BITS  (64*SKEIN1024_STATE_WORDS)
 
-#define  SKEIN_256_BLOCK_BYTES ( 8*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_BLOCK_BYTES ( 8*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_BLOCK_BYTES ( 8*SKEIN1024_STATE_WORDS)
+#define  SKEIN_256_BLOCK_BYTES  (8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_BLOCK_BYTES  (8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
 
 struct skein_ctx_hdr
     {
@@ -92,17 +92,17 @@ struct skein1024_ctx                              /* 1024-bit Skein hash context
     };
 
 /*   Skein APIs for (incremental) "straight hashing" */
-int  Skein_256_Init  (struct skein_256_ctx *ctx, size_t hashBitLen);
-int  Skein_512_Init  (struct skein_512_ctx *ctx, size_t hashBitLen);
-int  Skein1024_Init  (struct skein1024_ctx *ctx, size_t hashBitLen);
+int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
+int  Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen);
+int  Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen);
 
 int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
 
-int  Skein_256_Final (struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Final (struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Final (struct skein1024_ctx *ctx, u8 *hashVal);
+int  Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
 
 /*
 **   Skein APIs for "extended" initialization: MAC keys, tree hashing.
@@ -135,9 +135,9 @@ int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_TREE_HASH (1)
 #endif
 #if  SKEIN_TREE_HASH
-int  Skein_256_Output   (struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Output   (struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
+int  Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #endif
 
 /*****************************************************************
@@ -158,18 +158,18 @@ int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
                                 
 /* tweak word T[1]: flag bit definition(s) */
-#define SKEIN_T1_FLAG_FIRST     (((u64)  1 ) << SKEIN_T1_POS_FIRST)
-#define SKEIN_T1_FLAG_FINAL     (((u64)  1 ) << SKEIN_T1_POS_FINAL)
-#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1 ) << SKEIN_T1_POS_BIT_PAD)
+#define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
+#define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
+#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
                                 
 /* tweak word T[1]: tree level bit field mask */
 #define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
 #define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
 
 /* tweak word T[1]: block type field */
-#define SKEIN_BLK_TYPE_KEY      ( 0)                    /* key, for MAC and KDF */
-#define SKEIN_BLK_TYPE_CFG      ( 4)                    /* configuration block */
-#define SKEIN_BLK_TYPE_PERS     ( 8)                    /* personalization string */
+#define SKEIN_BLK_TYPE_KEY       (0)                    /* key, for MAC and KDF */
+#define SKEIN_BLK_TYPE_CFG       (4)                    /* configuration block */
+#define SKEIN_BLK_TYPE_PERS      (8)                    /* personalization string */
 #define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
 #define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
 #define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
@@ -197,73 +197,73 @@ int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
 #endif
 
-#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64) (hi32)) << 32))
-#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION,SKEIN_ID_STRING_LE)
-#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)
+#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
+#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION, SKEIN_ID_STRING_LE)
+#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)
 
 #define SKEIN_CFG_STR_LEN       (4*8)
 
 /* bit field definitions in config block treeInfo word */
-#define SKEIN_CFG_TREE_LEAF_SIZE_POS  ( 0)
-#define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
+#define SKEIN_CFG_TREE_LEAF_SIZE_POS  (0)
+#define SKEIN_CFG_TREE_NODE_SIZE_POS  (8)
 #define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
 
 #define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
 #define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
 #define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
 
-#define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
-    ( (((u64)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
-      (((u64)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
-      (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )
+#define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
+    ((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+     (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+     (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
 
-#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0,0,0) /* use as treeInfo in InitExt() call for sequential processing */
+#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
 
 /*
 **   Skein macros for getting/setting tweak words, etc.
 **   These are useful for partial input bytes, hash tree init/update, etc.
 **/
-#define Skein_Get_Tweak(ctxPtr,TWK_NUM)         ((ctxPtr)->h.T[TWK_NUM])
-#define Skein_Set_Tweak(ctxPtr,TWK_NUM,tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal);}
+#define Skein_Get_Tweak(ctxPtr, TWK_NUM)          ((ctxPtr)->h.T[TWK_NUM])
+#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal); }
 
-#define Skein_Get_T0(ctxPtr)    Skein_Get_Tweak(ctxPtr,0)
-#define Skein_Get_T1(ctxPtr)    Skein_Get_Tweak(ctxPtr,1)
-#define Skein_Set_T0(ctxPtr,T0) Skein_Set_Tweak(ctxPtr,0,T0)
-#define Skein_Set_T1(ctxPtr,T1) Skein_Set_Tweak(ctxPtr,1,T1)
+#define Skein_Get_T0(ctxPtr)     Skein_Get_Tweak(ctxPtr, 0)
+#define Skein_Get_T1(ctxPtr)     Skein_Get_Tweak(ctxPtr, 1)
+#define Skein_Set_T0(ctxPtr, T0) Skein_Set_Tweak(ctxPtr, 0, T0)
+#define Skein_Set_T1(ctxPtr, T1) Skein_Set_Tweak(ctxPtr, 1, T1)
 
 /* set both tweak words at once */
-#define Skein_Set_T0_T1(ctxPtr,T0,T1)           \
+#define Skein_Set_T0_T1(ctxPtr, T0, T1)           \
     {                                           \
-    Skein_Set_T0(ctxPtr,(T0));                  \
-    Skein_Set_T1(ctxPtr,(T1));                  \
+    Skein_Set_T0(ctxPtr, (T0));                  \
+    Skein_Set_T1(ctxPtr, (T1));                  \
     }
 
-#define Skein_Set_Type(ctxPtr,BLK_TYPE)         \
-    Skein_Set_T1(ctxPtr,SKEIN_T1_BLK_TYPE_##BLK_TYPE)
+#define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
+    Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
 
 /* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
-#define Skein_Start_New_Type(ctxPtr,BLK_TYPE)   \
-    { Skein_Set_T0_T1(ctxPtr,0,SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt=0; }
+#define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
+    { Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
 
 #define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
 #define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
 
-#define Skein_Set_Tree_Level(hdr,height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height);}
+#define Skein_Set_Tree_Level(hdr, height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); }
 
 /*****************************************************************
 ** "Internal" Skein definitions for debugging and error checking
 ******************************************************************/
-#ifdef  SKEIN_DEBUG             /* examine/display intermediate values? */
+#ifdef SKEIN_DEBUG             /* examine/display intermediate values? */
 #include "skein_debug.h"
 #else                           /* default is no callouts */
-#define Skein_Show_Block(bits,ctx,X,blkPtr,wPtr,ksEvenPtr,ksOddPtr)
-#define Skein_Show_Round(bits,ctx,r,X)
-#define Skein_Show_R_Ptr(bits,ctx,r,X_ptr)
-#define Skein_Show_Final(bits,ctx,cnt,outPtr)
-#define Skein_Show_Key(bits,ctx,key,keyBytes)
+#define Skein_Show_Block(bits, ctx, X, blkPtr, wPtr, ksEvenPtr, ksOddPtr)
+#define Skein_Show_Round(bits, ctx, r, X)
+#define Skein_Show_R_Ptr(bits, ctx, r, X_ptr)
+#define Skein_Show_Final(bits, ctx, cnt, outPtr)
+#define Skein_Show_Key(bits, ctx, key, keyBytes)
 #endif
 
-#define Skein_Assert(x,retCode)/* default: ignore all Asserts, for performance */
+#define Skein_Assert(x, retCode)/* default: ignore all Asserts, for performance */
 #define Skein_assert(x)
 
 /*****************************************************************
@@ -272,34 +272,34 @@ int  Skein1024_Output   (struct skein1024_ctx *ctx, u8 *hashVal);
 enum    
     {   
         /* Skein_256 round rotation constants */
-    R_256_0_0=14, R_256_0_1=16,
-    R_256_1_0=52, R_256_1_1=57,
-    R_256_2_0=23, R_256_2_1=40,
-    R_256_3_0= 5, R_256_3_1=37,
-    R_256_4_0=25, R_256_4_1=33,
-    R_256_5_0=46, R_256_5_1=12,
-    R_256_6_0=58, R_256_6_1=22,
-    R_256_7_0=32, R_256_7_1=32,
+    R_256_0_0 = 14, R_256_0_1 = 16,
+    R_256_1_0 = 52, R_256_1_1 = 57,
+    R_256_2_0 = 23, R_256_2_1 = 40,
+    R_256_3_0 =  5, R_256_3_1 = 37,
+    R_256_4_0 = 25, R_256_4_1 = 33,
+    R_256_5_0 = 46, R_256_5_1 = 12,
+    R_256_6_0 = 58, R_256_6_1 = 22,
+    R_256_7_0 = 32, R_256_7_1 = 32,
 
         /* Skein_512 round rotation constants */
-    R_512_0_0=46, R_512_0_1=36, R_512_0_2=19, R_512_0_3=37,
-    R_512_1_0=33, R_512_1_1=27, R_512_1_2=14, R_512_1_3=42,
-    R_512_2_0=17, R_512_2_1=49, R_512_2_2=36, R_512_2_3=39,
-    R_512_3_0=44, R_512_3_1= 9, R_512_3_2=54, R_512_3_3=56,
-    R_512_4_0=39, R_512_4_1=30, R_512_4_2=34, R_512_4_3=24,
-    R_512_5_0=13, R_512_5_1=50, R_512_5_2=10, R_512_5_3=17,
-    R_512_6_0=25, R_512_6_1=29, R_512_6_2=39, R_512_6_3=43,
-    R_512_7_0= 8, R_512_7_1=35, R_512_7_2=56, R_512_7_3=22,
+    R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
+    R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
+    R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
+    R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
+    R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
+    R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
+    R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
+    R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
 
         /* Skein1024 round rotation constants */
-    R1024_0_0=24, R1024_0_1=13, R1024_0_2= 8, R1024_0_3=47, R1024_0_4= 8, R1024_0_5=17, R1024_0_6=22, R1024_0_7=37,
-    R1024_1_0=38, R1024_1_1=19, R1024_1_2=10, R1024_1_3=55, R1024_1_4=49, R1024_1_5=18, R1024_1_6=23, R1024_1_7=52,
-    R1024_2_0=33, R1024_2_1= 4, R1024_2_2=51, R1024_2_3=13, R1024_2_4=34, R1024_2_5=41, R1024_2_6=59, R1024_2_7=17,
-    R1024_3_0= 5, R1024_3_1=20, R1024_3_2=48, R1024_3_3=41, R1024_3_4=47, R1024_3_5=28, R1024_3_6=16, R1024_3_7=25,
-    R1024_4_0=41, R1024_4_1= 9, R1024_4_2=37, R1024_4_3=31, R1024_4_4=12, R1024_4_5=47, R1024_4_6=44, R1024_4_7=30,
-    R1024_5_0=16, R1024_5_1=34, R1024_5_2=56, R1024_5_3=51, R1024_5_4= 4, R1024_5_5=53, R1024_5_6=42, R1024_5_7=41,
-    R1024_6_0=31, R1024_6_1=44, R1024_6_2=47, R1024_6_3=46, R1024_6_4=19, R1024_6_5=42, R1024_6_6=44, R1024_6_7=25,
-    R1024_7_0= 9, R1024_7_1=48, R1024_7_2=35, R1024_7_3=52, R1024_7_4=23, R1024_7_5=31, R1024_7_6=37, R1024_7_7=20
+    R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
+    R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
+    R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
+    R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
+    R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
+    R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
+    R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
+    R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
     };
 
 #ifndef SKEIN_ROUNDS
@@ -308,8 +308,8 @@ enum
 #define SKEIN1024_ROUNDS_TOTAL (80)
 #else                                        /* allow command-line define in range 8*(5..14)   */
 #define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
-#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/ 10) + 5) % 10) + 5))
-#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS    ) + 5) % 10) + 5))
+#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/10)  + 5) % 10) + 5))
+#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS)     + 5) % 10) + 5))
 #endif
 
 #endif  /* ifndef _SKEIN_H_ */
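
Illustration (not part of the patch): the SKEIN_*_ROUNDS_TOTAL macros in the
hunk above read SKEIN_ROUNDS as a three-digit decimal number, one digit per
variant, and map each digit d to 8*(((d + 5) % 10) + 5), which gives the
"8*(5..14)" range mentioned in the comment. A minimal sketch of that
arithmetic follows. rounds_total_demo() is a hypothetical helper, and the
sample value 990 is only an example input, not something taken from the
series.

```c
#include <assert.h>   /* for the assert() checks suggested in the note below */

/*
 * Hypothetical helper mirroring the macro arithmetic:
 *   divisor 100 selects the Skein_256 digit,
 *   divisor  10 selects the Skein_512 digit,
 *   divisor   1 selects the Skein1024 digit.
 */
int rounds_total_demo(int skein_rounds, int divisor)
{
	return 8 * ((((skein_rounds / divisor) + 5) % 10) + 5);
}
```

With skein_rounds == 990 this evaluates to 72, 72 and 80 for divisors 100, 10
and 1. A digit of 5 gives the minimum (40 rounds) and a digit of 4 gives the
maximum (112 rounds).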
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index 94ac2f7cde76..aff9394551a0 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -22,178 +22,178 @@
 /* blkSize =  256 bits. hashSize =  128 bits */
 const u64 SKEIN_256_IV_128[] =
     {
-    MK_64(0xE1111906,0x964D7260),
-    MK_64(0x883DAAA7,0x7C8D811C),
-    MK_64(0x10080DF4,0x91960F7A),
-    MK_64(0xCCF7DDE5,0xB45BC1C2)
+    MK_64(0xE1111906, 0x964D7260),
+    MK_64(0x883DAAA7, 0x7C8D811C),
+    MK_64(0x10080DF4, 0x91960F7A),
+    MK_64(0xCCF7DDE5, 0xB45BC1C2)
     };
 
 /* blkSize =  256 bits. hashSize =  160 bits */
 const u64 SKEIN_256_IV_160[] =
     {
-    MK_64(0x14202314,0x72825E98),
-    MK_64(0x2AC4E9A2,0x5A77E590),
-    MK_64(0xD47A5856,0x8838D63E),
-    MK_64(0x2DD2E496,0x8586AB7D)
+    MK_64(0x14202314, 0x72825E98),
+    MK_64(0x2AC4E9A2, 0x5A77E590),
+    MK_64(0xD47A5856, 0x8838D63E),
+    MK_64(0x2DD2E496, 0x8586AB7D)
     };
 
 /* blkSize =  256 bits. hashSize =  224 bits */
 const u64 SKEIN_256_IV_224[] =
     {
-    MK_64(0xC6098A8C,0x9AE5EA0B),
-    MK_64(0x876D5686,0x08C5191C),
-    MK_64(0x99CB88D7,0xD7F53884),
-    MK_64(0x384BDDB1,0xAEDDB5DE)
+    MK_64(0xC6098A8C, 0x9AE5EA0B),
+    MK_64(0x876D5686, 0x08C5191C),
+    MK_64(0x99CB88D7, 0xD7F53884),
+    MK_64(0x384BDDB1, 0xAEDDB5DE)
     };
 
 /* blkSize =  256 bits. hashSize =  256 bits */
 const u64 SKEIN_256_IV_256[] =
     {
-    MK_64(0xFC9DA860,0xD048B449),
-    MK_64(0x2FCA6647,0x9FA7D833),
-    MK_64(0xB33BC389,0x6656840F),
-    MK_64(0x6A54E920,0xFDE8DA69)
+    MK_64(0xFC9DA860, 0xD048B449),
+    MK_64(0x2FCA6647, 0x9FA7D833),
+    MK_64(0xB33BC389, 0x6656840F),
+    MK_64(0x6A54E920, 0xFDE8DA69)
     };
 
 /* blkSize =  512 bits. hashSize =  128 bits */
 const u64 SKEIN_512_IV_128[] =
     {
-    MK_64(0xA8BC7BF3,0x6FBF9F52),
-    MK_64(0x1E9872CE,0xBD1AF0AA),
-    MK_64(0x309B1790,0xB32190D3),
-    MK_64(0xBCFBB854,0x3F94805C),
-    MK_64(0x0DA61BCD,0x6E31B11B),
-    MK_64(0x1A18EBEA,0xD46A32E3),
-    MK_64(0xA2CC5B18,0xCE84AA82),
-    MK_64(0x6982AB28,0x9D46982D)
+    MK_64(0xA8BC7BF3, 0x6FBF9F52),
+    MK_64(0x1E9872CE, 0xBD1AF0AA),
+    MK_64(0x309B1790, 0xB32190D3),
+    MK_64(0xBCFBB854, 0x3F94805C),
+    MK_64(0x0DA61BCD, 0x6E31B11B),
+    MK_64(0x1A18EBEA, 0xD46A32E3),
+    MK_64(0xA2CC5B18, 0xCE84AA82),
+    MK_64(0x6982AB28, 0x9D46982D)
     };
 
 /* blkSize =  512 bits. hashSize =  160 bits */
 const u64 SKEIN_512_IV_160[] =
     {
-    MK_64(0x28B81A2A,0xE013BD91),
-    MK_64(0xC2F11668,0xB5BDF78F),
-    MK_64(0x1760D8F3,0xF6A56F12),
-    MK_64(0x4FB74758,0x8239904F),
-    MK_64(0x21EDE07F,0x7EAF5056),
-    MK_64(0xD908922E,0x63ED70B8),
-    MK_64(0xB8EC76FF,0xECCB52FA),
-    MK_64(0x01A47BB8,0xA3F27A6E)
+    MK_64(0x28B81A2A, 0xE013BD91),
+    MK_64(0xC2F11668, 0xB5BDF78F),
+    MK_64(0x1760D8F3, 0xF6A56F12),
+    MK_64(0x4FB74758, 0x8239904F),
+    MK_64(0x21EDE07F, 0x7EAF5056),
+    MK_64(0xD908922E, 0x63ED70B8),
+    MK_64(0xB8EC76FF, 0xECCB52FA),
+    MK_64(0x01A47BB8, 0xA3F27A6E)
     };
 
 /* blkSize =  512 bits. hashSize =  224 bits */
 const u64 SKEIN_512_IV_224[] =
     {
-    MK_64(0xCCD06162,0x48677224),
-    MK_64(0xCBA65CF3,0xA92339EF),
-    MK_64(0x8CCD69D6,0x52FF4B64),
-    MK_64(0x398AED7B,0x3AB890B4),
-    MK_64(0x0F59D1B1,0x457D2BD0),
-    MK_64(0x6776FE65,0x75D4EB3D),
-    MK_64(0x99FBC70E,0x997413E9),
-    MK_64(0x9E2CFCCF,0xE1C41EF7)
+    MK_64(0xCCD06162, 0x48677224),
+    MK_64(0xCBA65CF3, 0xA92339EF),
+    MK_64(0x8CCD69D6, 0x52FF4B64),
+    MK_64(0x398AED7B, 0x3AB890B4),
+    MK_64(0x0F59D1B1, 0x457D2BD0),
+    MK_64(0x6776FE65, 0x75D4EB3D),
+    MK_64(0x99FBC70E, 0x997413E9),
+    MK_64(0x9E2CFCCF, 0xE1C41EF7)
     };
 
 /* blkSize =  512 bits. hashSize =  256 bits */
 const u64 SKEIN_512_IV_256[] =
     {
-    MK_64(0xCCD044A1,0x2FDB3E13),
-    MK_64(0xE8359030,0x1A79A9EB),
-    MK_64(0x55AEA061,0x4F816E6F),
-    MK_64(0x2A2767A4,0xAE9B94DB),
-    MK_64(0xEC06025E,0x74DD7683),
-    MK_64(0xE7A436CD,0xC4746251),
-    MK_64(0xC36FBAF9,0x393AD185),
-    MK_64(0x3EEDBA18,0x33EDFC13)
+    MK_64(0xCCD044A1, 0x2FDB3E13),
+    MK_64(0xE8359030, 0x1A79A9EB),
+    MK_64(0x55AEA061, 0x4F816E6F),
+    MK_64(0x2A2767A4, 0xAE9B94DB),
+    MK_64(0xEC06025E, 0x74DD7683),
+    MK_64(0xE7A436CD, 0xC4746251),
+    MK_64(0xC36FBAF9, 0x393AD185),
+    MK_64(0x3EEDBA18, 0x33EDFC13)
     };
 
 /* blkSize =  512 bits. hashSize =  384 bits */
 const u64 SKEIN_512_IV_384[] =
     {
-    MK_64(0xA3F6C6BF,0x3A75EF5F),
-    MK_64(0xB0FEF9CC,0xFD84FAA4),
-    MK_64(0x9D77DD66,0x3D770CFE),
-    MK_64(0xD798CBF3,0xB468FDDA),
-    MK_64(0x1BC4A666,0x8A0E4465),
-    MK_64(0x7ED7D434,0xE5807407),
-    MK_64(0x548FC1AC,0xD4EC44D6),
-    MK_64(0x266E1754,0x6AA18FF8)
+    MK_64(0xA3F6C6BF, 0x3A75EF5F),
+    MK_64(0xB0FEF9CC, 0xFD84FAA4),
+    MK_64(0x9D77DD66, 0x3D770CFE),
+    MK_64(0xD798CBF3, 0xB468FDDA),
+    MK_64(0x1BC4A666, 0x8A0E4465),
+    MK_64(0x7ED7D434, 0xE5807407),
+    MK_64(0x548FC1AC, 0xD4EC44D6),
+    MK_64(0x266E1754, 0x6AA18FF8)
     };
 
 /* blkSize =  512 bits. hashSize =  512 bits */
 const u64 SKEIN_512_IV_512[] =
     {
-    MK_64(0x4903ADFF,0x749C51CE),
-    MK_64(0x0D95DE39,0x9746DF03),
-    MK_64(0x8FD19341,0x27C79BCE),
-    MK_64(0x9A255629,0xFF352CB1),
-    MK_64(0x5DB62599,0xDF6CA7B0),
-    MK_64(0xEABE394C,0xA9D5C3F4),
-    MK_64(0x991112C7,0x1A75B523),
-    MK_64(0xAE18A40B,0x660FCC33)
+    MK_64(0x4903ADFF, 0x749C51CE),
+    MK_64(0x0D95DE39, 0x9746DF03),
+    MK_64(0x8FD19341, 0x27C79BCE),
+    MK_64(0x9A255629, 0xFF352CB1),
+    MK_64(0x5DB62599, 0xDF6CA7B0),
+    MK_64(0xEABE394C, 0xA9D5C3F4),
+    MK_64(0x991112C7, 0x1A75B523),
+    MK_64(0xAE18A40B, 0x660FCC33)
     };
 
 /* blkSize = 1024 bits. hashSize =  384 bits */
 const u64 SKEIN1024_IV_384[] =
     {
-    MK_64(0x5102B6B8,0xC1894A35),
-    MK_64(0xFEEBC9E3,0xFE8AF11A),
-    MK_64(0x0C807F06,0xE32BED71),
-    MK_64(0x60C13A52,0xB41A91F6),
-    MK_64(0x9716D35D,0xD4917C38),
-    MK_64(0xE780DF12,0x6FD31D3A),
-    MK_64(0x797846B6,0xC898303A),
-    MK_64(0xB172C2A8,0xB3572A3B),
-    MK_64(0xC9BC8203,0xA6104A6C),
-    MK_64(0x65909338,0xD75624F4),
-    MK_64(0x94BCC568,0x4B3F81A0),
-    MK_64(0x3EBBF51E,0x10ECFD46),
-    MK_64(0x2DF50F0B,0xEEB08542),
-    MK_64(0x3B5A6530,0x0DBC6516),
-    MK_64(0x484B9CD2,0x167BBCE1),
-    MK_64(0x2D136947,0xD4CBAFEA)
+    MK_64(0x5102B6B8, 0xC1894A35),
+    MK_64(0xFEEBC9E3, 0xFE8AF11A),
+    MK_64(0x0C807F06, 0xE32BED71),
+    MK_64(0x60C13A52, 0xB41A91F6),
+    MK_64(0x9716D35D, 0xD4917C38),
+    MK_64(0xE780DF12, 0x6FD31D3A),
+    MK_64(0x797846B6, 0xC898303A),
+    MK_64(0xB172C2A8, 0xB3572A3B),
+    MK_64(0xC9BC8203, 0xA6104A6C),
+    MK_64(0x65909338, 0xD75624F4),
+    MK_64(0x94BCC568, 0x4B3F81A0),
+    MK_64(0x3EBBF51E, 0x10ECFD46),
+    MK_64(0x2DF50F0B, 0xEEB08542),
+    MK_64(0x3B5A6530, 0x0DBC6516),
+    MK_64(0x484B9CD2, 0x167BBCE1),
+    MK_64(0x2D136947, 0xD4CBAFEA)
     };
 
 /* blkSize = 1024 bits. hashSize =  512 bits */
 const u64 SKEIN1024_IV_512[] =
     {
-    MK_64(0xCAEC0E5D,0x7C1B1B18),
-    MK_64(0xA01B0E04,0x5F03E802),
-    MK_64(0x33840451,0xED912885),
-    MK_64(0x374AFB04,0xEAEC2E1C),
-    MK_64(0xDF25A0E2,0x813581F7),
-    MK_64(0xE4004093,0x8B12F9D2),
-    MK_64(0xA662D539,0xC2ED39B6),
-    MK_64(0xFA8B85CF,0x45D8C75A),
-    MK_64(0x8316ED8E,0x29EDE796),
-    MK_64(0x053289C0,0x2E9F91B8),
-    MK_64(0xC3F8EF1D,0x6D518B73),
-    MK_64(0xBDCEC3C4,0xD5EF332E),
-    MK_64(0x549A7E52,0x22974487),
-    MK_64(0x67070872,0x5B749816),
-    MK_64(0xB9CD28FB,0xF0581BD1),
-    MK_64(0x0E2940B8,0x15804974)
+    MK_64(0xCAEC0E5D, 0x7C1B1B18),
+    MK_64(0xA01B0E04, 0x5F03E802),
+    MK_64(0x33840451, 0xED912885),
+    MK_64(0x374AFB04, 0xEAEC2E1C),
+    MK_64(0xDF25A0E2, 0x813581F7),
+    MK_64(0xE4004093, 0x8B12F9D2),
+    MK_64(0xA662D539, 0xC2ED39B6),
+    MK_64(0xFA8B85CF, 0x45D8C75A),
+    MK_64(0x8316ED8E, 0x29EDE796),
+    MK_64(0x053289C0, 0x2E9F91B8),
+    MK_64(0xC3F8EF1D, 0x6D518B73),
+    MK_64(0xBDCEC3C4, 0xD5EF332E),
+    MK_64(0x549A7E52, 0x22974487),
+    MK_64(0x67070872, 0x5B749816),
+    MK_64(0xB9CD28FB, 0xF0581BD1),
+    MK_64(0x0E2940B8, 0x15804974)
     };
 
 /* blkSize = 1024 bits. hashSize = 1024 bits */
 const u64 SKEIN1024_IV_1024[] =
     {
-    MK_64(0xD593DA07,0x41E72355),
-    MK_64(0x15B5E511,0xAC73E00C),
-    MK_64(0x5180E5AE,0xBAF2C4F0),
-    MK_64(0x03BD41D3,0xFCBCAFAF),
-    MK_64(0x1CAEC6FD,0x1983A898),
-    MK_64(0x6E510B8B,0xCDD0589F),
-    MK_64(0x77E2BDFD,0xC6394ADA),
-    MK_64(0xC11E1DB5,0x24DCB0A3),
-    MK_64(0xD6D14AF9,0xC6329AB5),
-    MK_64(0x6A9B0BFC,0x6EB67E0D),
-    MK_64(0x9243C60D,0xCCFF1332),
-    MK_64(0x1A1F1DDE,0x743F02D4),
-    MK_64(0x0996753C,0x10ED0BB8),
-    MK_64(0x6572DD22,0xF2B4969A),
-    MK_64(0x61FD3062,0xD00A579A),
-    MK_64(0x1DE0536E,0x8682E539)
+    MK_64(0xD593DA07, 0x41E72355),
+    MK_64(0x15B5E511, 0xAC73E00C),
+    MK_64(0x5180E5AE, 0xBAF2C4F0),
+    MK_64(0x03BD41D3, 0xFCBCAFAF),
+    MK_64(0x1CAEC6FD, 0x1983A898),
+    MK_64(0x6E510B8B, 0xCDD0589F),
+    MK_64(0x77E2BDFD, 0xC6394ADA),
+    MK_64(0xC11E1DB5, 0x24DCB0A3),
+    MK_64(0xD6D14AF9, 0xC6329AB5),
+    MK_64(0x6A9B0BFC, 0x6EB67E0D),
+    MK_64(0x9243C60D, 0xCCFF1332),
+    MK_64(0x1A1F1DDE, 0x743F02D4),
+    MK_64(0x0996753C, 0x10ED0BB8),
+    MK_64(0x6572DD22, 0xF2B4969A),
+    MK_64(0x61FD3062, 0xD00A579A),
+    MK_64(0x1DE0536E, 0x8682E539)
     };
 
 #endif /* _SKEIN_IV_H_ */
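
Reviewer note on the IV tables above: each entry is written as two 32-bit
halves joined by MK_64(). The macro itself is not part of this hunk, so the
definition below is an assumption (high word first, low word second, combined
into one 64-bit value), and uint64_t stands in for the kernel's u64. The sketch
only illustrates why adding a space after the comma cannot change the
generated constants; the checked value is the first word of SKEIN_512_IV_512
as listed above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed definition (not shown in this patch): combine a high and a
 * low 32-bit half into one 64-bit constant.
 */
#define MK_64(hi32, lo32) ((uint64_t)(lo32) + (((uint64_t)(hi32)) << 32))

/* Old and new spellings differ only in whitespace between the arguments. */
static const uint64_t iv_word_old = MK_64(0x4903ADFF,0x749C51CE);
static const uint64_t iv_word_new = MK_64(0x4903ADFF, 0x749C51CE);

/* Compile-time check (C11) that the halves land where expected. */
_Static_assert(MK_64(0x4903ADFF, 0x749C51CE) == UINT64_C(0x4903ADFF749C51CE),
               "MK_64 should place hi32 in the upper 32 bits");

/* Hypothetical helper: returns 1 when both spellings give the same word. */
static int mk64_spacing_is_cosmetic(void)
{
    return iv_word_old == iv_word_new;
}
```

Since the preprocessor discards whitespace between macro arguments, both
spellings expand to the same token sequence, which is consistent with the
cover letter's claim that the object code does not change.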
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 2bed7c163316..0ea0a6aeb168 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -16,9 +16,9 @@
 
 /*****************************************************************/
 /* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
-void    Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd);
+void    Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void    Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
@@ -34,41 +34,41 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
         u64  w[SKEIN_256_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
     ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
     switch (hashBitLen)
     {             /* use pre-computed values, where available */
     case  256:
-        memcpy(ctx->X,SKEIN_256_IV_256,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
         break;
     case  224:
-        memcpy(ctx->X,SKEIN_256_IV_224,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
         break;
     case  160:
-        memcpy(ctx->X,SKEIN_256_IV_160,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
         break;
     case  128:
-        memcpy(ctx->X,SKEIN_256_IV_128,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
         break;
     default:
         /* here if there is no precomputed IV value available */
         /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
 
         cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
         cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
         cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
 
         /* compute the initial chaining values from config block */
-        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_256_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
         break;
     }
     /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
     return SKEIN_SUCCESS;
 }
@@ -76,7 +76,7 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(struct skein_256_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -84,42 +84,42 @@ int Skein_256_InitExt(struct skein_256_ctx *ctx,size_t hashBitLen,u64 treeInfo,
         u64  w[SKEIN_256_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
     /* compute the initial chaining values ctx->X[], based on key */
     if (keyBytes == 0)                          /* is there a key? */
     {
-        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
     }
     else                                        /* here to pre-process a key */
     {
         Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
         /* do a mini-Init right here */
-        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_256_Update(ctx,key,keyBytes);     /* hash the key */
-        Skein_256_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx,CFG_FINAL);
+    Skein_Start_New_Type(ctx, CFG_FINAL);
 
-    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
     cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
     cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
     cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
 
-    Skein_Show_Key(256,&ctx->h,key,keyBytes);
+    Skein_Show_Key(256, &ctx->h, key, keyBytes);
 
     /* compute the initial chaining values from config block */
-    Skein_256_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+    Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
     /* The chaining vars ctx->X are now initialized */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);
+    Skein_Start_New_Type(ctx, MSG);
 
     return SKEIN_SUCCESS;
 }
@@ -130,7 +130,7 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt
 {
     size_t n;
 
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* process full blocks, if any */
     if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
@@ -141,20 +141,20 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt
             if (n)
             {
                 Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
                 msgByteCnt  -= n;
                 msg         += n;
                 ctx->h.bCnt += n;
             }
             Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
-            Skein_256_Process_Block(ctx,ctx->b,1,SKEIN_256_BLOCK_BYTES);
+            Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
             ctx->h.bCnt = 0;
         }
         /* now process any remaining full blocks, directly from input message data */
         if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
         {
             n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_256_Process_Block(ctx,msg,n,SKEIN_256_BLOCK_BYTES);
+            Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
             msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
             msg        += n * SKEIN_256_BLOCK_BYTES;
         }
@@ -165,7 +165,7 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt
     if (msgByteCnt)
     {
         Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
         ctx->h.bCnt += msgByteCnt;
     }
 
@@ -176,33 +176,33 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt
 /* finalize the hash computation and output the result */
 int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
     if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
 
-    Skein_256_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_256_BLOCK_BYTES)
             n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -221,42 +221,42 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
         u64  w[SKEIN_512_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
     ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
     switch (hashBitLen)
     {             /* use pre-computed values, where available */
     case  512:
-        memcpy(ctx->X,SKEIN_512_IV_512,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
         break;
     case  384:
-        memcpy(ctx->X,SKEIN_512_IV_384,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
         break;
     case  256:
-        memcpy(ctx->X,SKEIN_512_IV_256,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
         break;
     case  224:
-        memcpy(ctx->X,SKEIN_512_IV_224,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
         break;
     default:
         /* here if there is no precomputed IV value available */
         /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
 
         cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
         cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
         cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
 
         /* compute the initial chaining values from config block */
-        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_512_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
         break;
     }
 
     /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
     return SKEIN_SUCCESS;
 }
@@ -264,7 +264,7 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(struct skein_512_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -272,42 +272,42 @@ int Skein_512_InitExt(struct skein_512_ctx *ctx,size_t hashBitLen,u64 treeInfo,
         u64  w[SKEIN_512_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
     /* compute the initial chaining values ctx->X[], based on key */
     if (keyBytes == 0)                          /* is there a key? */
     {
-        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
     }
     else                                        /* here to pre-process a key */
     {
         Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
         /* do a mini-Init right here */
-        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_512_Update(ctx,key,keyBytes);     /* hash the key */
-        Skein_512_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx,CFG_FINAL);
+    Skein_Start_New_Type(ctx, CFG_FINAL);
 
-    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
     cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
     cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
     cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
 
-    Skein_Show_Key(512,&ctx->h,key,keyBytes);
+    Skein_Show_Key(512, &ctx->h, key, keyBytes);
 
     /* compute the initial chaining values from config block */
-    Skein_512_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+    Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
     /* The chaining vars ctx->X are now initialized */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);
+    Skein_Start_New_Type(ctx, MSG);
 
     return SKEIN_SUCCESS;
 }
@@ -318,7 +318,7 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt
 {
     size_t n;
 
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* process full blocks, if any */
     if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
@@ -329,20 +329,20 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt
             if (n)
             {
                 Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
                 msgByteCnt  -= n;
                 msg         += n;
                 ctx->h.bCnt += n;
             }
             Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
-            Skein_512_Process_Block(ctx,ctx->b,1,SKEIN_512_BLOCK_BYTES);
+            Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
             ctx->h.bCnt = 0;
         }
         /* now process any remaining full blocks, directly from input message data */
         if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
         {
             n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_512_Process_Block(ctx,msg,n,SKEIN_512_BLOCK_BYTES);
+            Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
             msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
             msg        += n * SKEIN_512_BLOCK_BYTES;
         }
@@ -353,7 +353,7 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt
     if (msgByteCnt)
     {
         Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
         ctx->h.bCnt += msgByteCnt;
     }
 
@@ -364,33 +364,33 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt
 /* finalize the hash computation and output the result */
 int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
     if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
 
-    Skein_512_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_512_BLOCK_BYTES)
             n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(512,&ctx->h,n,hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -409,39 +409,39 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
         u64  w[SKEIN1024_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
     ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
     switch (hashBitLen)
     {              /* use pre-computed values, where available */
     case  512:
-        memcpy(ctx->X,SKEIN1024_IV_512 ,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
         break;
     case  384:
-        memcpy(ctx->X,SKEIN1024_IV_384 ,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
         break;
     case 1024:
-        memcpy(ctx->X,SKEIN1024_IV_1024,sizeof(ctx->X));
+        memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
         break;
     default:
         /* here if there is no precomputed IV value available */
         /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx,CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
 
         cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
         cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
         cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3],0,sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
 
         /* compute the initial chaining values from config block */
-        memset(ctx->X,0,sizeof(ctx->X));            /* zero the chaining variables */
-        Skein1024_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
         break;
     }
 
     /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);              /* T0=0, T1= MSG type */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
     return SKEIN_SUCCESS;
 }
@@ -449,7 +449,7 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
 /* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(struct skein1024_ctx *ctx,size_t hashBitLen,u64 treeInfo, const u8 *key, size_t keyBytes)
+int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
     union
     {
@@ -457,42 +457,42 @@ int Skein1024_InitExt(struct skein1024_ctx *ctx,size_t hashBitLen,u64 treeInfo,
         u64  w[SKEIN1024_STATE_WORDS];
     } cfg;                              /* config block */
 
-    Skein_Assert(hashBitLen > 0,SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL,SKEIN_FAIL);
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
     /* compute the initial chaining values ctx->X[], based on key */
     if (keyBytes == 0)                          /* is there a key? */
     {
-        memset(ctx->X,0,sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
     }
     else                                        /* here to pre-process a key */
     {
         Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
         /* do a mini-Init right here */
-        ctx->h.hashBitLen=8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx,KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X,0,sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein1024_Update(ctx,key,keyBytes);     /* hash the key */
-        Skein1024_Final_Pad(ctx,cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X,cfg.b,sizeof(cfg.b));     /* copy over into ctx->X[] */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
     }
     /* build/process the config block, type == CONFIG (could be precomputed for each key) */
     ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx,CFG_FINAL);
+    Skein_Start_New_Type(ctx, CFG_FINAL);
 
-    memset(&cfg.w,0,sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
     cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
     cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
     cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
 
-    Skein_Show_Key(1024,&ctx->h,key,keyBytes);
+    Skein_Show_Key(1024, &ctx->h, key, keyBytes);
 
     /* compute the initial chaining values from config block */
-    Skein1024_Process_Block(ctx,cfg.b,1,SKEIN_CFG_STR_LEN);
+    Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
     /* The chaining vars ctx->X are now initialized */
     /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx,MSG);
+    Skein_Start_New_Type(ctx, MSG);
 
     return SKEIN_SUCCESS;
 }
@@ -503,7 +503,7 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt
 {
     size_t n;
 
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* process full blocks, if any */
     if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
@@ -514,20 +514,20 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt
             if (n)
             {
                 Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
                 msgByteCnt  -= n;
                 msg         += n;
                 ctx->h.bCnt += n;
             }
             Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
-            Skein1024_Process_Block(ctx,ctx->b,1,SKEIN1024_BLOCK_BYTES);
+            Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
             ctx->h.bCnt = 0;
         }
         /* now process any remaining full blocks, directly from input message data */
         if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
         {
             n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein1024_Process_Block(ctx,msg,n,SKEIN1024_BLOCK_BYTES);
+            Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
             msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
             msg        += n * SKEIN1024_BLOCK_BYTES;
         }
@@ -538,7 +538,7 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt
     if (msgByteCnt)
     {
         Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
         ctx->h.bCnt += msgByteCnt;
     }
 
@@ -549,33 +549,33 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt
 /* finalize the hash computation and output the result */
 int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
     if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
 
-    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);  /* process the final block */
+    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN1024_BLOCK_BYTES)
             n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(1024,&ctx->h,n,hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -587,14 +587,14 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
     if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_256_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
 
     return SKEIN_SUCCESS;
 }
@@ -603,14 +603,14 @@ int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
     if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_512_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
 
     return SKEIN_SUCCESS;
 }
@@ -619,14 +619,14 @@ int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
     if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt],0,SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt);    /* process the final block */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal,ctx->X,SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
 
     return SKEIN_SUCCESS;
 }
@@ -636,27 +636,27 @@ int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 /* just do the OUTPUT stage                                       */
 int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN_256_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_256_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_256_BLOCK_BYTES)
             n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -665,27 +665,27 @@ int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 /* just do the OUTPUT stage                                       */
 int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN_512_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein_512_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN_512_BLOCK_BYTES)
             n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
@@ -694,27 +694,27 @@ int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 /* just do the OUTPUT stage                                       */
 int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    size_t i,n,byteCnt;
+    size_t i, n, byteCnt;
     u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES,SKEIN_FAIL);    /* catch uninitialized context */
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
     /* now output the result */
     byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
 
     /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b,0,sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X,ctx->X,sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i=0;i*SKEIN1024_BLOCK_BYTES < byteCnt;i++)
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
     {
-        ((u64 *)ctx->b)[0]= Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx,OUT_FINAL);
-        Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64)); /* run "counter mode" */
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
         n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
         if (n >= SKEIN1024_BLOCK_BYTES)
             n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES,ctx->X,n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256,&ctx->h,n,hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X,X,sizeof(X));   /* restore the counter mode key for next time */
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
     }
     return SKEIN_SUCCESS;
 }
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index df92806c4ec4..a3f471be8db3 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -31,7 +31,7 @@ int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size)
 {
     Skein_Assert(ctx && size, SKEIN_FAIL);
 
-    memset(ctx ,0, sizeof(struct skein_ctx));
+    memset(ctx, 0, sizeof(struct skein_ctx));
     ctx->skeinSize = size;
 
     return SKEIN_SUCCESS;
@@ -97,18 +97,18 @@ int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
     case Skein256:
         ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
                                 treeInfo,
-                                (const u8*)key, keyLen);
+                                (const u8 *)key, keyLen);
 
         break;
     case Skein512:
         ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
                                 treeInfo,
-                                (const u8*)key, keyLen);
+                                (const u8 *)key, keyLen);
         break;
     case Skein1024:
         ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
                                 treeInfo,
-                                (const u8*)key, keyLen);
+                                (const u8 *)key, keyLen);
 
         break;
     }
@@ -146,13 +146,13 @@ int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
 
     switch (ctx->skeinSize) {
     case Skein256:
-        ret = Skein_256_Update(&ctx->m.s256, (const u8*)msg, msgByteCnt);
+        ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg, msgByteCnt);
         break;
     case Skein512:
-        ret = Skein_512_Update(&ctx->m.s512, (const u8*)msg, msgByteCnt);
+        ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg, msgByteCnt);
         break;
     case Skein1024:
-        ret = Skein1024_Update(&ctx->m.s1024, (const u8*)msg, msgByteCnt);
+        ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg, msgByteCnt);
         break;
     }
     return ret;
@@ -186,7 +186,7 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
      * Skein's real partial block buffer.
      * If this layout ever changes we have to adapt this as well.
      */
-    up = (u8*)ctx->m.s256.X + ctx->skeinSize / 8;
+    up = (u8 *)ctx->m.s256.X + ctx->skeinSize / 8;
 
     Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
 
@@ -206,13 +206,13 @@ int skeinFinal(struct skein_ctx *ctx, u8 *hash)
 
     switch (ctx->skeinSize) {
     case Skein256:
-        ret = Skein_256_Final(&ctx->m.s256, (u8*)hash);
+        ret = Skein_256_Final(&ctx->m.s256, (u8 *)hash);
         break;
     case Skein512:
-        ret = Skein_512_Final(&ctx->m.s512, (u8*)hash);
+        ret = Skein_512_Final(&ctx->m.s512, (u8 *)hash);
         break;
     case Skein1024:
-        ret = Skein1024_Final(&ctx->m.s1024, (u8*)hash);
+        ret = Skein1024_Final(&ctx->m.s1024, (u8 *)hash);
         break;
     }
     return ret;
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 02e68dbab0d4..a4b1ec56ad83 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -147,16 +147,16 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
         blkPtr += SKEIN1024_BLOCK_BYTES;
 
         /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[ 0] = ctx->X[ 0] ^ w[ 0];
-        ctx->X[ 1] = ctx->X[ 1] ^ w[ 1];
-        ctx->X[ 2] = ctx->X[ 2] ^ w[ 2];
-        ctx->X[ 3] = ctx->X[ 3] ^ w[ 3];
-        ctx->X[ 4] = ctx->X[ 4] ^ w[ 4];
-        ctx->X[ 5] = ctx->X[ 5] ^ w[ 5];
-        ctx->X[ 6] = ctx->X[ 6] ^ w[ 6];
-        ctx->X[ 7] = ctx->X[ 7] ^ w[ 7];
-        ctx->X[ 8] = ctx->X[ 8] ^ w[ 8];
-        ctx->X[ 9] = ctx->X[ 9] ^ w[ 9];
+        ctx->X[0]  = ctx->X[0]  ^ w[0];
+        ctx->X[1]  = ctx->X[1]  ^ w[1];
+        ctx->X[2]  = ctx->X[2]  ^ w[2];
+        ctx->X[3]  = ctx->X[3]  ^ w[3];
+        ctx->X[4]  = ctx->X[4]  ^ w[4];
+        ctx->X[5]  = ctx->X[5]  ^ w[5];
+        ctx->X[6]  = ctx->X[6]  ^ w[6];
+        ctx->X[7]  = ctx->X[7]  ^ w[7];
+        ctx->X[8]  = ctx->X[8]  ^ w[8];
+        ctx->X[9]  = ctx->X[9]  ^ w[9];
         ctx->X[10] = ctx->X[10] ^ w[10];
         ctx->X[11] = ctx->X[11] ^ w[11];
         ctx->X[12] = ctx->X[12] ^ w[12];
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 179bde121380..791bacdd3d57 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -39,16 +39,15 @@
 
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
     { /* do it in C */
-    enum
-        {
+    enum {
         WCNT = SKEIN_256_STATE_WORDS
         };
 #undef  RCNT
 #define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
 
-#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
 #else
 #define SKEIN_UNROLL_256 (0)
@@ -63,8 +62,8 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t b
 #else
     u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64  X0,X1,X2,X3;                        /* local copy of context vars, for speed */
-    u64  w [WCNT];                           /* local copy of input block */
+    u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
+    u64  w[WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
     const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
     Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
@@ -85,95 +84,95 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t b
 
         ts[2] = ts[0] ^ ts[1];
 
-        Skein_Get64_LSB_First(w,blkPtr,WCNT);   /* get input block in little-endian format */
+        Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
         DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
         X0 = w[0] + ks[0];                      /* do the first full key injection */
         X1 = w[1] + ks[1] + ts[0];
         X2 = w[2] + ks[2] + ts[1];
         X3 = w[3] + ks[3];
 
-        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);    /* show starting state values */
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
 
         blkPtr += SKEIN_256_BLOCK_BYTES;
 
         /* run the rounds */
 
-#define Round256(p0,p1,p2,p3,ROT,rNum)                              \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2; \
+#define Round256(p0, p1, p2, p3, ROT, rNum)                              \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 
 #if SKEIN_UNROLL_256 == 0                       
-#define R256(p0,p1,p2,p3,ROT,rNum)           /* fully unrolled */   \
-    Round256(p0,p1,p2,p3,ROT,rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rNum,Xptr);
+#define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
+    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
 #define I256(R)                                                     \
     X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
     X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
     X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
     X3   += ks[((R)+4) % 5] +     (R)+1;                            \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
-#define R256(p0,p1,p2,p3,ROT,rNum)                                  \
-    Round256(p0,p1,p2,p3,ROT,rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rNum,Xptr);
+#define R256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
 #define I256(R)                                                     \
     X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
     X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
     X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
-    X3   += ks[r+(R)+3] +    r+(R)   ;                              \
-    ks[r + (R)+4    ]   = ks[r+(R)-1];     /* rotate key schedule */\
-    ts[r + (R)+2    ]   = ts[r+(R)-1];                              \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+    X3   += ks[r+(R)+3] +    r+(R);                              \
+    ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
+    ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 
-    for (r=1;r < 2*RCNT;r+=2*SKEIN_UNROLL_256)  /* loop thru it */
+    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
 #endif  
         {    
 #define R256_8_rounds(R)                  \
-        R256(0,1,2,3,R_256_0,8*(R) + 1);  \
-        R256(0,3,2,1,R_256_1,8*(R) + 2);  \
-        R256(0,1,2,3,R_256_2,8*(R) + 3);  \
-        R256(0,3,2,1,R_256_3,8*(R) + 4);  \
-        I256(2*(R));                      \
-        R256(0,1,2,3,R_256_4,8*(R) + 5);  \
-        R256(0,3,2,1,R_256_5,8*(R) + 6);  \
-        R256(0,1,2,3,R_256_6,8*(R) + 7);  \
-        R256(0,3,2,1,R_256_7,8*(R) + 8);  \
-        I256(2*(R)+1);
-
-        R256_8_rounds( 0);
+        R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
+        R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
+        R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
+        R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
+        I256(2 * (R));                      \
+        R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
+        R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
+        R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
+        R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
+        I256(2 * (R) + 1);
+
+        R256_8_rounds(0);
 
 #define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
 
-  #if   R256_Unroll_R( 1)
-        R256_8_rounds( 1);
+  #if   R256_Unroll_R(1)
+        R256_8_rounds(1);
   #endif
-  #if   R256_Unroll_R( 2)
-        R256_8_rounds( 2);
+  #if   R256_Unroll_R(2)
+        R256_8_rounds(2);
   #endif
-  #if   R256_Unroll_R( 3)
-        R256_8_rounds( 3);
+  #if   R256_Unroll_R(3)
+        R256_8_rounds(3);
   #endif
-  #if   R256_Unroll_R( 4)
-        R256_8_rounds( 4);
+  #if   R256_Unroll_R(4)
+        R256_8_rounds(4);
   #endif
-  #if   R256_Unroll_R( 5)
-        R256_8_rounds( 5);
+  #if   R256_Unroll_R(5)
+        R256_8_rounds(5);
   #endif
-  #if   R256_Unroll_R( 6)
-        R256_8_rounds( 6);
+  #if   R256_Unroll_R(6)
+        R256_8_rounds(6);
   #endif
-  #if   R256_Unroll_R( 7)
-        R256_8_rounds( 7);
+  #if   R256_Unroll_R(7)
+        R256_8_rounds(7);
   #endif
-  #if   R256_Unroll_R( 8)
-        R256_8_rounds( 8);
+  #if   R256_Unroll_R(8)
+        R256_8_rounds(8);
   #endif
-  #if   R256_Unroll_R( 9)
-        R256_8_rounds( 9);
+  #if   R256_Unroll_R(9)
+        R256_8_rounds(9);
   #endif
   #if   R256_Unroll_R(10)
         R256_8_rounds(10);
@@ -200,7 +199,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx,const u8 *blkPtr,size_t b
         ctx->X[2] = X2 ^ w[2];
         ctx->X[3] = X3 ^ w[3];
 
-        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
 
         ts[1] &= ~SKEIN_T1_FLAG_FIRST;
         }
@@ -224,16 +223,15 @@ unsigned int Skein_256_Unroll_Cnt(void)
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
     { /* do it in C */
-    enum
-        {
+    enum {
         WCNT = SKEIN_512_STATE_WORDS
         };
 #undef  RCNT
 #define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
 
-#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
 #else
 #define SKEIN_UNROLL_512 (0)
@@ -248,8 +246,8 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t b
 #else
     u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64  X0,X1,X2,X3,X4,X5,X6,X7;            /* local copy of vars, for speed */
-    u64  w [WCNT];                           /* local copy of input block */
+    u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars, for speed */
+    u64  w[WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
     const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
     Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
@@ -277,9 +275,9 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t b
 
         ts[2] = ts[0] ^ ts[1];
 
-        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */
+        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
         DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
         X0   = w[0] + ks[0];                    /* do the first full key injection */
         X1   = w[1] + ks[1];
@@ -292,92 +290,92 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t b
 
         blkPtr += SKEIN_512_BLOCK_BYTES;
 
-        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
         /* run the rounds */
-#define Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                  \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2; \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4; \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6; \
+#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
 
 #if SKEIN_UNROLL_512 == 0                       
-#define R512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)      /* unrolled */  \
-    Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rNum,Xptr);
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
+    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
 #define I512(R)                                                     \
-    X0   += ks[((R)+1) % 9];   /* inject the key schedule value */  \
-    X1   += ks[((R)+2) % 9];                                        \
-    X2   += ks[((R)+3) % 9];                                        \
-    X3   += ks[((R)+4) % 9];                                        \
-    X4   += ks[((R)+5) % 9];                                        \
-    X5   += ks[((R)+6) % 9] + ts[((R)+1) % 3];                      \
-    X6   += ks[((R)+7) % 9] + ts[((R)+2) % 3];                      \
-    X7   += ks[((R)+8) % 9] +     (R)+1;                            \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
+    X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
+    X1   += ks[((R) + 2) % 9];                                        \
+    X2   += ks[((R) + 3) % 9];                                        \
+    X3   += ks[((R) + 4) % 9];                                        \
+    X4   += ks[((R) + 5) % 9];                                        \
+    X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
+    X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
+    X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
-#define R512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
-    Round512(p0,p1,p2,p3,p4,p5,p6,p7,ROT,rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rNum,Xptr);
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
 #define I512(R)                                                     \
-    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
-    X1   += ks[r+(R)+1];                                            \
-    X2   += ks[r+(R)+2];                                            \
-    X3   += ks[r+(R)+3];                                            \
-    X4   += ks[r+(R)+4];                                            \
-    X5   += ks[r+(R)+5] + ts[r+(R)+0];                              \
-    X6   += ks[r+(R)+6] + ts[r+(R)+1];                              \
-    X7   += ks[r+(R)+7] +    r+(R)   ;                              \
-    ks[r +       (R)+8] = ks[r+(R)-1];  /* rotate key schedule */   \
-    ts[r +       (R)+2] = ts[r+(R)-1];                              \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
-
-    for (r=1;r < 2*RCNT;r+=2*SKEIN_UNROLL_512)   /* loop thru it */
+    X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
+    X1   += ks[r + (R) + 1];                                            \
+    X2   += ks[r + (R) + 2];                                            \
+    X3   += ks[r + (R) + 3];                                            \
+    X4   += ks[r + (R) + 4];                                            \
+    X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
+    X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
+    X7   += ks[r + (R) + 7] +         r + (R);                              \
+    ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
+    ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
 #endif                         /* end of looped code definitions */
         {
 #define R512_8_rounds(R)  /* do 8 full rounds */  \
-        R512(0,1,2,3,4,5,6,7,R_512_0,8*(R)+ 1);   \
-        R512(2,1,4,7,6,5,0,3,R_512_1,8*(R)+ 2);   \
-        R512(4,1,6,3,0,5,2,7,R_512_2,8*(R)+ 3);   \
-        R512(6,1,0,7,2,5,4,3,R_512_3,8*(R)+ 4);   \
-        I512(2*(R));                              \
-        R512(0,1,2,3,4,5,6,7,R_512_4,8*(R)+ 5);   \
-        R512(2,1,4,7,6,5,0,3,R_512_5,8*(R)+ 6);   \
-        R512(4,1,6,3,0,5,2,7,R_512_6,8*(R)+ 7);   \
-        R512(6,1,0,7,2,5,4,3,R_512_7,8*(R)+ 8);   \
-        I512(2*(R)+1);        /* and key injection */
-
-        R512_8_rounds( 0);
+        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
+        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
+        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
+        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
+        I512(2 * (R));                              \
+        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
+        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
+        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
+        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
+        I512(2 * (R) + 1);        /* and key injection */
+
+        R512_8_rounds(0);
 
 #define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
 
-  #if   R512_Unroll_R( 1)
-        R512_8_rounds( 1);
+  #if   R512_Unroll_R(1)
+        R512_8_rounds(1);
   #endif
-  #if   R512_Unroll_R( 2)
-        R512_8_rounds( 2);
+  #if   R512_Unroll_R(2)
+        R512_8_rounds(2);
   #endif
-  #if   R512_Unroll_R( 3)
-        R512_8_rounds( 3);
+  #if   R512_Unroll_R(3)
+        R512_8_rounds(3);
   #endif
-  #if   R512_Unroll_R( 4)
-        R512_8_rounds( 4);
+  #if   R512_Unroll_R(4)
+        R512_8_rounds(4);
   #endif
-  #if   R512_Unroll_R( 5)
-        R512_8_rounds( 5);
+  #if   R512_Unroll_R(5)
+        R512_8_rounds(5);
   #endif
-  #if   R512_Unroll_R( 6)
-        R512_8_rounds( 6);
+  #if   R512_Unroll_R(6)
+        R512_8_rounds(6);
   #endif
-  #if   R512_Unroll_R( 7)
-        R512_8_rounds( 7);
+  #if   R512_Unroll_R(7)
+        R512_8_rounds(7);
   #endif
-  #if   R512_Unroll_R( 8)
-        R512_8_rounds( 8);
+  #if   R512_Unroll_R(8)
+        R512_8_rounds(8);
   #endif
-  #if   R512_Unroll_R( 9)
-        R512_8_rounds( 9);
+  #if   R512_Unroll_R(9)
+        R512_8_rounds(9);
   #endif
   #if   R512_Unroll_R(10)
         R512_8_rounds(10);
@@ -408,7 +406,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx,const u8 *blkPtr,size_t b
         ctx->X[5] = X5 ^ w[5];
         ctx->X[6] = X6 ^ w[6];
         ctx->X[7] = X7 ^ w[7];
-        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
 
         ts[1] &= ~SKEIN_T1_FLAG_FIRST;
         }
@@ -432,16 +430,15 @@ unsigned int Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t blkCnt,size_t byteCntAdd)
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
     { /* do it in C, always looping (unrolled is bigger AND slower!) */
-    enum
-        {
+    enum {
         WCNT = SKEIN1024_STATE_WORDS
         };
 #undef  RCNT
 #define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
 
-#ifdef  SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
 #else
 #define SKEIN_UNROLL_1024 (0)
@@ -457,14 +454,14 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
     u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
 
-    u64  X00,X01,X02,X03,X04,X05,X06,X07,    /* local copy of vars, for speed */
-            X08,X09,X10,X11,X12,X13,X14,X15;
-    u64  w [WCNT];                           /* local copy of input block */
+    u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
+            X08, X09, X10, X11, X12, X13, X14, X15;
+    u64  w[WCNT];                            /* local copy of input block */
 #ifdef SKEIN_DEBUG
     const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
-    Xptr[ 0] = &X00;  Xptr[ 1] = &X01;  Xptr[ 2] = &X02;  Xptr[ 3] = &X03;
-    Xptr[ 4] = &X04;  Xptr[ 5] = &X05;  Xptr[ 6] = &X06;  Xptr[ 7] = &X07;
-    Xptr[ 8] = &X08;  Xptr[ 9] = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
+    Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
+    Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
+    Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
     Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
 #endif
 
@@ -476,43 +473,43 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
         ts[0] += byteCntAdd;                    /* update processed length */
 
         /* precompute the key schedule for this block */
-        ks[ 0] = ctx->X[ 0];
-        ks[ 1] = ctx->X[ 1];
-        ks[ 2] = ctx->X[ 2];
-        ks[ 3] = ctx->X[ 3];
-        ks[ 4] = ctx->X[ 4];
-        ks[ 5] = ctx->X[ 5];
-        ks[ 6] = ctx->X[ 6];
-        ks[ 7] = ctx->X[ 7];
-        ks[ 8] = ctx->X[ 8];
-        ks[ 9] = ctx->X[ 9];
+        ks[0]  = ctx->X[0];
+        ks[1]  = ctx->X[1];
+        ks[2]  = ctx->X[2];
+        ks[3]  = ctx->X[3];
+        ks[4]  = ctx->X[4];
+        ks[5]  = ctx->X[5];
+        ks[6]  = ctx->X[6];
+        ks[7]  = ctx->X[7];
+        ks[8]  = ctx->X[8];
+        ks[9]  = ctx->X[9];
         ks[10] = ctx->X[10];
         ks[11] = ctx->X[11];
         ks[12] = ctx->X[12];
         ks[13] = ctx->X[13];
         ks[14] = ctx->X[14];
         ks[15] = ctx->X[15];
-        ks[16] = ks[ 0] ^ ks[ 1] ^ ks[ 2] ^ ks[ 3] ^
-                 ks[ 4] ^ ks[ 5] ^ ks[ 6] ^ ks[ 7] ^
-                 ks[ 8] ^ ks[ 9] ^ ks[10] ^ ks[11] ^
+        ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
+                  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
+                  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
                  ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
 
         ts[2]  = ts[0] ^ ts[1];
 
-        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */
+        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
         DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS,&ctx->h,ctx->X,blkPtr,w,ks,ts);
-
-        X00    = w[ 0] + ks[ 0];                 /* do the first full key injection */
-        X01    = w[ 1] + ks[ 1];
-        X02    = w[ 2] + ks[ 2];
-        X03    = w[ 3] + ks[ 3];
-        X04    = w[ 4] + ks[ 4];
-        X05    = w[ 5] + ks[ 5];
-        X06    = w[ 6] + ks[ 6];
-        X07    = w[ 7] + ks[ 7];
-        X08    = w[ 8] + ks[ 8];
-        X09    = w[ 9] + ks[ 9];
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+        X00    =  w[0] +  ks[0];                 /* do the first full key injection */
+        X01    =  w[1] +  ks[1];
+        X02    =  w[2] +  ks[2];
+        X03    =  w[3] +  ks[3];
+        X04    =  w[4] +  ks[4];
+        X05    =  w[5] +  ks[5];
+        X06    =  w[6] +  ks[6];
+        X07    =  w[7] +  ks[7];
+        X08    =  w[8] +  ks[8];
+        X09    =  w[9] +  ks[9];
         X10    = w[10] + ks[10];
         X11    = w[11] + ks[11];
         X12    = w[12] + ks[12];
@@ -520,112 +517,112 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
         X14    = w[14] + ks[14] + ts[1];
         X15    = w[15] + ks[15];
 
-        Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INITIAL,Xptr);
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
 
-#define Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rNum) \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0;   \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2;   \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4;   \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6;   \
-    X##p8 += X##p9; X##p9 = RotL_64(X##p9,ROT##_4); X##p9 ^= X##p8;   \
-    X##pA += X##pB; X##pB = RotL_64(X##pB,ROT##_5); X##pB ^= X##pA;   \
-    X##pC += X##pD; X##pD = RotL_64(X##pD,ROT##_6); X##pD ^= X##pC;   \
-    X##pE += X##pF; X##pF = RotL_64(X##pF,ROT##_7); X##pF ^= X##pE;   \
+#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
+    X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
+    X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
+    X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
+    X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
 
 #if SKEIN_UNROLL_1024 == 0                      
-#define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
-    Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,rn,Xptr);
-
-#define I1024(R)                                                      \
-    X00   += ks[((R)+ 1) % 17]; /* inject the key schedule value */   \
-    X01   += ks[((R)+ 2) % 17];                                       \
-    X02   += ks[((R)+ 3) % 17];                                       \
-    X03   += ks[((R)+ 4) % 17];                                       \
-    X04   += ks[((R)+ 5) % 17];                                       \
-    X05   += ks[((R)+ 6) % 17];                                       \
-    X06   += ks[((R)+ 7) % 17];                                       \
-    X07   += ks[((R)+ 8) % 17];                                       \
-    X08   += ks[((R)+ 9) % 17];                                       \
-    X09   += ks[((R)+10) % 17];                                       \
-    X10   += ks[((R)+11) % 17];                                       \
-    X11   += ks[((R)+12) % 17];                                       \
-    X12   += ks[((R)+13) % 17];                                       \
-    X13   += ks[((R)+14) % 17] + ts[((R)+1) % 3];                     \
-    X14   += ks[((R)+15) % 17] + ts[((R)+2) % 3];                     \
-    X15   += ks[((R)+16) % 17] +     (R)+1;                           \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr); 
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+
+#define I1024(R)                                                        \
+    X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
+    X01   += ks[((R) +  2) % 17];                                       \
+    X02   += ks[((R) +  3) % 17];                                       \
+    X03   += ks[((R) +  4) % 17];                                       \
+    X04   += ks[((R) +  5) % 17];                                       \
+    X05   += ks[((R) +  6) % 17];                                       \
+    X06   += ks[((R) +  7) % 17];                                       \
+    X07   += ks[((R) +  8) % 17];                                       \
+    X08   += ks[((R) +  9) % 17];                                       \
+    X09   += ks[((R) + 10) % 17];                                       \
+    X10   += ks[((R) + 11) % 17];                                       \
+    X11   += ks[((R) + 12) % 17];                                       \
+    X12   += ks[((R) + 13) % 17];                                       \
+    X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
+    X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
+    X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
 #else                                       /* looping version */
-#define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
-    Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,4*(r-1)+rn,Xptr);
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
 
 #define I1024(R)                                                      \
-    X00   += ks[r+(R)+ 0];    /* inject the key schedule value */     \
-    X01   += ks[r+(R)+ 1];                                            \
-    X02   += ks[r+(R)+ 2];                                            \
-    X03   += ks[r+(R)+ 3];                                            \
-    X04   += ks[r+(R)+ 4];                                            \
-    X05   += ks[r+(R)+ 5];                                            \
-    X06   += ks[r+(R)+ 6];                                            \
-    X07   += ks[r+(R)+ 7];                                            \
-    X08   += ks[r+(R)+ 8];                                            \
-    X09   += ks[r+(R)+ 9];                                            \
-    X10   += ks[r+(R)+10];                                            \
-    X11   += ks[r+(R)+11];                                            \
-    X12   += ks[r+(R)+12];                                            \
-    X13   += ks[r+(R)+13] + ts[r+(R)+0];                              \
-    X14   += ks[r+(R)+14] + ts[r+(R)+1];                              \
-    X15   += ks[r+(R)+15] +    r+(R)   ;                              \
-    ks[r  +       (R)+16] = ks[r+(R)-1];  /* rotate key schedule */   \
-    ts[r  +       (R)+ 2] = ts[r+(R)-1];                              \
-    Skein_Show_R_Ptr(BLK_BITS,&ctx->h,SKEIN_RND_KEY_INJECT,Xptr);
-
-    for (r=1;r <= 2*RCNT;r+=2*SKEIN_UNROLL_1024)    /* loop thru it */
+    X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
+    X01   += ks[r + (R) +  1];                                            \
+    X02   += ks[r + (R) +  2];                                            \
+    X03   += ks[r + (R) +  3];                                            \
+    X04   += ks[r + (R) +  4];                                            \
+    X05   += ks[r + (R) +  5];                                            \
+    X06   += ks[r + (R) +  6];                                            \
+    X07   += ks[r + (R) +  7];                                            \
+    X08   += ks[r + (R) +  8];                                            \
+    X09   += ks[r + (R) +  9];                                            \
+    X10   += ks[r + (R) + 10];                                            \
+    X11   += ks[r + (R) + 11];                                            \
+    X12   += ks[r + (R) + 12];                                            \
+    X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
+    X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
+    X15   += ks[r + (R) + 15] +         r + (R);                          \
+    ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
+    ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
 #endif  
         {
 #define R1024_8_rounds(R)    /* do 8 full rounds */                               \
-        R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_0,8*(R) + 1); \
-        R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_1,8*(R) + 2); \
-        R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_2,8*(R) + 3); \
-        R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_3,8*(R) + 4); \
+        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
+        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
+        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
+        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
         I1024(2*(R));                                                             \
-        R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_4,8*(R) + 5); \
-        R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_5,8*(R) + 6); \
-        R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_6,8*(R) + 7); \
-        R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_7,8*(R) + 8); \
+        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
+        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
+        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
+        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
         I1024(2*(R)+1);
 
-        R1024_8_rounds( 0);
+        R1024_8_rounds(0);
 
 #define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
 
-  #if   R1024_Unroll_R( 1)
-        R1024_8_rounds( 1);
+  #if   R1024_Unroll_R(1)
+        R1024_8_rounds(1);
   #endif
-  #if   R1024_Unroll_R( 2)
-        R1024_8_rounds( 2);
+  #if   R1024_Unroll_R(2)
+        R1024_8_rounds(2);
   #endif
-  #if   R1024_Unroll_R( 3)
-        R1024_8_rounds( 3);
+  #if   R1024_Unroll_R(3)
+        R1024_8_rounds(3);
   #endif
-  #if   R1024_Unroll_R( 4)
-        R1024_8_rounds( 4);
+  #if   R1024_Unroll_R(4)
+        R1024_8_rounds(4);
   #endif
-  #if   R1024_Unroll_R( 5)
-        R1024_8_rounds( 5);
+  #if   R1024_Unroll_R(5)
+        R1024_8_rounds(5);
   #endif
-  #if   R1024_Unroll_R( 6)
-        R1024_8_rounds( 6);
+  #if   R1024_Unroll_R(6)
+        R1024_8_rounds(6);
   #endif
-  #if   R1024_Unroll_R( 7)
-        R1024_8_rounds( 7);
+  #if   R1024_Unroll_R(7)
+        R1024_8_rounds(7);
   #endif
-  #if   R1024_Unroll_R( 8)
-        R1024_8_rounds( 8);
+  #if   R1024_Unroll_R(8)
+        R1024_8_rounds(8);
   #endif
-  #if   R1024_Unroll_R( 9)
-        R1024_8_rounds( 9);
+  #if   R1024_Unroll_R(9)
+        R1024_8_rounds(9);
   #endif
   #if   R1024_Unroll_R(10)
         R1024_8_rounds(10);
@@ -648,16 +645,16 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
         }
         /* do the final "feedforward" xor, update context chaining vars */
 
-        ctx->X[ 0] = X00 ^ w[ 0];
-        ctx->X[ 1] = X01 ^ w[ 1];
-        ctx->X[ 2] = X02 ^ w[ 2];
-        ctx->X[ 3] = X03 ^ w[ 3];
-        ctx->X[ 4] = X04 ^ w[ 4];
-        ctx->X[ 5] = X05 ^ w[ 5];
-        ctx->X[ 6] = X06 ^ w[ 6];
-        ctx->X[ 7] = X07 ^ w[ 7];
-        ctx->X[ 8] = X08 ^ w[ 8];
-        ctx->X[ 9] = X09 ^ w[ 9];
+        ctx->X[0] = X00 ^ w[0];
+        ctx->X[1] = X01 ^ w[1];
+        ctx->X[2] = X02 ^ w[2];
+        ctx->X[3] = X03 ^ w[3];
+        ctx->X[4] = X04 ^ w[4];
+        ctx->X[5] = X05 ^ w[5];
+        ctx->X[6] = X06 ^ w[6];
+        ctx->X[7] = X07 ^ w[7];
+        ctx->X[8] = X08 ^ w[8];
+        ctx->X[9] = X09 ^ w[9];
         ctx->X[10] = X10 ^ w[10];
         ctx->X[11] = X11 ^ w[11];
         ctx->X[12] = X12 ^ w[12];
@@ -665,7 +662,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx,const u8 *blkPtr,size_t b
         ctx->X[14] = X14 ^ w[14];
         ctx->X[15] = X15 ^ w[15];
 
-        Skein_Show_Round(BLK_BITS,&ctx->h,SKEIN_RND_FEED_FWD,ctx->X);
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
         
         ts[1] &= ~SKEIN_T1_FLAG_FIRST;
         blkPtr += SKEIN1024_BLOCK_BYTES;
-- 
1.9.1
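
Note for readers of the reformatted Round512/Round1024 macros in this patch: each
`X##pN += ...; X##pM = RotL_64(...); X##pM ^= ...;` triple is one add-rotate-xor
step on a pair of 64-bit words. The following is a minimal sketch of that step as
plain functions, not the code from the patch. The names rotl64(), mix() and
unmix() are hypothetical, and unmix() is added only to show that the step can be
reversed. The rotation amount is a parameter, so no specific Skein rotation
constant is assumed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rotate a 64-bit word left by n bits, in the usual shift-or form.
 * n must be in 1..63: n == 0 would shift right by 64, which is
 * undefined behaviour in C.
 */
static uint64_t rotl64(uint64_t x, unsigned int n)
{
	assert(n > 0 && n < 64);
	return (x << n) | (x >> (64 - n));
}

/* One add-rotate-xor step on the word pair (*a, *b). */
static void mix(uint64_t *a, uint64_t *b, unsigned int rot)
{
	*a += *b;		/* X##p0 += X##p1 */
	*b = rotl64(*b, rot);	/* X##p1 = RotL_64(X##p1, ROT##_0) */
	*b ^= *a;		/* X##p1 ^= X##p0 */
}

/* Inverse of mix(): undo the xor, rotate right by rot, undo the add. */
static void unmix(uint64_t *a, uint64_t *b, unsigned int rot)
{
	*b ^= *a;
	*b = rotl64(*b, 64 - rot);	/* left by 64 - rot == right by rot */
	*a -= *b;
}
```

Calling mix() and then unmix() with the same rotation amount returns both words
to their starting values. That is a quick way to sanity-check a rewritten round
macro against a reference.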

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH V2 11/21] staging: crypto: skein: dos2unix, remove executable perms
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (9 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 10/21] staging: crypto: skein: cleanup whitespace around operators/punc Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 12/21] staging: crypto: skein: fix leading whitespace Jason Cooper
                     ` (9 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

$ find drivers/staging/skein -type f | xargs todos -d
$ chmod -x drivers/staging/skein/skeinApi.c
$ chmod -x drivers/staging/skein/include/skeinApi.h

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h    |  630 ++++++-------
 drivers/staging/skein/include/skeinApi.h |    0
 drivers/staging/skein/include/skein_iv.h |  398 ++++-----
 drivers/staging/skein/skein.c            | 1442 +++++++++++++++---------------
 drivers/staging/skein/skeinApi.c         |    0
 drivers/staging/skein/skeinBlockNo3F.c   |  344 +++----
 drivers/staging/skein/skein_block.c      | 1372 ++++++++++++++--------------
 7 files changed, 2093 insertions(+), 2093 deletions(-)
 mode change 100755 => 100644 drivers/staging/skein/include/skeinApi.h
 mode change 100755 => 100644 drivers/staging/skein/skeinApi.c
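
Note: the line-ending conversion in the commit message is a byte-level change
that drops the CR from each CR LF pair. Below is a minimal sketch of that
transformation, assuming an in-memory buffer. It is not the implementation of
the todos tool, and the function name crlf_to_lf() is hypothetical.

```c
#include <stddef.h>

/*
 * Strip the CR from every CR LF pair in buf[0..len), in place.
 * Returns the new length. A CR that is not followed by LF is kept.
 */
static size_t crlf_to_lf(char *buf, size_t len)
{
	size_t in, out = 0;

	for (in = 0; in < len; in++) {
		if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
			continue;	/* drop the CR; the LF is copied next */
		buf[out++] = buf[in];
	}
	return out;
}
```

Because only the line terminator changes, every line of a converted file shows
up as one deletion plus one insertion. This is consistent with the equal
insertion and deletion counts in the diffstat above.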

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index fef29ad64c93..18bb15824e41 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -1,315 +1,315 @@
-#ifndef _SKEIN_H_
-#define _SKEIN_H_     1
-/**************************************************************************
-**
-** Interface declarations and internal definitions for Skein hashing.
-**
-** Source code author: Doug Whiting, 2008.
-**
-** This algorithm and source code is released to the public domain.
-**
-***************************************************************************
-** 
-** The following compile-time switches may be defined to control some
-** tradeoffs between speed, code size, error checking, and security.
-**
-** The "default" note explains what happens when the switch is not defined.
-**
-**  SKEIN_DEBUG            -- make callouts from inside Skein code
-**                            to examine/display intermediate values.
-**                            [default: no callouts (no overhead)]
-**
-**  SKEIN_ERR_CHECK        -- how error checking is handled inside Skein
-**                            code. If not defined, most error checking 
-**                            is disabled (for performance). Otherwise, 
-**                            the switch value is interpreted as:
-**                                0: use assert()      to flag errors
-**                                1: return SKEIN_FAIL to flag errors
-**
-***************************************************************************/
-
-#ifndef RotL_64
-#define RotL_64(x, N)    (((x) << (N)) | ((x) >> (64-(N))))
-#endif
-
-/* below two prototype assume we are handed aligned data */
-#define Skein_Put64_LSB_First(dst08, src64, bCnt) memcpy(dst08, src64, bCnt)
-#define Skein_Get64_LSB_First(dst64, src08, wCnt) memcpy(dst64, src08, 8*(wCnt))
-#define Skein_Swap64(w64)  (w64)
-
-enum
-    {
-    SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
-    SKEIN_FAIL            =      1,
-    SKEIN_BAD_HASHLEN     =      2
-    };
-
-#define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
-
-#define  SKEIN_256_STATE_WORDS  (4)
-#define  SKEIN_512_STATE_WORDS  (8)
-#define  SKEIN1024_STATE_WORDS (16)
-#define  SKEIN_MAX_STATE_WORDS (16)
-
-#define  SKEIN_256_STATE_BYTES  (8*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_STATE_BYTES  (8*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_STATE_BYTES  (8*SKEIN1024_STATE_WORDS)
-
-#define  SKEIN_256_STATE_BITS  (64*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_STATE_BITS  (64*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_STATE_BITS  (64*SKEIN1024_STATE_WORDS)
-
-#define  SKEIN_256_BLOCK_BYTES  (8*SKEIN_256_STATE_WORDS)
-#define  SKEIN_512_BLOCK_BYTES  (8*SKEIN_512_STATE_WORDS)
-#define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
-
-struct skein_ctx_hdr
-    {
-    size_t  hashBitLen;                      /* size of hash result, in bits */
-    size_t  bCnt;                            /* current byte count in buffer b[] */
-    u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
-    };
-
-struct skein_256_ctx                               /*  256-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
-
-struct skein_512_ctx                             /*  512-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
-
-struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
-
-/*   Skein APIs for (incremental) "straight hashing" */
-int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
-int  Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen);
-int  Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen);
-
-int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-
-int  Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
-
-/*
-**   Skein APIs for "extended" initialization: MAC keys, tree hashing.
-**   After an InitExt() call, just use Update/Final calls as with Init().
-**
-**   Notes: Same parameters as _Init() calls, plus treeInfo/key/keyBytes.
-**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL, 
-**              the results of InitExt() are identical to calling Init().
-**          The function Init() may be called once to "precompute" the IV for
-**              a given hashBitLen value, then by saving a copy of the context
-**              the IV computation may be avoided in later calls.
-**          Similarly, the function InitExt() may be called once per MAC key 
-**              to precompute the MAC IV, then a copy of the context saved and
-**              reused for each new MAC computation.
-**/
-int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-
-/*
-**   Skein APIs for MAC and tree hash:
-**      Final_Pad:  pad, do final block, but no OUTPUT type
-**      Output:     do just the output stage
-*/
-int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal);
-
-#ifndef SKEIN_TREE_HASH
-#define SKEIN_TREE_HASH (1)
-#endif
-#if  SKEIN_TREE_HASH
-int  Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal);
-int  Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal);
-int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
-#endif
-
-/*****************************************************************
-** "Internal" Skein definitions
-**    -- not needed for sequential hashing API, but will be 
-**           helpful for other uses of Skein (e.g., tree hash mode).
-**    -- included here so that they can be shared between
-**           reference and optimized code.
-******************************************************************/
-
-/* tweak word T[1]: bit field starting positions */
-#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
-                                
-#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
-#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
-#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
-#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
-#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
-                                
-/* tweak word T[1]: flag bit definition(s) */
-#define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
-#define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
-#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
-                                
-/* tweak word T[1]: tree level bit field mask */
-#define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
-#define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
-
-/* tweak word T[1]: block type field */
-#define SKEIN_BLK_TYPE_KEY       (0)                    /* key, for MAC and KDF */
-#define SKEIN_BLK_TYPE_CFG       (4)                    /* configuration block */
-#define SKEIN_BLK_TYPE_PERS      (8)                    /* personalization string */
-#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
-#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
-#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
-#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
-#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
-#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
-
-#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
-#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
-#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
-#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
-#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
-#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
-#define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
-#define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
-#define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
-#define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */
-
-#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
-#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)
-
-#define SKEIN_VERSION           (1)
-
-#ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
-#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
-#endif
-
-#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
-#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION, SKEIN_ID_STRING_LE)
-#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)
-
-#define SKEIN_CFG_STR_LEN       (4*8)
-
-/* bit field definitions in config block treeInfo word */
-#define SKEIN_CFG_TREE_LEAF_SIZE_POS  (0)
-#define SKEIN_CFG_TREE_NODE_SIZE_POS  (8)
-#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
-
-#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
-#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
-#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
-
-#define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
-    ((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
-     (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
-     (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
-
-#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
-
-/*
-**   Skein macros for getting/setting tweak words, etc.
-**   These are useful for partial input bytes, hash tree init/update, etc.
-**/
-#define Skein_Get_Tweak(ctxPtr, TWK_NUM)          ((ctxPtr)->h.T[TWK_NUM])
-#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal); }
-
-#define Skein_Get_T0(ctxPtr)     Skein_Get_Tweak(ctxPtr, 0)
-#define Skein_Get_T1(ctxPtr)     Skein_Get_Tweak(ctxPtr, 1)
-#define Skein_Set_T0(ctxPtr, T0) Skein_Set_Tweak(ctxPtr, 0, T0)
-#define Skein_Set_T1(ctxPtr, T1) Skein_Set_Tweak(ctxPtr, 1, T1)
-
-/* set both tweak words at once */
-#define Skein_Set_T0_T1(ctxPtr, T0, T1)           \
-    {                                           \
-    Skein_Set_T0(ctxPtr, (T0));                  \
-    Skein_Set_T1(ctxPtr, (T1));                  \
-    }
-
-#define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
-    Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
-
-/* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
-#define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
-    { Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
-
-#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
-#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
-
-#define Skein_Set_Tree_Level(hdr, height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); }
-
-/*****************************************************************
-** "Internal" Skein definitions for debugging and error checking
-******************************************************************/
-#ifdef SKEIN_DEBUG             /* examine/display intermediate values? */
-#include "skein_debug.h"
-#else                           /* default is no callouts */
-#define Skein_Show_Block(bits, ctx, X, blkPtr, wPtr, ksEvenPtr, ksOddPtr)
-#define Skein_Show_Round(bits, ctx, r, X)
-#define Skein_Show_R_Ptr(bits, ctx, r, X_ptr)
-#define Skein_Show_Final(bits, ctx, cnt, outPtr)
-#define Skein_Show_Key(bits, ctx, key, keyBytes)
-#endif
-
-#define Skein_Assert(x, retCode)/* default: ignore all Asserts, for performance */
-#define Skein_assert(x)
-
-/*****************************************************************
-** Skein block function constants (shared across Ref and Opt code)
-******************************************************************/
-enum    
-    {   
-        /* Skein_256 round rotation constants */
-    R_256_0_0 = 14, R_256_0_1 = 16,
-    R_256_1_0 = 52, R_256_1_1 = 57,
-    R_256_2_0 = 23, R_256_2_1 = 40,
-    R_256_3_0 =  5, R_256_3_1 = 37,
-    R_256_4_0 = 25, R_256_4_1 = 33,
-    R_256_5_0 = 46, R_256_5_1 = 12,
-    R_256_6_0 = 58, R_256_6_1 = 22,
-    R_256_7_0 = 32, R_256_7_1 = 32,
-
-        /* Skein_512 round rotation constants */
-    R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
-    R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
-    R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
-    R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
-    R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
-    R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
-    R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
-    R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
-
-        /* Skein1024 round rotation constants */
-    R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
-    R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
-    R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
-    R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
-    R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
-    R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
-    R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
-    R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
-    };
-
-#ifndef SKEIN_ROUNDS
-#define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
-#define SKEIN_512_ROUNDS_TOTAL (72)
-#define SKEIN1024_ROUNDS_TOTAL (80)
-#else                                        /* allow command-line define in range 8*(5..14)   */
-#define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
-#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/10)  + 5) % 10) + 5))
-#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS)     + 5) % 10) + 5))
-#endif
-
-#endif  /* ifndef _SKEIN_H_ */
+#ifndef _SKEIN_H_
+#define _SKEIN_H_     1
+/**************************************************************************
+**
+** Interface declarations and internal definitions for Skein hashing.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+***************************************************************************
+** 
+** The following compile-time switches may be defined to control some
+** tradeoffs between speed, code size, error checking, and security.
+**
+** The "default" note explains what happens when the switch is not defined.
+**
+**  SKEIN_DEBUG            -- make callouts from inside Skein code
+**                            to examine/display intermediate values.
+**                            [default: no callouts (no overhead)]
+**
+**  SKEIN_ERR_CHECK        -- how error checking is handled inside Skein
+**                            code. If not defined, most error checking 
+**                            is disabled (for performance). Otherwise, 
+**                            the switch value is interpreted as:
+**                                0: use assert()      to flag errors
+**                                1: return SKEIN_FAIL to flag errors
+**
+***************************************************************************/
+
+#ifndef RotL_64
+#define RotL_64(x, N)    (((x) << (N)) | ((x) >> (64-(N))))
+#endif
+
+/* below two prototype assume we are handed aligned data */
+#define Skein_Put64_LSB_First(dst08, src64, bCnt) memcpy(dst08, src64, bCnt)
+#define Skein_Get64_LSB_First(dst64, src08, wCnt) memcpy(dst64, src08, 8*(wCnt))
+#define Skein_Swap64(w64)  (w64)
+
+enum
+    {
+    SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
+    SKEIN_FAIL            =      1,
+    SKEIN_BAD_HASHLEN     =      2
+    };
+
+#define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
+
+#define  SKEIN_256_STATE_WORDS  (4)
+#define  SKEIN_512_STATE_WORDS  (8)
+#define  SKEIN1024_STATE_WORDS (16)
+#define  SKEIN_MAX_STATE_WORDS (16)
+
+#define  SKEIN_256_STATE_BYTES  (8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BYTES  (8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BYTES  (8*SKEIN1024_STATE_WORDS)
+
+#define  SKEIN_256_STATE_BITS  (64*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_STATE_BITS  (64*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_STATE_BITS  (64*SKEIN1024_STATE_WORDS)
+
+#define  SKEIN_256_BLOCK_BYTES  (8*SKEIN_256_STATE_WORDS)
+#define  SKEIN_512_BLOCK_BYTES  (8*SKEIN_512_STATE_WORDS)
+#define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
+
+struct skein_ctx_hdr
+    {
+    size_t  hashBitLen;                      /* size of hash result, in bits */
+    size_t  bCnt;                            /* current byte count in buffer b[] */
+    u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+    };
+
+struct skein_256_ctx                               /*  256-bit Skein hash context structure */
+    {
+    struct skein_ctx_hdr h;                      /* common header context variables */
+    u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    };
+
+struct skein_512_ctx                             /*  512-bit Skein hash context structure */
+    {
+    struct skein_ctx_hdr h;                      /* common header context variables */
+    u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    };
+
+struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
+    {
+    struct skein_ctx_hdr h;                      /* common header context variables */
+    u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
+    u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+    };
+
+/*   Skein APIs for (incremental) "straight hashing" */
+int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
+int  Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen);
+int  Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen);
+
+int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+
+int  Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
+
+/*
+**   Skein APIs for "extended" initialization: MAC keys, tree hashing.
+**   After an InitExt() call, just use Update/Final calls as with Init().
+**
+**   Notes: Same parameters as _Init() calls, plus treeInfo/key/keyBytes.
+**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL, 
+**              the results of InitExt() are identical to calling Init().
+**          The function Init() may be called once to "precompute" the IV for
+**              a given hashBitLen value, then by saving a copy of the context
+**              the IV computation may be avoided in later calls.
+**          Similarly, the function InitExt() may be called once per MAC key 
+**              to precompute the MAC IV, then a copy of the context saved and
+**              reused for each new MAC computation.
+**/
+int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+
+/*
+**   Skein APIs for MAC and tree hash:
+**      Final_Pad:  pad, do final block, but no OUTPUT type
+**      Output:     do just the output stage
+*/
+int  Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal);
+
+#ifndef SKEIN_TREE_HASH
+#define SKEIN_TREE_HASH (1)
+#endif
+#if  SKEIN_TREE_HASH
+int  Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal);
+int  Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal);
+int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
+#endif
+
+/*****************************************************************
+** "Internal" Skein definitions
+**    -- not needed for sequential hashing API, but will be 
+**           helpful for other uses of Skein (e.g., tree hash mode).
+**    -- included here so that they can be shared between
+**           reference and optimized code.
+******************************************************************/
+
+/* tweak word T[1]: bit field starting positions */
+#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
+                                
+#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
+#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
+#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
+#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
+#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
+                                
+/* tweak word T[1]: flag bit definition(s) */
+#define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
+#define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
+#define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
+                                
+/* tweak word T[1]: tree level bit field mask */
+#define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
+#define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
+
+/* tweak word T[1]: block type field */
+#define SKEIN_BLK_TYPE_KEY       (0)                    /* key, for MAC and KDF */
+#define SKEIN_BLK_TYPE_CFG       (4)                    /* configuration block */
+#define SKEIN_BLK_TYPE_PERS      (8)                    /* personalization string */
+#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
+#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
+#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
+#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
+#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
+#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
+
+#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
+#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
+#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
+#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
+#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
+#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
+#define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
+#define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
+#define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
+#define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */
+
+#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
+#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)
+
+#define SKEIN_VERSION           (1)
+
+#ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
+#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
+#endif
+
+#define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
+#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION, SKEIN_ID_STRING_LE)
+#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA, 0xA9FC1A22)
+
+#define SKEIN_CFG_STR_LEN       (4*8)
+
+/* bit field definitions in config block treeInfo word */
+#define SKEIN_CFG_TREE_LEAF_SIZE_POS  (0)
+#define SKEIN_CFG_TREE_NODE_SIZE_POS  (8)
+#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
+
+#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
+#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
+#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
+
+#define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
+    ((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+     (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+     (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
+
+#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
+
+/*
+**   Skein macros for getting/setting tweak words, etc.
+**   These are useful for partial input bytes, hash tree init/update, etc.
+**/
+#define Skein_Get_Tweak(ctxPtr, TWK_NUM)          ((ctxPtr)->h.T[TWK_NUM])
+#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal); }
+
+#define Skein_Get_T0(ctxPtr)     Skein_Get_Tweak(ctxPtr, 0)
+#define Skein_Get_T1(ctxPtr)     Skein_Get_Tweak(ctxPtr, 1)
+#define Skein_Set_T0(ctxPtr, T0) Skein_Set_Tweak(ctxPtr, 0, T0)
+#define Skein_Set_T1(ctxPtr, T1) Skein_Set_Tweak(ctxPtr, 1, T1)
+
+/* set both tweak words at once */
+#define Skein_Set_T0_T1(ctxPtr, T0, T1)           \
+    {                                           \
+    Skein_Set_T0(ctxPtr, (T0));                  \
+    Skein_Set_T1(ctxPtr, (T1));                  \
+    }
+
+#define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
+    Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
+
+/* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
+#define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
+    { Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
+
+#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
+#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
+
+#define Skein_Set_Tree_Level(hdr, height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); }
+
+/*****************************************************************
+** "Internal" Skein definitions for debugging and error checking
+******************************************************************/
+#ifdef SKEIN_DEBUG             /* examine/display intermediate values? */
+#include "skein_debug.h"
+#else                           /* default is no callouts */
+#define Skein_Show_Block(bits, ctx, X, blkPtr, wPtr, ksEvenPtr, ksOddPtr)
+#define Skein_Show_Round(bits, ctx, r, X)
+#define Skein_Show_R_Ptr(bits, ctx, r, X_ptr)
+#define Skein_Show_Final(bits, ctx, cnt, outPtr)
+#define Skein_Show_Key(bits, ctx, key, keyBytes)
+#endif
+
+#define Skein_Assert(x, retCode)/* default: ignore all Asserts, for performance */
+#define Skein_assert(x)
+
+/*****************************************************************
+** Skein block function constants (shared across Ref and Opt code)
+******************************************************************/
+enum    
+    {   
+        /* Skein_256 round rotation constants */
+    R_256_0_0 = 14, R_256_0_1 = 16,
+    R_256_1_0 = 52, R_256_1_1 = 57,
+    R_256_2_0 = 23, R_256_2_1 = 40,
+    R_256_3_0 =  5, R_256_3_1 = 37,
+    R_256_4_0 = 25, R_256_4_1 = 33,
+    R_256_5_0 = 46, R_256_5_1 = 12,
+    R_256_6_0 = 58, R_256_6_1 = 22,
+    R_256_7_0 = 32, R_256_7_1 = 32,
+
+        /* Skein_512 round rotation constants */
+    R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
+    R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
+    R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
+    R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
+    R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
+    R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
+    R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
+    R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
+
+        /* Skein1024 round rotation constants */
+    R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
+    R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
+    R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
+    R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
+    R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
+    R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
+    R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
+    R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
+    };
+
+#ifndef SKEIN_ROUNDS
+#define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
+#define SKEIN_512_ROUNDS_TOTAL (72)
+#define SKEIN1024_ROUNDS_TOTAL (80)
+#else                                        /* allow command-line define in range 8*(5..14)   */
+#define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
+#define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/10)  + 5) % 10) + 5))
+#define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS)     + 5) % 10) + 5))
+#endif
+
+#endif  /* ifndef _SKEIN_H_ */
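
Note on the skein.h definitions above: two of them are easy to misread, so here is a minimal userspace sketch of how they evaluate. It is not part of the patch. The macro bodies are copied from the hunk; the u64 typedef via <stdint.h> and the three sketch_* helper names are assumptions made for illustration only. The first two helpers show how a T[1] tweak word is composed from the FIRST flag and the block type field. The third shows how one decimal digit of SKEIN_ROUNDS maps to a round count, assuming SKEIN_ROUNDS is a three-digit decimal value with one digit per block size.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's u64 (assumption for this sketch). */
typedef uint64_t u64;

/* Copied from the skein.h hunk above. */
#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)
#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)
#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)
#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)
#define SKEIN_T1_FLAG_FIRST     (((u64)1) << SKEIN_T1_POS_FIRST)
#define SKEIN_T1_FLAG_FINAL     (((u64)1) << SKEIN_T1_POS_FINAL)
#define SKEIN_BLK_TYPE_MSG      (48)
#define SKEIN_BLK_TYPE_MASK     (63)

/* Hypothetical helper: T[1] value for the first block of a message. */
static u64 sketch_t1_first_msg(void)
{
	return SKEIN_T1_FLAG_FIRST |
	       (((u64)SKEIN_BLK_TYPE_MSG) << SKEIN_T1_POS_BLK_TYPE);
}

/* Hypothetical helper: extract the 6-bit block type field from T[1]. */
static unsigned int sketch_t1_blk_type(u64 t1)
{
	return (unsigned int)((t1 >> SKEIN_T1_POS_BLK_TYPE) &
			      SKEIN_BLK_TYPE_MASK);
}

/*
 * Hypothetical helper: the per-digit arithmetic used by the
 * SKEIN_*_ROUNDS_TOTAL macros when SKEIN_ROUNDS is defined.
 * A digit d selects 8 * (((d + 5) % 10) + 5) rounds, i.e. a
 * multiple of 8 in the range 40..112.
 */
static unsigned int sketch_rounds_from_digit(unsigned int digit)
{
	return 8 * (((digit + 5) % 10) + 5);
}
```

With these definitions, the FIRST flag lands in bit 62 of T[1] and the type field in bits 56..61. A digit of 9 yields 72 rounds and a digit of 0 yields 80, so under the three-digit assumption SKEIN_ROUNDS=990 would reproduce the 72/72/80 defaults used when SKEIN_ROUNDS is undefined.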
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
old mode 100755
new mode 100644
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index aff9394551a0..813bad528e3c 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -1,199 +1,199 @@
-#ifndef _SKEIN_IV_H_
-#define _SKEIN_IV_H_
-
-#include <skein.h>    /* get Skein macros and types */
-
-/*
-***************** Pre-computed Skein IVs *******************
-**
-** NOTE: these values are not "magic" constants, but
-** are generated using the Threefish block function.
-** They are pre-computed here only for speed; i.e., to
-** avoid the need for a Threefish call during Init().
-**
-** The IV for any fixed hash length may be pre-computed.
-** Only the most common values are included here.
-**
-************************************************************
-**/
-
-#define MK_64 SKEIN_MK_64
-
-/* blkSize =  256 bits. hashSize =  128 bits */
-const u64 SKEIN_256_IV_128[] =
-    {
-    MK_64(0xE1111906, 0x964D7260),
-    MK_64(0x883DAAA7, 0x7C8D811C),
-    MK_64(0x10080DF4, 0x91960F7A),
-    MK_64(0xCCF7DDE5, 0xB45BC1C2)
-    };
-
-/* blkSize =  256 bits. hashSize =  160 bits */
-const u64 SKEIN_256_IV_160[] =
-    {
-    MK_64(0x14202314, 0x72825E98),
-    MK_64(0x2AC4E9A2, 0x5A77E590),
-    MK_64(0xD47A5856, 0x8838D63E),
-    MK_64(0x2DD2E496, 0x8586AB7D)
-    };
-
-/* blkSize =  256 bits. hashSize =  224 bits */
-const u64 SKEIN_256_IV_224[] =
-    {
-    MK_64(0xC6098A8C, 0x9AE5EA0B),
-    MK_64(0x876D5686, 0x08C5191C),
-    MK_64(0x99CB88D7, 0xD7F53884),
-    MK_64(0x384BDDB1, 0xAEDDB5DE)
-    };
-
-/* blkSize =  256 bits. hashSize =  256 bits */
-const u64 SKEIN_256_IV_256[] =
-    {
-    MK_64(0xFC9DA860, 0xD048B449),
-    MK_64(0x2FCA6647, 0x9FA7D833),
-    MK_64(0xB33BC389, 0x6656840F),
-    MK_64(0x6A54E920, 0xFDE8DA69)
-    };
-
-/* blkSize =  512 bits. hashSize =  128 bits */
-const u64 SKEIN_512_IV_128[] =
-    {
-    MK_64(0xA8BC7BF3, 0x6FBF9F52),
-    MK_64(0x1E9872CE, 0xBD1AF0AA),
-    MK_64(0x309B1790, 0xB32190D3),
-    MK_64(0xBCFBB854, 0x3F94805C),
-    MK_64(0x0DA61BCD, 0x6E31B11B),
-    MK_64(0x1A18EBEA, 0xD46A32E3),
-    MK_64(0xA2CC5B18, 0xCE84AA82),
-    MK_64(0x6982AB28, 0x9D46982D)
-    };
-
-/* blkSize =  512 bits. hashSize =  160 bits */
-const u64 SKEIN_512_IV_160[] =
-    {
-    MK_64(0x28B81A2A, 0xE013BD91),
-    MK_64(0xC2F11668, 0xB5BDF78F),
-    MK_64(0x1760D8F3, 0xF6A56F12),
-    MK_64(0x4FB74758, 0x8239904F),
-    MK_64(0x21EDE07F, 0x7EAF5056),
-    MK_64(0xD908922E, 0x63ED70B8),
-    MK_64(0xB8EC76FF, 0xECCB52FA),
-    MK_64(0x01A47BB8, 0xA3F27A6E)
-    };
-
-/* blkSize =  512 bits. hashSize =  224 bits */
-const u64 SKEIN_512_IV_224[] =
-    {
-    MK_64(0xCCD06162, 0x48677224),
-    MK_64(0xCBA65CF3, 0xA92339EF),
-    MK_64(0x8CCD69D6, 0x52FF4B64),
-    MK_64(0x398AED7B, 0x3AB890B4),
-    MK_64(0x0F59D1B1, 0x457D2BD0),
-    MK_64(0x6776FE65, 0x75D4EB3D),
-    MK_64(0x99FBC70E, 0x997413E9),
-    MK_64(0x9E2CFCCF, 0xE1C41EF7)
-    };
-
-/* blkSize =  512 bits. hashSize =  256 bits */
-const u64 SKEIN_512_IV_256[] =
-    {
-    MK_64(0xCCD044A1, 0x2FDB3E13),
-    MK_64(0xE8359030, 0x1A79A9EB),
-    MK_64(0x55AEA061, 0x4F816E6F),
-    MK_64(0x2A2767A4, 0xAE9B94DB),
-    MK_64(0xEC06025E, 0x74DD7683),
-    MK_64(0xE7A436CD, 0xC4746251),
-    MK_64(0xC36FBAF9, 0x393AD185),
-    MK_64(0x3EEDBA18, 0x33EDFC13)
-    };
-
-/* blkSize =  512 bits. hashSize =  384 bits */
-const u64 SKEIN_512_IV_384[] =
-    {
-    MK_64(0xA3F6C6BF, 0x3A75EF5F),
-    MK_64(0xB0FEF9CC, 0xFD84FAA4),
-    MK_64(0x9D77DD66, 0x3D770CFE),
-    MK_64(0xD798CBF3, 0xB468FDDA),
-    MK_64(0x1BC4A666, 0x8A0E4465),
-    MK_64(0x7ED7D434, 0xE5807407),
-    MK_64(0x548FC1AC, 0xD4EC44D6),
-    MK_64(0x266E1754, 0x6AA18FF8)
-    };
-
-/* blkSize =  512 bits. hashSize =  512 bits */
-const u64 SKEIN_512_IV_512[] =
-    {
-    MK_64(0x4903ADFF, 0x749C51CE),
-    MK_64(0x0D95DE39, 0x9746DF03),
-    MK_64(0x8FD19341, 0x27C79BCE),
-    MK_64(0x9A255629, 0xFF352CB1),
-    MK_64(0x5DB62599, 0xDF6CA7B0),
-    MK_64(0xEABE394C, 0xA9D5C3F4),
-    MK_64(0x991112C7, 0x1A75B523),
-    MK_64(0xAE18A40B, 0x660FCC33)
-    };
-
-/* blkSize = 1024 bits. hashSize =  384 bits */
-const u64 SKEIN1024_IV_384[] =
-    {
-    MK_64(0x5102B6B8, 0xC1894A35),
-    MK_64(0xFEEBC9E3, 0xFE8AF11A),
-    MK_64(0x0C807F06, 0xE32BED71),
-    MK_64(0x60C13A52, 0xB41A91F6),
-    MK_64(0x9716D35D, 0xD4917C38),
-    MK_64(0xE780DF12, 0x6FD31D3A),
-    MK_64(0x797846B6, 0xC898303A),
-    MK_64(0xB172C2A8, 0xB3572A3B),
-    MK_64(0xC9BC8203, 0xA6104A6C),
-    MK_64(0x65909338, 0xD75624F4),
-    MK_64(0x94BCC568, 0x4B3F81A0),
-    MK_64(0x3EBBF51E, 0x10ECFD46),
-    MK_64(0x2DF50F0B, 0xEEB08542),
-    MK_64(0x3B5A6530, 0x0DBC6516),
-    MK_64(0x484B9CD2, 0x167BBCE1),
-    MK_64(0x2D136947, 0xD4CBAFEA)
-    };
-
-/* blkSize = 1024 bits. hashSize =  512 bits */
-const u64 SKEIN1024_IV_512[] =
-    {
-    MK_64(0xCAEC0E5D, 0x7C1B1B18),
-    MK_64(0xA01B0E04, 0x5F03E802),
-    MK_64(0x33840451, 0xED912885),
-    MK_64(0x374AFB04, 0xEAEC2E1C),
-    MK_64(0xDF25A0E2, 0x813581F7),
-    MK_64(0xE4004093, 0x8B12F9D2),
-    MK_64(0xA662D539, 0xC2ED39B6),
-    MK_64(0xFA8B85CF, 0x45D8C75A),
-    MK_64(0x8316ED8E, 0x29EDE796),
-    MK_64(0x053289C0, 0x2E9F91B8),
-    MK_64(0xC3F8EF1D, 0x6D518B73),
-    MK_64(0xBDCEC3C4, 0xD5EF332E),
-    MK_64(0x549A7E52, 0x22974487),
-    MK_64(0x67070872, 0x5B749816),
-    MK_64(0xB9CD28FB, 0xF0581BD1),
-    MK_64(0x0E2940B8, 0x15804974)
-    };
-
-/* blkSize = 1024 bits. hashSize = 1024 bits */
-const u64 SKEIN1024_IV_1024[] =
-    {
-    MK_64(0xD593DA07, 0x41E72355),
-    MK_64(0x15B5E511, 0xAC73E00C),
-    MK_64(0x5180E5AE, 0xBAF2C4F0),
-    MK_64(0x03BD41D3, 0xFCBCAFAF),
-    MK_64(0x1CAEC6FD, 0x1983A898),
-    MK_64(0x6E510B8B, 0xCDD0589F),
-    MK_64(0x77E2BDFD, 0xC6394ADA),
-    MK_64(0xC11E1DB5, 0x24DCB0A3),
-    MK_64(0xD6D14AF9, 0xC6329AB5),
-    MK_64(0x6A9B0BFC, 0x6EB67E0D),
-    MK_64(0x9243C60D, 0xCCFF1332),
-    MK_64(0x1A1F1DDE, 0x743F02D4),
-    MK_64(0x0996753C, 0x10ED0BB8),
-    MK_64(0x6572DD22, 0xF2B4969A),
-    MK_64(0x61FD3062, 0xD00A579A),
-    MK_64(0x1DE0536E, 0x8682E539)
-    };
-
-#endif /* _SKEIN_IV_H_ */
+#ifndef _SKEIN_IV_H_
+#define _SKEIN_IV_H_
+
+#include <skein.h>    /* get Skein macros and types */
+
+/*
+***************** Pre-computed Skein IVs *******************
+**
+** NOTE: these values are not "magic" constants, but
+** are generated using the Threefish block function.
+** They are pre-computed here only for speed; i.e., to
+** avoid the need for a Threefish call during Init().
+**
+** The IV for any fixed hash length may be pre-computed.
+** Only the most common values are included here.
+**
+************************************************************
+**/
+
+#define MK_64 SKEIN_MK_64
+
+/* blkSize =  256 bits. hashSize =  128 bits */
+const u64 SKEIN_256_IV_128[] =
+    {
+    MK_64(0xE1111906, 0x964D7260),
+    MK_64(0x883DAAA7, 0x7C8D811C),
+    MK_64(0x10080DF4, 0x91960F7A),
+    MK_64(0xCCF7DDE5, 0xB45BC1C2)
+    };
+
+/* blkSize =  256 bits. hashSize =  160 bits */
+const u64 SKEIN_256_IV_160[] =
+    {
+    MK_64(0x14202314, 0x72825E98),
+    MK_64(0x2AC4E9A2, 0x5A77E590),
+    MK_64(0xD47A5856, 0x8838D63E),
+    MK_64(0x2DD2E496, 0x8586AB7D)
+    };
+
+/* blkSize =  256 bits. hashSize =  224 bits */
+const u64 SKEIN_256_IV_224[] =
+    {
+    MK_64(0xC6098A8C, 0x9AE5EA0B),
+    MK_64(0x876D5686, 0x08C5191C),
+    MK_64(0x99CB88D7, 0xD7F53884),
+    MK_64(0x384BDDB1, 0xAEDDB5DE)
+    };
+
+/* blkSize =  256 bits. hashSize =  256 bits */
+const u64 SKEIN_256_IV_256[] =
+    {
+    MK_64(0xFC9DA860, 0xD048B449),
+    MK_64(0x2FCA6647, 0x9FA7D833),
+    MK_64(0xB33BC389, 0x6656840F),
+    MK_64(0x6A54E920, 0xFDE8DA69)
+    };
+
+/* blkSize =  512 bits. hashSize =  128 bits */
+const u64 SKEIN_512_IV_128[] =
+    {
+    MK_64(0xA8BC7BF3, 0x6FBF9F52),
+    MK_64(0x1E9872CE, 0xBD1AF0AA),
+    MK_64(0x309B1790, 0xB32190D3),
+    MK_64(0xBCFBB854, 0x3F94805C),
+    MK_64(0x0DA61BCD, 0x6E31B11B),
+    MK_64(0x1A18EBEA, 0xD46A32E3),
+    MK_64(0xA2CC5B18, 0xCE84AA82),
+    MK_64(0x6982AB28, 0x9D46982D)
+    };
+
+/* blkSize =  512 bits. hashSize =  160 bits */
+const u64 SKEIN_512_IV_160[] =
+    {
+    MK_64(0x28B81A2A, 0xE013BD91),
+    MK_64(0xC2F11668, 0xB5BDF78F),
+    MK_64(0x1760D8F3, 0xF6A56F12),
+    MK_64(0x4FB74758, 0x8239904F),
+    MK_64(0x21EDE07F, 0x7EAF5056),
+    MK_64(0xD908922E, 0x63ED70B8),
+    MK_64(0xB8EC76FF, 0xECCB52FA),
+    MK_64(0x01A47BB8, 0xA3F27A6E)
+    };
+
+/* blkSize =  512 bits. hashSize =  224 bits */
+const u64 SKEIN_512_IV_224[] =
+    {
+    MK_64(0xCCD06162, 0x48677224),
+    MK_64(0xCBA65CF3, 0xA92339EF),
+    MK_64(0x8CCD69D6, 0x52FF4B64),
+    MK_64(0x398AED7B, 0x3AB890B4),
+    MK_64(0x0F59D1B1, 0x457D2BD0),
+    MK_64(0x6776FE65, 0x75D4EB3D),
+    MK_64(0x99FBC70E, 0x997413E9),
+    MK_64(0x9E2CFCCF, 0xE1C41EF7)
+    };
+
+/* blkSize =  512 bits. hashSize =  256 bits */
+const u64 SKEIN_512_IV_256[] =
+    {
+    MK_64(0xCCD044A1, 0x2FDB3E13),
+    MK_64(0xE8359030, 0x1A79A9EB),
+    MK_64(0x55AEA061, 0x4F816E6F),
+    MK_64(0x2A2767A4, 0xAE9B94DB),
+    MK_64(0xEC06025E, 0x74DD7683),
+    MK_64(0xE7A436CD, 0xC4746251),
+    MK_64(0xC36FBAF9, 0x393AD185),
+    MK_64(0x3EEDBA18, 0x33EDFC13)
+    };
+
+/* blkSize =  512 bits. hashSize =  384 bits */
+const u64 SKEIN_512_IV_384[] =
+    {
+    MK_64(0xA3F6C6BF, 0x3A75EF5F),
+    MK_64(0xB0FEF9CC, 0xFD84FAA4),
+    MK_64(0x9D77DD66, 0x3D770CFE),
+    MK_64(0xD798CBF3, 0xB468FDDA),
+    MK_64(0x1BC4A666, 0x8A0E4465),
+    MK_64(0x7ED7D434, 0xE5807407),
+    MK_64(0x548FC1AC, 0xD4EC44D6),
+    MK_64(0x266E1754, 0x6AA18FF8)
+    };
+
+/* blkSize =  512 bits. hashSize =  512 bits */
+const u64 SKEIN_512_IV_512[] =
+    {
+    MK_64(0x4903ADFF, 0x749C51CE),
+    MK_64(0x0D95DE39, 0x9746DF03),
+    MK_64(0x8FD19341, 0x27C79BCE),
+    MK_64(0x9A255629, 0xFF352CB1),
+    MK_64(0x5DB62599, 0xDF6CA7B0),
+    MK_64(0xEABE394C, 0xA9D5C3F4),
+    MK_64(0x991112C7, 0x1A75B523),
+    MK_64(0xAE18A40B, 0x660FCC33)
+    };
+
+/* blkSize = 1024 bits. hashSize =  384 bits */
+const u64 SKEIN1024_IV_384[] =
+    {
+    MK_64(0x5102B6B8, 0xC1894A35),
+    MK_64(0xFEEBC9E3, 0xFE8AF11A),
+    MK_64(0x0C807F06, 0xE32BED71),
+    MK_64(0x60C13A52, 0xB41A91F6),
+    MK_64(0x9716D35D, 0xD4917C38),
+    MK_64(0xE780DF12, 0x6FD31D3A),
+    MK_64(0x797846B6, 0xC898303A),
+    MK_64(0xB172C2A8, 0xB3572A3B),
+    MK_64(0xC9BC8203, 0xA6104A6C),
+    MK_64(0x65909338, 0xD75624F4),
+    MK_64(0x94BCC568, 0x4B3F81A0),
+    MK_64(0x3EBBF51E, 0x10ECFD46),
+    MK_64(0x2DF50F0B, 0xEEB08542),
+    MK_64(0x3B5A6530, 0x0DBC6516),
+    MK_64(0x484B9CD2, 0x167BBCE1),
+    MK_64(0x2D136947, 0xD4CBAFEA)
+    };
+
+/* blkSize = 1024 bits. hashSize =  512 bits */
+const u64 SKEIN1024_IV_512[] =
+    {
+    MK_64(0xCAEC0E5D, 0x7C1B1B18),
+    MK_64(0xA01B0E04, 0x5F03E802),
+    MK_64(0x33840451, 0xED912885),
+    MK_64(0x374AFB04, 0xEAEC2E1C),
+    MK_64(0xDF25A0E2, 0x813581F7),
+    MK_64(0xE4004093, 0x8B12F9D2),
+    MK_64(0xA662D539, 0xC2ED39B6),
+    MK_64(0xFA8B85CF, 0x45D8C75A),
+    MK_64(0x8316ED8E, 0x29EDE796),
+    MK_64(0x053289C0, 0x2E9F91B8),
+    MK_64(0xC3F8EF1D, 0x6D518B73),
+    MK_64(0xBDCEC3C4, 0xD5EF332E),
+    MK_64(0x549A7E52, 0x22974487),
+    MK_64(0x67070872, 0x5B749816),
+    MK_64(0xB9CD28FB, 0xF0581BD1),
+    MK_64(0x0E2940B8, 0x15804974)
+    };
+
+/* blkSize = 1024 bits. hashSize = 1024 bits */
+const u64 SKEIN1024_IV_1024[] =
+    {
+    MK_64(0xD593DA07, 0x41E72355),
+    MK_64(0x15B5E511, 0xAC73E00C),
+    MK_64(0x5180E5AE, 0xBAF2C4F0),
+    MK_64(0x03BD41D3, 0xFCBCAFAF),
+    MK_64(0x1CAEC6FD, 0x1983A898),
+    MK_64(0x6E510B8B, 0xCDD0589F),
+    MK_64(0x77E2BDFD, 0xC6394ADA),
+    MK_64(0xC11E1DB5, 0x24DCB0A3),
+    MK_64(0xD6D14AF9, 0xC6329AB5),
+    MK_64(0x6A9B0BFC, 0x6EB67E0D),
+    MK_64(0x9243C60D, 0xCCFF1332),
+    MK_64(0x1A1F1DDE, 0x743F02D4),
+    MK_64(0x0996753C, 0x10ED0BB8),
+    MK_64(0x6572DD22, 0xF2B4969A),
+    MK_64(0x61FD3062, 0xD00A579A),
+    MK_64(0x1DE0536E, 0x8682E539)
+    };
+
+#endif /* _SKEIN_IV_H_ */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 0ea0a6aeb168..e2e5685157a0 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -1,721 +1,721 @@
-/***********************************************************************
-**
-** Implementation of the Skein hash function.
-**
-** Source code author: Doug Whiting, 2008.
-**
-** This algorithm and source code is released to the public domain.
-**
-************************************************************************/
-
-#define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
-
-#include <linux/string.h>       /* get the memcpy/memset functions */
-#include <skein.h> /* get the Skein API definitions   */
-#include <skein_iv.h>    /* get precomputed IVs */
-
-/*****************************************************************/
-/* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-void    Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-
-/*****************************************************************/
-/*     256-bit Skein                                             */
-/*****************************************************************/
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a straight hashing operation  */
-int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
-{
-    union
-    {
-        u8  b[SKEIN_256_STATE_BYTES];
-        u64  w[SKEIN_256_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {             /* use pre-computed values, where available */
-    case  256:
-        memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
-        break;
-    case  224:
-        memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
-        break;
-    case  160:
-        memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
-        break;
-    case  128:
-        memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
-{
-    union
-    {
-        u8  b[SKEIN_256_STATE_BYTES];
-        u64  w[SKEIN_256_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(256, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* process the input bytes */
-int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
-{
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
-            Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
-            msg        += n * SKEIN_256_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the result */
-int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_256_BLOCK_BYTES)
-            n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/*****************************************************************/
-/*     512-bit Skein                                             */
-/*****************************************************************/
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a straight hashing operation  */
-int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
-{
-    union
-    {
-        u8  b[SKEIN_512_STATE_BYTES];
-        u64  w[SKEIN_512_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {             /* use pre-computed values, where available */
-    case  512:
-        memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
-        break;
-    case  384:
-        memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
-        break;
-    case  256:
-        memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
-        break;
-    case  224:
-        memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
-{
-    union
-    {
-        u8  b[SKEIN_512_STATE_BYTES];
-        u64  w[SKEIN_512_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(512, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* process the input bytes */
-int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
-{
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
-            Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
-            msg        += n * SKEIN_512_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the result */
-int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_512_BLOCK_BYTES)
-            n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/*****************************************************************/
-/*    1024-bit Skein                                             */
-/*****************************************************************/
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a straight hashing operation  */
-int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
-{
-    union
-    {
-        u8  b[SKEIN1024_STATE_BYTES];
-        u64  w[SKEIN1024_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {              /* use pre-computed values, where available */
-    case  512:
-        memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
-        break;
-    case  384:
-        memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
-        break;
-    case 1024:
-        memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
-{
-    union
-    {
-        u8  b[SKEIN1024_STATE_BYTES];
-        u64  w[SKEIN1024_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(1024, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* process the input bytes */
-int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
-{
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
-            Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
-            msg        += n * SKEIN1024_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the result */
-int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN1024_BLOCK_BYTES)
-            n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/**************** Functions to support MAC/tree hashing ***************/
-/*   (this code is identical for Optimized and Reference versions)    */
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
-{
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
-
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
-{
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
-
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
-
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* finalize the hash computation and output the block, no OUTPUT stage */
-int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
-{
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
-
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
-
-    return SKEIN_SUCCESS;
-}
-
-#if SKEIN_TREE_HASH
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* just do the OUTPUT stage                                       */
-int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_256_BLOCK_BYTES)
-            n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* just do the OUTPUT stage                                       */
-int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_512_BLOCK_BYTES)
-            n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-
-/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
-/* just do the OUTPUT stage                                       */
-int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
-{
-    size_t i, n, byteCnt;
-    u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN1024_BLOCK_BYTES)
-            n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
-}
-#endif
+/***********************************************************************
+**
+** Implementation of the Skein hash function.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+************************************************************************/
+
+#define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
+
+#include <linux/string.h>       /* get the memcpy/memset functions */
+#include <skein.h> /* get the Skein API definitions   */
+#include <skein_iv.h>    /* get precomputed IVs */
+
+/*****************************************************************/
+/* External function to process blkCnt (nonzero) full block(s) of data. */
+void    Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void    Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+
+/*****************************************************************/
+/*     256-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u8  b[SKEIN_256_STATE_BYTES];
+        u64  w[SKEIN_256_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {             /* use pre-computed values, where available */
+    case  256:
+        memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
+        break;
+    case  224:
+        memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
+        break;
+    case  160:
+        memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
+        break;
+    case  128:
+        memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+        break;
+    }
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+{
+    union
+    {
+        u8  b[SKEIN_256_STATE_BYTES];
+        u64  w[SKEIN_256_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx, CFG_FINAL);
+
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(256, &ctx->h, key, keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
+            Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
+            msg        += n * SKEIN_256_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN_256_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_256_BLOCK_BYTES)
+            n  = SKEIN_256_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*****************************************************************/
+/*     512-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u8  b[SKEIN_512_STATE_BYTES];
+        u64  w[SKEIN_512_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {             /* use pre-computed values, where available */
+    case  512:
+        memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
+        break;
+    case  384:
+        memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
+        break;
+    case  256:
+        memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
+        break;
+    case  224:
+        memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+        break;
+    }
+
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+{
+    union
+    {
+        u8  b[SKEIN_512_STATE_BYTES];
+        u64  w[SKEIN_512_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx, CFG_FINAL);
+
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(512, &ctx->h, key, keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
+            Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
+            msg        += n * SKEIN_512_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN_512_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_512_BLOCK_BYTES)
+            n  = SKEIN_512_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*****************************************************************/
+/*    1024-bit Skein                                             */
+/*****************************************************************/
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a straight hashing operation  */
+int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
+{
+    union
+    {
+        u8  b[SKEIN1024_STATE_BYTES];
+        u64  w[SKEIN1024_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+    switch (hashBitLen)
+    {              /* use pre-computed values, where available */
+    case  512:
+        memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
+        break;
+    case  384:
+        memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
+        break;
+    case 1024:
+        memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
+        break;
+    default:
+        /* here if there is no precomputed IV value available */
+        /* build/process the config block, type == CONFIG (could be precomputed) */
+        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+        /* compute the initial chaining values from config block */
+        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+        Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+        break;
+    }
+
+    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* init the context for a MAC and/or tree hash operation */
+/* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+{
+    union
+    {
+        u8  b[SKEIN1024_STATE_BYTES];
+        u64  w[SKEIN1024_STATE_WORDS];
+    } cfg;                              /* config block */
+
+    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+    /* compute the initial chaining values ctx->X[], based on key */
+    if (keyBytes == 0)                          /* is there a key? */
+    {
+        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+    }
+    else                                        /* here to pre-process a key */
+    {
+        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+        /* do a mini-Init right here */
+        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+        Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
+        Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+    }
+    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
+    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+    Skein_Start_New_Type(ctx, CFG_FINAL);
+
+    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+    Skein_Show_Key(1024, &ctx->h, key, keyBytes);
+
+    /* compute the initial chaining values from config block */
+    Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+    /* The chaining vars ctx->X are now initialized */
+    /* Set up to process the data message portion of the hash (default) */
+    Skein_Start_New_Type(ctx, MSG);
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* process the input bytes */
+int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+{
+    size_t n;
+
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* process full blocks, if any */
+    if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
+    {
+        if (ctx->h.bCnt)                              /* finish up any buffered message data */
+        {
+            n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+            if (n)
+            {
+                Skein_assert(n < msgByteCnt);         /* check on our logic here */
+                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+                msgByteCnt  -= n;
+                msg         += n;
+                ctx->h.bCnt += n;
+            }
+            Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
+            Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
+            ctx->h.bCnt = 0;
+        }
+        /* now process any remaining full blocks, directly from input message data */
+        if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
+        {
+            n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
+            Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
+            msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
+            msg        += n * SKEIN1024_BLOCK_BYTES;
+        }
+        Skein_assert(ctx->h.bCnt == 0);
+    }
+
+    /* copy any remaining source message data bytes into b[] */
+    if (msgByteCnt)
+    {
+        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
+        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+        ctx->h.bCnt += msgByteCnt;
+    }
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the result */
+int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN1024_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+
+    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN1024_BLOCK_BYTES)
+            n  = SKEIN1024_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/**************** Functions to support MAC/tree hashing ***************/
+/*   (this code is identical for Optimized and Reference versions)    */
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* finalize the hash computation and output the block, no OUTPUT stage */
+int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
+{
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
+        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+
+    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+
+    return SKEIN_SUCCESS;
+}
+
+#if SKEIN_TREE_HASH
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN_256_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_256_BLOCK_BYTES)
+            n  = SKEIN_256_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN_512_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN_512_BLOCK_BYTES)
+            n  = SKEIN_512_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+
+/*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
+/* just do the OUTPUT stage                                       */
+int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
+{
+    size_t i, n, byteCnt;
+    u64 X[SKEIN1024_STATE_WORDS];
+    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+    /* now output the result */
+    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+    /* run Threefish in "counter mode" to generate output */
+    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
+    {
+        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+        Skein_Start_New_Type(ctx, OUT_FINAL);
+        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+        if (n >= SKEIN1024_BLOCK_BYTES)
+            n  = SKEIN1024_BLOCK_BYTES;
+        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+    }
+    return SKEIN_SUCCESS;
+}
+#endif
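
Reviewer note on the buffering in Skein1024_Update() in the file above: the
function never processes the last full block of a call directly. It uses a
strict ">" test and (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES so that at least
one byte, and at most one block, stays in b[] for the final-block call that
sets SKEIN_T1_FLAG_FINAL. The sketch below mirrors only that arithmetic. The
names update_plan and plan_update are hypothetical and not part of the patch,
and the block size is passed in as a parameter instead of using the
SKEIN1024_BLOCK_BYTES constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: records what an Update call
 * would do with its input, without touching any hash state. */
struct update_plan {
    size_t fill;          /* bytes copied to top off a partly filled b[] */
    size_t direct_blocks; /* full blocks processed straight from msg */
    size_t buffered;      /* bytes sitting in b[] when the call returns */
};

static struct update_plan plan_update(size_t bCnt, size_t msgByteCnt,
                                      size_t blockBytes)
{
    struct update_plan p = { 0, 0, 0 };

    assert(bCnt <= blockBytes);   /* same sanity check as the real code */

    if (msgByteCnt + bCnt > blockBytes) {
        if (bCnt) {
            /* top off the buffer; the real code then processes b[] */
            p.fill = blockBytes - bCnt;
            msgByteCnt -= p.fill;
            bCnt = 0;
        }
        if (msgByteCnt > blockBytes) {
            /* "-1" keeps the last block back, even if it is full */
            p.direct_blocks = (msgByteCnt - 1) / blockBytes;
            msgByteCnt -= p.direct_blocks * blockBytes;
        }
    }
    /* whatever is left is copied into b[] */
    p.buffered = bCnt + msgByteCnt;
    return p;
}
```

For example, with a 128-byte block, an empty buffer and 256 bytes of input,
this plans one directly processed block and leaves a full 128 bytes buffered,
rather than processing two blocks and leaving the buffer empty.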
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
old mode 100755
new mode 100644
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index a4b1ec56ad83..d98933eeb0bf 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -1,172 +1,172 @@
-
-#include <linux/string.h>
-#include <skein.h>
-#include <threefishApi.h>
-
-
-/*****************************  Skein_256 ******************************/
-void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
-                             size_t blkCnt, size_t byteCntAdd)
-{
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
-    u64 words[3];
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish256, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN_256_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = ctx->X[0] ^ w[0];
-        ctx->X[1] = ctx->X[1] ^ w[1];
-        ctx->X[2] = ctx->X[2] ^ w[2];
-        ctx->X[3] = ctx->X[3] ^ w[3];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
-}
-
-void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
-                             size_t blkCnt, size_t byteCntAdd)
-{
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64 words[3];
-    u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish512, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN_512_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = ctx->X[0] ^ w[0];
-        ctx->X[1] = ctx->X[1] ^ w[1];
-        ctx->X[2] = ctx->X[2] ^ w[2];
-        ctx->X[3] = ctx->X[3] ^ w[3];
-        ctx->X[4] = ctx->X[4] ^ w[4];
-        ctx->X[5] = ctx->X[5] ^ w[5];
-        ctx->X[6] = ctx->X[6] ^ w[6];
-        ctx->X[7] = ctx->X[7] ^ w[7];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
-}
-
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
-                              size_t blkCnt, size_t byteCntAdd)
-{
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64 words[3];
-    u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish1024, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN1024_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0]  = ctx->X[0]  ^ w[0];
-        ctx->X[1]  = ctx->X[1]  ^ w[1];
-        ctx->X[2]  = ctx->X[2]  ^ w[2];
-        ctx->X[3]  = ctx->X[3]  ^ w[3];
-        ctx->X[4]  = ctx->X[4]  ^ w[4];
-        ctx->X[5]  = ctx->X[5]  ^ w[5];
-        ctx->X[6]  = ctx->X[6]  ^ w[6];
-        ctx->X[7]  = ctx->X[7]  ^ w[7];
-        ctx->X[8]  = ctx->X[8]  ^ w[8];
-        ctx->X[9]  = ctx->X[9]  ^ w[9];
-        ctx->X[10] = ctx->X[10] ^ w[10];
-        ctx->X[11] = ctx->X[11] ^ w[11];
-        ctx->X[12] = ctx->X[12] ^ w[12];
-        ctx->X[13] = ctx->X[13] ^ w[13];
-        ctx->X[14] = ctx->X[14] ^ w[14];
-        ctx->X[15] = ctx->X[15] ^ w[15];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
-}
+
+#include <linux/string.h>
+#include <skein.h>
+#include <threefishApi.h>
+
+
+/*****************************  Skein_256 ******************************/
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
+                             size_t blkCnt, size_t byteCntAdd)
+{
+    struct threefish_key key;
+    u64 tweak[2];
+    int i;
+    u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+    u64 words[3];
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64 carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish256, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN_256_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = ctx->X[0] ^ w[0];
+        ctx->X[1] = ctx->X[1] ^ w[1];
+        ctx->X[2] = ctx->X[2] ^ w[2];
+        ctx->X[3] = ctx->X[3] ^ w[3];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
+
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
+                             size_t blkCnt, size_t byteCntAdd)
+{
+    struct threefish_key key;
+    u64 tweak[2];
+    int i;
+    u64 words[3];
+    u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64 carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish512, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN_512_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = ctx->X[0] ^ w[0];
+        ctx->X[1] = ctx->X[1] ^ w[1];
+        ctx->X[2] = ctx->X[2] ^ w[2];
+        ctx->X[3] = ctx->X[3] ^ w[3];
+        ctx->X[4] = ctx->X[4] ^ w[4];
+        ctx->X[5] = ctx->X[5] ^ w[5];
+        ctx->X[6] = ctx->X[6] ^ w[6];
+        ctx->X[7] = ctx->X[7] ^ w[7];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
+
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
+                              size_t blkCnt, size_t byteCntAdd)
+{
+    struct threefish_key key;
+    u64 tweak[2];
+    int i;
+    u64 words[3];
+    u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    tweak[0] = ctx->h.T[0];
+    tweak[1] = ctx->h.T[1];
+
+    do  {
+        u64 carry = byteCntAdd;
+
+        words[0] = tweak[0] & 0xffffffffL;
+        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+        words[2] = (tweak[1] & 0xffffffffL);
+
+        for (i = 0; i < 3; i++) {
+            carry += words[i];
+            words[i] = carry;
+            carry >>= 32;
+        }
+        tweak[0] = words[0] & 0xffffffffL;
+        tweak[0] |= (words[1] & 0xffffffffL) << 32;
+        tweak[1] |= words[2] & 0xffffffffL;
+
+        threefishSetKey(&key, Threefish1024, ctx->X, tweak);
+
+        Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
+
+        threefishEncryptBlockWords(&key, w, ctx->X);
+
+        blkPtr += SKEIN1024_BLOCK_BYTES;
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0]  = ctx->X[0]  ^ w[0];
+        ctx->X[1]  = ctx->X[1]  ^ w[1];
+        ctx->X[2]  = ctx->X[2]  ^ w[2];
+        ctx->X[3]  = ctx->X[3]  ^ w[3];
+        ctx->X[4]  = ctx->X[4]  ^ w[4];
+        ctx->X[5]  = ctx->X[5]  ^ w[5];
+        ctx->X[6]  = ctx->X[6]  ^ w[6];
+        ctx->X[7]  = ctx->X[7]  ^ w[7];
+        ctx->X[8]  = ctx->X[8]  ^ w[8];
+        ctx->X[9]  = ctx->X[9]  ^ w[9];
+        ctx->X[10] = ctx->X[10] ^ w[10];
+        ctx->X[11] = ctx->X[11] ^ w[11];
+        ctx->X[12] = ctx->X[12] ^ w[12];
+        ctx->X[13] = ctx->X[13] ^ w[13];
+        ctx->X[14] = ctx->X[14] ^ w[14];
+        ctx->X[15] = ctx->X[15] ^ w[15];
+
+        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+    } while (--blkCnt);
+
+    ctx->h.T[0] = tweak[0];
+    ctx->h.T[1] = tweak[1];
+}
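
Reviewer note on the tweak update in the three Process_Block() variants
above: each one adds byteCntAdd to a 96-bit position counter held in T[0]
plus the low 32 bits of T[1], using three 32-bit limbs. The sketch below
shows one way to do that add in isolation. The names tweak_add_bytes and
POS_HI_MASK are hypothetical, and it uses uint64_t in place of the kernel's
u64 so it builds in userspace. Unlike the "tweak[1] |= words[2]" line in the
hunk, it clears the low 32 bits of T[1] before merging the new limb. The two
differ only once the count carries past 2^64 bytes, so this is a latent issue
rather than one that should affect realistic inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: low 32 bits of T[1] hold position bits 64..95 and the upper
 * 32 bits hold type/flag fields, as the masking in the patch implies. */
#define POS_HI_MASK 0xffffffffULL

/* Hypothetical helper: add byteCntAdd to the 96-bit position field. */
static void tweak_add_bytes(uint64_t tweak[2], uint64_t byteCntAdd)
{
    uint64_t words[3];
    uint64_t carry = byteCntAdd;
    int i;

    /* keeps "carry += words[i]" from overflowing 64 bits */
    assert(byteCntAdd <= 0xffffffffULL);

    words[0] = tweak[0] & 0xffffffffULL;
    words[1] = tweak[0] >> 32;
    words[2] = tweak[1] & POS_HI_MASK;

    for (i = 0; i < 3; i++) {
        carry += words[i];
        words[i] = carry & 0xffffffffULL;  /* keep one 32-bit limb */
        carry >>= 32;                      /* propagate the rest */
    }

    tweak[0] = words[0] | (words[1] << 32);
    /* replace, rather than OR into, the old position bits */
    tweak[1] = (tweak[1] & ~POS_HI_MASK) | words[2];
}
```

With T[0] at its maximum value and position bits 64..95 equal to 1, adding
one byte wraps T[0] to zero and sets those bits to 2, leaving the flag bits
in the upper half of T[1] untouched.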
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 791bacdd3d57..e62b6442783e 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -1,686 +1,686 @@
-/***********************************************************************
-**
-** Implementation of the Skein block functions.
-**
-** Source code author: Doug Whiting, 2008.
-**
-** This algorithm and source code is released to the public domain.
-**
-** Compile-time switches:
-**
-**  SKEIN_USE_ASM  -- set bits (256/512/1024) to select which
-**                    versions use ASM code for block processing
-**                    [default: use C for all block sizes]
-**
-************************************************************************/
-
-#include <linux/string.h>
-#include <skein.h>
-
-#ifndef SKEIN_USE_ASM
-#define SKEIN_USE_ASM   (0)                     /* default is all C code (no ASM) */
-#endif
-
-#ifndef SKEIN_LOOP
-#define SKEIN_LOOP 001                          /* default: unroll 256 and 512, but not 1024 */
-#endif
-
-#define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
-#define KW_TWK_BASE     (0)
-#define KW_KEY_BASE     (3)
-#define ks              (kw + KW_KEY_BASE)                
-#define ts              (kw + KW_TWK_BASE)
-
-#ifdef SKEIN_DEBUG
-#define DebugSaveTweak(ctx) { ctx->h.T[0] = ts[0]; ctx->h.T[1] = ts[1]; }
-#else
-#define DebugSaveTweak(ctx)
-#endif
-
-/*****************************  Skein_256 ******************************/
-#if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C */
-    enum {
-        WCNT = SKEIN_256_STATE_WORDS
-        };
-#undef  RCNT
-#define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
-
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
-#define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
-#else
-#define SKEIN_UNROLL_256 (0)
-#endif
-
-#if SKEIN_UNROLL_256
-#if (RCNT % SKEIN_UNROLL_256)
-#error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
-#endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
-#else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
-#endif
-    u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
-    u64  w[WCNT];                           /* local copy of input block */
-#ifdef SKEIN_DEBUG
-    const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
-#endif
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0] = ctx->X[0];     
-        ks[1] = ctx->X[1];
-        ks[2] = ctx->X[2];
-        ks[3] = ctx->X[3];
-        ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
-
-        ts[2] = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X0 = w[0] + ks[0];                      /* do the first full key injection */
-        X1 = w[1] + ks[1] + ts[0];
-        X2 = w[2] + ks[2] + ts[1];
-        X3 = w[3] + ks[3];
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
-
-        blkPtr += SKEIN_256_BLOCK_BYTES;
-
-        /* run the rounds */
-
-#define Round256(p0, p1, p2, p3, ROT, rNum)                              \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
-
-#if SKEIN_UNROLL_256 == 0                       
-#define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
-    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
-
-#define I256(R)                                                     \
-    X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
-    X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
-    X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
-    X3   += ks[((R)+4) % 5] +     (R)+1;                            \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
-
-#define I256(R)                                                     \
-    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
-    X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
-    X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
-    X3   += ks[r+(R)+3] +    r+(R);                              \
-    ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
-    ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
-#endif  
-        {    
-#define R256_8_rounds(R)                  \
-        R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
-        R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
-        R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
-        R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
-        I256(2 * (R));                      \
-        R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
-        R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
-        R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
-        R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
-        I256(2 * (R) + 1);
-
-        R256_8_rounds(0);
-
-#define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
-
-  #if   R256_Unroll_R(1)
-        R256_8_rounds(1);
-  #endif
-  #if   R256_Unroll_R(2)
-        R256_8_rounds(2);
-  #endif
-  #if   R256_Unroll_R(3)
-        R256_8_rounds(3);
-  #endif
-  #if   R256_Unroll_R(4)
-        R256_8_rounds(4);
-  #endif
-  #if   R256_Unroll_R(5)
-        R256_8_rounds(5);
-  #endif
-  #if   R256_Unroll_R(6)
-        R256_8_rounds(6);
-  #endif
-  #if   R256_Unroll_R(7)
-        R256_8_rounds(7);
-  #endif
-  #if   R256_Unroll_R(8)
-        R256_8_rounds(8);
-  #endif
-  #if   R256_Unroll_R(9)
-        R256_8_rounds(9);
-  #endif
-  #if   R256_Unroll_R(10)
-        R256_8_rounds(10);
-  #endif
-  #if   R256_Unroll_R(11)
-        R256_8_rounds(11);
-  #endif
-  #if   R256_Unroll_R(12)
-        R256_8_rounds(12);
-  #endif
-  #if   R256_Unroll_R(13)
-        R256_8_rounds(13);
-  #endif
-  #if   R256_Unroll_R(14)
-        R256_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_256 > 14)
-#error  "need more unrolling in Skein_256_Process_Block"
-  #endif
-        }
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = X0 ^ w[0];
-        ctx->X[1] = X1 ^ w[1];
-        ctx->X[2] = X2 ^ w[2];
-        ctx->X[3] = X3 ^ w[3];
-
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
-
-#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
-size_t Skein_256_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein_256_Process_Block_CodeSize) -
-           ((u8 *) Skein_256_Process_Block);
-    }
-unsigned int Skein_256_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_256;
-    }
-#endif
-#endif
-
-/*****************************  Skein_512 ******************************/
-#if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C */
-    enum {
-        WCNT = SKEIN_512_STATE_WORDS
-        };
-#undef  RCNT
-#define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
-
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
-#define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
-#else
-#define SKEIN_UNROLL_512 (0)
-#endif
-
-#if SKEIN_UNROLL_512
-#if (RCNT % SKEIN_UNROLL_512)
-#error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
-#endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
-#else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
-#endif
-    u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
-    u64  w[WCNT];                           /* local copy of input block */
-#ifdef SKEIN_DEBUG
-    const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
-    Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
-#endif
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0] = ctx->X[0];
-        ks[1] = ctx->X[1];
-        ks[2] = ctx->X[2];
-        ks[3] = ctx->X[3];
-        ks[4] = ctx->X[4];
-        ks[5] = ctx->X[5];
-        ks[6] = ctx->X[6];
-        ks[7] = ctx->X[7];
-        ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
-                ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
-
-        ts[2] = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X0   = w[0] + ks[0];                    /* do the first full key injection */
-        X1   = w[1] + ks[1];
-        X2   = w[2] + ks[2];
-        X3   = w[3] + ks[3];
-        X4   = w[4] + ks[4];
-        X5   = w[5] + ks[5] + ts[0];
-        X6   = w[6] + ks[6] + ts[1];
-        X7   = w[7] + ks[7];
-
-        blkPtr += SKEIN_512_BLOCK_BYTES;
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
-        /* run the rounds */
-#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
-
-#if SKEIN_UNROLL_512 == 0                       
-#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
-    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
-
-#define I512(R)                                                     \
-    X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
-    X1   += ks[((R) + 2) % 9];                                        \
-    X2   += ks[((R) + 3) % 9];                                        \
-    X3   += ks[((R) + 4) % 9];                                        \
-    X4   += ks[((R) + 5) % 9];                                        \
-    X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
-    X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
-    X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
-
-#define I512(R)                                                     \
-    X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
-    X1   += ks[r + (R) + 1];                                            \
-    X2   += ks[r + (R) + 2];                                            \
-    X3   += ks[r + (R) + 3];                                            \
-    X4   += ks[r + (R) + 4];                                            \
-    X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
-    X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
-    X7   += ks[r + (R) + 7] +         r + (R);                              \
-    ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
-    ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
-#endif                         /* end of looped code definitions */
-        {
-#define R512_8_rounds(R)  /* do 8 full rounds */  \
-        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
-        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
-        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
-        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
-        I512(2 * (R));                              \
-        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
-        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
-        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
-        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
-        I512(2 * (R) + 1);        /* and key injection */
-
-        R512_8_rounds(0);
-
-#define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
-
-  #if   R512_Unroll_R(1)
-        R512_8_rounds(1);
-  #endif
-  #if   R512_Unroll_R(2)
-        R512_8_rounds(2);
-  #endif
-  #if   R512_Unroll_R(3)
-        R512_8_rounds(3);
-  #endif
-  #if   R512_Unroll_R(4)
-        R512_8_rounds(4);
-  #endif
-  #if   R512_Unroll_R(5)
-        R512_8_rounds(5);
-  #endif
-  #if   R512_Unroll_R(6)
-        R512_8_rounds(6);
-  #endif
-  #if   R512_Unroll_R(7)
-        R512_8_rounds(7);
-  #endif
-  #if   R512_Unroll_R(8)
-        R512_8_rounds(8);
-  #endif
-  #if   R512_Unroll_R(9)
-        R512_8_rounds(9);
-  #endif
-  #if   R512_Unroll_R(10)
-        R512_8_rounds(10);
-  #endif
-  #if   R512_Unroll_R(11)
-        R512_8_rounds(11);
-  #endif
-  #if   R512_Unroll_R(12)
-        R512_8_rounds(12);
-  #endif
-  #if   R512_Unroll_R(13)
-        R512_8_rounds(13);
-  #endif
-  #if   R512_Unroll_R(14)
-        R512_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_512 > 14)
-#error  "need more unrolling in Skein_512_Process_Block"
-  #endif
-        }
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = X0 ^ w[0];
-        ctx->X[1] = X1 ^ w[1];
-        ctx->X[2] = X2 ^ w[2];
-        ctx->X[3] = X3 ^ w[3];
-        ctx->X[4] = X4 ^ w[4];
-        ctx->X[5] = X5 ^ w[5];
-        ctx->X[6] = X6 ^ w[6];
-        ctx->X[7] = X7 ^ w[7];
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
-
-#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
-size_t Skein_512_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein_512_Process_Block_CodeSize) -
-           ((u8 *) Skein_512_Process_Block);
-    }
-unsigned int Skein_512_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_512;
-    }
-#endif
-#endif
-
-/*****************************  Skein1024 ******************************/
-#if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C, always looping (unrolled is bigger AND slower!) */
-    enum {
-        WCNT = SKEIN1024_STATE_WORDS
-        };
-#undef  RCNT
-#define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
-
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
-#define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
-#else
-#define SKEIN_UNROLL_1024 (0)
-#endif
-
-#if (SKEIN_UNROLL_1024 != 0)
-#if (RCNT % SKEIN_UNROLL_1024)
-#error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
-#endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
-#else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
-#endif
-
-    u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
-            X08, X09, X10, X11, X12, X13, X14, X15;
-    u64  w[WCNT];                            /* local copy of input block */
-#ifdef SKEIN_DEBUG
-    const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
-    Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
-    Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
-    Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
-#endif
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0]  = ctx->X[0];
-        ks[1]  = ctx->X[1];
-        ks[2]  = ctx->X[2];
-        ks[3]  = ctx->X[3];
-        ks[4]  = ctx->X[4];
-        ks[5]  = ctx->X[5];
-        ks[6]  = ctx->X[6];
-        ks[7]  = ctx->X[7];
-        ks[8]  = ctx->X[8];
-        ks[9]  = ctx->X[9];
-        ks[10] = ctx->X[10];
-        ks[11] = ctx->X[11];
-        ks[12] = ctx->X[12];
-        ks[13] = ctx->X[13];
-        ks[14] = ctx->X[14];
-        ks[15] = ctx->X[15];
-        ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
-                  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
-                  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
-                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
-
-        ts[2]  = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X00    =  w[0] +  ks[0];                 /* do the first full key injection */
-        X01    =  w[1] +  ks[1];
-        X02    =  w[2] +  ks[2];
-        X03    =  w[3] +  ks[3];
-        X04    =  w[4] +  ks[4];
-        X05    =  w[5] +  ks[5];
-        X06    =  w[6] +  ks[6];
-        X07    =  w[7] +  ks[7];
-        X08    =  w[8] +  ks[8];
-        X09    =  w[9] +  ks[9];
-        X10    = w[10] + ks[10];
-        X11    = w[11] + ks[11];
-        X12    = w[12] + ks[12];
-        X13    = w[13] + ks[13] + ts[0];
-        X14    = w[14] + ks[14] + ts[1];
-        X15    = w[15] + ks[15];
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
-
-#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
-    X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
-    X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
-    X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
-    X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
-
-#if SKEIN_UNROLL_1024 == 0                      
-#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
-
-#define I1024(R)                                                        \
-    X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
-    X01   += ks[((R) +  2) % 17];                                       \
-    X02   += ks[((R) +  3) % 17];                                       \
-    X03   += ks[((R) +  4) % 17];                                       \
-    X04   += ks[((R) +  5) % 17];                                       \
-    X05   += ks[((R) +  6) % 17];                                       \
-    X06   += ks[((R) +  7) % 17];                                       \
-    X07   += ks[((R) +  8) % 17];                                       \
-    X08   += ks[((R) +  9) % 17];                                       \
-    X09   += ks[((R) + 10) % 17];                                       \
-    X10   += ks[((R) + 11) % 17];                                       \
-    X11   += ks[((R) + 12) % 17];                                       \
-    X12   += ks[((R) + 13) % 17];                                       \
-    X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
-    X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
-    X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
-#else                                       /* looping version */
-#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
-
-#define I1024(R)                                                      \
-    X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
-    X01   += ks[r + (R) +  1];                                            \
-    X02   += ks[r + (R) +  2];                                            \
-    X03   += ks[r + (R) +  3];                                            \
-    X04   += ks[r + (R) +  4];                                            \
-    X05   += ks[r + (R) +  5];                                            \
-    X06   += ks[r + (R) +  6];                                            \
-    X07   += ks[r + (R) +  7];                                            \
-    X08   += ks[r + (R) +  8];                                            \
-    X09   += ks[r + (R) +  9];                                            \
-    X10   += ks[r + (R) + 10];                                            \
-    X11   += ks[r + (R) + 11];                                            \
-    X12   += ks[r + (R) + 12];                                            \
-    X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
-    X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
-    X15   += ks[r + (R) + 15] +         r + (R);                          \
-    ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
-    ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
-    Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
-#endif  
-        {
-#define R1024_8_rounds(R)    /* do 8 full rounds */                               \
-        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
-        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
-        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
-        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
-        I1024(2*(R));                                                             \
-        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
-        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
-        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
-        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
-        I1024(2*(R)+1);
-
-        R1024_8_rounds(0);
-
-#define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
-
-  #if   R1024_Unroll_R(1)
-        R1024_8_rounds(1);
-  #endif
-  #if   R1024_Unroll_R(2)
-        R1024_8_rounds(2);
-  #endif
-  #if   R1024_Unroll_R(3)
-        R1024_8_rounds(3);
-  #endif
-  #if   R1024_Unroll_R(4)
-        R1024_8_rounds(4);
-  #endif
-  #if   R1024_Unroll_R(5)
-        R1024_8_rounds(5);
-  #endif
-  #if   R1024_Unroll_R(6)
-        R1024_8_rounds(6);
-  #endif
-  #if   R1024_Unroll_R(7)
-        R1024_8_rounds(7);
-  #endif
-  #if   R1024_Unroll_R(8)
-        R1024_8_rounds(8);
-  #endif
-  #if   R1024_Unroll_R(9)
-        R1024_8_rounds(9);
-  #endif
-  #if   R1024_Unroll_R(10)
-        R1024_8_rounds(10);
-  #endif
-  #if   R1024_Unroll_R(11)
-        R1024_8_rounds(11);
-  #endif
-  #if   R1024_Unroll_R(12)
-        R1024_8_rounds(12);
-  #endif
-  #if   R1024_Unroll_R(13)
-        R1024_8_rounds(13);
-  #endif
-  #if   R1024_Unroll_R(14)
-        R1024_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_1024 > 14)
-#error  "need more unrolling in Skein_1024_Process_Block"
-  #endif
-        }
-        /* do the final "feedforward" xor, update context chaining vars */
-
-        ctx->X[0] = X00 ^ w[0];
-        ctx->X[1] = X01 ^ w[1];
-        ctx->X[2] = X02 ^ w[2];
-        ctx->X[3] = X03 ^ w[3];
-        ctx->X[4] = X04 ^ w[4];
-        ctx->X[5] = X05 ^ w[5];
-        ctx->X[6] = X06 ^ w[6];
-        ctx->X[7] = X07 ^ w[7];
-        ctx->X[8] = X08 ^ w[8];
-        ctx->X[9] = X09 ^ w[9];
-        ctx->X[10] = X10 ^ w[10];
-        ctx->X[11] = X11 ^ w[11];
-        ctx->X[12] = X12 ^ w[12];
-        ctx->X[13] = X13 ^ w[13];
-        ctx->X[14] = X14 ^ w[14];
-        ctx->X[15] = X15 ^ w[15];
-
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-        
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        blkPtr += SKEIN1024_BLOCK_BYTES;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
-
-#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
-size_t Skein1024_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein1024_Process_Block_CodeSize) -
-           ((u8 *) Skein1024_Process_Block);
-    }
-unsigned int Skein1024_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_1024;
-    }
-#endif
-#endif
+/***********************************************************************
+**
+** Implementation of the Skein block functions.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+** Compile-time switches:
+**
+**  SKEIN_USE_ASM  -- set bits (256/512/1024) to select which
+**                    versions use ASM code for block processing
+**                    [default: use C for all block sizes]
+**
+************************************************************************/
+
+#include <linux/string.h>
+#include <skein.h>
+
+#ifndef SKEIN_USE_ASM
+#define SKEIN_USE_ASM   (0)                     /* default is all C code (no ASM) */
+#endif
+
+#ifndef SKEIN_LOOP
+#define SKEIN_LOOP 001                          /* default: unroll 256 and 512, but not 1024 */
+#endif
+
+#define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
+#define KW_TWK_BASE     (0)
+#define KW_KEY_BASE     (3)
+#define ks              (kw + KW_KEY_BASE)                
+#define ts              (kw + KW_TWK_BASE)
+
+#ifdef SKEIN_DEBUG
+#define DebugSaveTweak(ctx) { ctx->h.T[0] = ts[0]; ctx->h.T[1] = ts[1]; }
+#else
+#define DebugSaveTweak(ctx)
+#endif
+
+/*****************************  Skein_256 ******************************/
+#if !(SKEIN_USE_ASM & 256)
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+    { /* do it in C */
+    enum {
+        WCNT = SKEIN_256_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
+
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
+#else
+#define SKEIN_UNROLL_256 (0)
+#endif
+
+#if SKEIN_UNROLL_256
+#if (RCNT % SKEIN_UNROLL_256)
+#error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+    u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
+    u64  w[WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+#endif
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0] = ctx->X[0];     
+        ks[1] = ctx->X[1];
+        ks[2] = ctx->X[2];
+        ks[3] = ctx->X[3];
+        ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
+
+        ts[2] = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+        X0 = w[0] + ks[0];                      /* do the first full key injection */
+        X1 = w[1] + ks[1] + ts[0];
+        X2 = w[2] + ks[2] + ts[1];
+        X3 = w[3] + ks[3];
+
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
+
+        blkPtr += SKEIN_256_BLOCK_BYTES;
+
+        /* run the rounds */
+
+#define Round256(p0, p1, p2, p3, ROT, rNum)                              \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+
+#if SKEIN_UNROLL_256 == 0                       
+#define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
+    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+
+#define I256(R)                                                     \
+    X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
+    X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
+    X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
+    X3   += ks[((R)+4) % 5] +     (R)+1;                            \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+#else                                       /* looping version */
+#define R256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+
+#define I256(R)                                                     \
+    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
+    X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
+    X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
+    X3   += ks[r+(R)+3] +    r+(R);                              \
+    ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
+    ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
+#endif  
+        {    
+#define R256_8_rounds(R)                  \
+        R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
+        R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
+        R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
+        R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
+        I256(2 * (R));                      \
+        R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
+        R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
+        R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
+        R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
+        I256(2 * (R) + 1);
+
+        R256_8_rounds(0);
+
+#define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
+
+  #if   R256_Unroll_R(1)
+        R256_8_rounds(1);
+  #endif
+  #if   R256_Unroll_R(2)
+        R256_8_rounds(2);
+  #endif
+  #if   R256_Unroll_R(3)
+        R256_8_rounds(3);
+  #endif
+  #if   R256_Unroll_R(4)
+        R256_8_rounds(4);
+  #endif
+  #if   R256_Unroll_R(5)
+        R256_8_rounds(5);
+  #endif
+  #if   R256_Unroll_R(6)
+        R256_8_rounds(6);
+  #endif
+  #if   R256_Unroll_R(7)
+        R256_8_rounds(7);
+  #endif
+  #if   R256_Unroll_R(8)
+        R256_8_rounds(8);
+  #endif
+  #if   R256_Unroll_R(9)
+        R256_8_rounds(9);
+  #endif
+  #if   R256_Unroll_R(10)
+        R256_8_rounds(10);
+  #endif
+  #if   R256_Unroll_R(11)
+        R256_8_rounds(11);
+  #endif
+  #if   R256_Unroll_R(12)
+        R256_8_rounds(12);
+  #endif
+  #if   R256_Unroll_R(13)
+        R256_8_rounds(13);
+  #endif
+  #if   R256_Unroll_R(14)
+        R256_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_256 > 14)
+#error  "need more unrolling in Skein_256_Process_Block"
+  #endif
+        }
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = X0 ^ w[0];
+        ctx->X[1] = X1 ^ w[1];
+        ctx->X[2] = X2 ^ w[2];
+        ctx->X[3] = X3 ^ w[3];
+
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein_256_Process_Block_CodeSize(void)
+    {
+    return ((u8 *) Skein_256_Process_Block_CodeSize) -
+           ((u8 *) Skein_256_Process_Block);
+    }
+unsigned int Skein_256_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_256;
+    }
+#endif
+#endif
+
+/*****************************  Skein_512 ******************************/
+#if !(SKEIN_USE_ASM & 512)
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+    { /* do it in C */
+    enum {
+        WCNT = SKEIN_512_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
+
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
+#else
+#define SKEIN_UNROLL_512 (0)
+#endif
+
+#if SKEIN_UNROLL_512
+#if (RCNT % SKEIN_UNROLL_512)
+#error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+    u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
+    u64  w[WCNT];                           /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+    Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
+#endif
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0] = ctx->X[0];
+        ks[1] = ctx->X[1];
+        ks[2] = ctx->X[2];
+        ks[3] = ctx->X[3];
+        ks[4] = ctx->X[4];
+        ks[5] = ctx->X[5];
+        ks[6] = ctx->X[6];
+        ks[7] = ctx->X[7];
+        ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
+                ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
+
+        ts[2] = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+        X0   = w[0] + ks[0];                    /* do the first full key injection */
+        X1   = w[1] + ks[1];
+        X2   = w[2] + ks[2];
+        X3   = w[3] + ks[3];
+        X4   = w[4] + ks[4];
+        X5   = w[5] + ks[5] + ts[0];
+        X6   = w[6] + ks[6] + ts[1];
+        X7   = w[7] + ks[7];
+
+        blkPtr += SKEIN_512_BLOCK_BYTES;
+
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+        /* run the rounds */
+#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
+
+#if SKEIN_UNROLL_512 == 0                       
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
+    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+
+#define I512(R)                                                     \
+    X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
+    X1   += ks[((R) + 2) % 9];                                        \
+    X2   += ks[((R) + 3) % 9];                                        \
+    X3   += ks[((R) + 4) % 9];                                        \
+    X4   += ks[((R) + 5) % 9];                                        \
+    X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
+    X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
+    X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+#else                                       /* looping version */
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+
+#define I512(R)                                                     \
+    X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
+    X1   += ks[r + (R) + 1];                                            \
+    X2   += ks[r + (R) + 2];                                            \
+    X3   += ks[r + (R) + 3];                                            \
+    X4   += ks[r + (R) + 4];                                            \
+    X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
+    X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
+    X7   += ks[r + (R) + 7] +         r + (R);                              \
+    ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
+    ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
+#endif                         /* end of looped code definitions */
+        {
+#define R512_8_rounds(R)  /* do 8 full rounds */  \
+        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
+        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
+        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
+        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
+        I512(2 * (R));                              \
+        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
+        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
+        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
+        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
+        I512(2 * (R) + 1);        /* and key injection */
+
+        R512_8_rounds(0);
+
+#define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
+
+  #if   R512_Unroll_R(1)
+        R512_8_rounds(1);
+  #endif
+  #if   R512_Unroll_R(2)
+        R512_8_rounds(2);
+  #endif
+  #if   R512_Unroll_R(3)
+        R512_8_rounds(3);
+  #endif
+  #if   R512_Unroll_R(4)
+        R512_8_rounds(4);
+  #endif
+  #if   R512_Unroll_R(5)
+        R512_8_rounds(5);
+  #endif
+  #if   R512_Unroll_R(6)
+        R512_8_rounds(6);
+  #endif
+  #if   R512_Unroll_R(7)
+        R512_8_rounds(7);
+  #endif
+  #if   R512_Unroll_R(8)
+        R512_8_rounds(8);
+  #endif
+  #if   R512_Unroll_R(9)
+        R512_8_rounds(9);
+  #endif
+  #if   R512_Unroll_R(10)
+        R512_8_rounds(10);
+  #endif
+  #if   R512_Unroll_R(11)
+        R512_8_rounds(11);
+  #endif
+  #if   R512_Unroll_R(12)
+        R512_8_rounds(12);
+  #endif
+  #if   R512_Unroll_R(13)
+        R512_8_rounds(13);
+  #endif
+  #if   R512_Unroll_R(14)
+        R512_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_512 > 14)
+#error  "need more unrolling in Skein_512_Process_Block"
+  #endif
+        }
+
+        /* do the final "feedforward" xor, update context chaining vars */
+        ctx->X[0] = X0 ^ w[0];
+        ctx->X[1] = X1 ^ w[1];
+        ctx->X[2] = X2 ^ w[2];
+        ctx->X[3] = X3 ^ w[3];
+        ctx->X[4] = X4 ^ w[4];
+        ctx->X[5] = X5 ^ w[5];
+        ctx->X[6] = X6 ^ w[6];
+        ctx->X[7] = X7 ^ w[7];
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein_512_Process_Block_CodeSize(void)
+    {
+    return ((u8 *) Skein_512_Process_Block_CodeSize) -
+           ((u8 *) Skein_512_Process_Block);
+    }
+unsigned int Skein_512_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_512;
+    }
+#endif
+#endif
+
+/*****************************  Skein1024 ******************************/
+#if !(SKEIN_USE_ASM & 1024)
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+    { /* do it in C, always looping (unrolled is bigger AND slower!) */
+    enum {
+        WCNT = SKEIN1024_STATE_WORDS
+        };
+#undef  RCNT
+#define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
+
+#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
+#else
+#define SKEIN_UNROLL_1024 (0)
+#endif
+
+#if (SKEIN_UNROLL_1024 != 0)
+#if (RCNT % SKEIN_UNROLL_1024)
+#error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
+#endif
+    size_t  r;
+    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+#else
+    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+#endif
+
+    u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
+            X08, X09, X10, X11, X12, X13, X14, X15;
+    u64  w[WCNT];                            /* local copy of input block */
+#ifdef SKEIN_DEBUG
+    const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+    Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
+    Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
+    Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
+    Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
+#endif
+
+    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+    ts[0] = ctx->h.T[0];
+    ts[1] = ctx->h.T[1];
+    do  {
+        /* this implementation only supports 2**64 input bytes (no carry out here) */
+        ts[0] += byteCntAdd;                    /* update processed length */
+
+        /* precompute the key schedule for this block */
+        ks[0]  = ctx->X[0];
+        ks[1]  = ctx->X[1];
+        ks[2]  = ctx->X[2];
+        ks[3]  = ctx->X[3];
+        ks[4]  = ctx->X[4];
+        ks[5]  = ctx->X[5];
+        ks[6]  = ctx->X[6];
+        ks[7]  = ctx->X[7];
+        ks[8]  = ctx->X[8];
+        ks[9]  = ctx->X[9];
+        ks[10] = ctx->X[10];
+        ks[11] = ctx->X[11];
+        ks[12] = ctx->X[12];
+        ks[13] = ctx->X[13];
+        ks[14] = ctx->X[14];
+        ks[15] = ctx->X[15];
+        ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
+                  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
+                  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
+                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
+
+        ts[2]  = ts[0] ^ ts[1];
+
+        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+        DebugSaveTweak(ctx);
+        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+        X00    =  w[0] +  ks[0];                 /* do the first full key injection */
+        X01    =  w[1] +  ks[1];
+        X02    =  w[2] +  ks[2];
+        X03    =  w[3] +  ks[3];
+        X04    =  w[4] +  ks[4];
+        X05    =  w[5] +  ks[5];
+        X06    =  w[6] +  ks[6];
+        X07    =  w[7] +  ks[7];
+        X08    =  w[8] +  ks[8];
+        X09    =  w[9] +  ks[9];
+        X10    = w[10] + ks[10];
+        X11    = w[11] + ks[11];
+        X12    = w[12] + ks[12];
+        X13    = w[13] + ks[13] + ts[0];
+        X14    = w[14] + ks[14] + ts[1];
+        X15    = w[15] + ks[15];
+
+        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+
+#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
+    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
+    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
+    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
+    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
+    X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
+    X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
+    X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
+    X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
+
+#if SKEIN_UNROLL_1024 == 0                      
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+
+#define I1024(R)                                                        \
+    X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
+    X01   += ks[((R) +  2) % 17];                                       \
+    X02   += ks[((R) +  3) % 17];                                       \
+    X03   += ks[((R) +  4) % 17];                                       \
+    X04   += ks[((R) +  5) % 17];                                       \
+    X05   += ks[((R) +  6) % 17];                                       \
+    X06   += ks[((R) +  7) % 17];                                       \
+    X07   += ks[((R) +  8) % 17];                                       \
+    X08   += ks[((R) +  9) % 17];                                       \
+    X09   += ks[((R) + 10) % 17];                                       \
+    X10   += ks[((R) + 11) % 17];                                       \
+    X11   += ks[((R) + 12) % 17];                                       \
+    X12   += ks[((R) + 13) % 17];                                       \
+    X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
+    X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
+    X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
+#else                                       /* looping version */
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
+
+#define I1024(R)                                                      \
+    X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
+    X01   += ks[r + (R) +  1];                                            \
+    X02   += ks[r + (R) +  2];                                            \
+    X03   += ks[r + (R) +  3];                                            \
+    X04   += ks[r + (R) +  4];                                            \
+    X05   += ks[r + (R) +  5];                                            \
+    X06   += ks[r + (R) +  6];                                            \
+    X07   += ks[r + (R) +  7];                                            \
+    X08   += ks[r + (R) +  8];                                            \
+    X09   += ks[r + (R) +  9];                                            \
+    X10   += ks[r + (R) + 10];                                            \
+    X11   += ks[r + (R) + 11];                                            \
+    X12   += ks[r + (R) + 12];                                            \
+    X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
+    X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
+    X15   += ks[r + (R) + 15] +         r + (R);                          \
+    ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
+    ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
+    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+    for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
+#endif  
+        {
+#define R1024_8_rounds(R)    /* do 8 full rounds */                               \
+        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
+        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
+        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
+        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
+        I1024(2*(R));                                                             \
+        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
+        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
+        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
+        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
+        I1024(2*(R)+1);
+
+        R1024_8_rounds(0);
+
+#define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
+
+  #if   R1024_Unroll_R(1)
+        R1024_8_rounds(1);
+  #endif
+  #if   R1024_Unroll_R(2)
+        R1024_8_rounds(2);
+  #endif
+  #if   R1024_Unroll_R(3)
+        R1024_8_rounds(3);
+  #endif
+  #if   R1024_Unroll_R(4)
+        R1024_8_rounds(4);
+  #endif
+  #if   R1024_Unroll_R(5)
+        R1024_8_rounds(5);
+  #endif
+  #if   R1024_Unroll_R(6)
+        R1024_8_rounds(6);
+  #endif
+  #if   R1024_Unroll_R(7)
+        R1024_8_rounds(7);
+  #endif
+  #if   R1024_Unroll_R(8)
+        R1024_8_rounds(8);
+  #endif
+  #if   R1024_Unroll_R(9)
+        R1024_8_rounds(9);
+  #endif
+  #if   R1024_Unroll_R(10)
+        R1024_8_rounds(10);
+  #endif
+  #if   R1024_Unroll_R(11)
+        R1024_8_rounds(11);
+  #endif
+  #if   R1024_Unroll_R(12)
+        R1024_8_rounds(12);
+  #endif
+  #if   R1024_Unroll_R(13)
+        R1024_8_rounds(13);
+  #endif
+  #if   R1024_Unroll_R(14)
+        R1024_8_rounds(14);
+  #endif
+  #if  (SKEIN_UNROLL_1024 > 14)
+#error  "need more unrolling in Skein_1024_Process_Block"
+  #endif
+        }
+        /* do the final "feedforward" xor, update context chaining vars */
+
+        ctx->X[0] = X00 ^ w[0];
+        ctx->X[1] = X01 ^ w[1];
+        ctx->X[2] = X02 ^ w[2];
+        ctx->X[3] = X03 ^ w[3];
+        ctx->X[4] = X04 ^ w[4];
+        ctx->X[5] = X05 ^ w[5];
+        ctx->X[6] = X06 ^ w[6];
+        ctx->X[7] = X07 ^ w[7];
+        ctx->X[8] = X08 ^ w[8];
+        ctx->X[9] = X09 ^ w[9];
+        ctx->X[10] = X10 ^ w[10];
+        ctx->X[11] = X11 ^ w[11];
+        ctx->X[12] = X12 ^ w[12];
+        ctx->X[13] = X13 ^ w[13];
+        ctx->X[14] = X14 ^ w[14];
+        ctx->X[15] = X15 ^ w[15];
+
+        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+        
+        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+        blkPtr += SKEIN1024_BLOCK_BYTES;
+        }
+    while (--blkCnt);
+    ctx->h.T[0] = ts[0];
+    ctx->h.T[1] = ts[1];
+    }
+
+#if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
+size_t Skein1024_Process_Block_CodeSize(void)
+    {
+    return ((u8 *) Skein1024_Process_Block_CodeSize) -
+           ((u8 *) Skein1024_Process_Block);
+    }
+unsigned int Skein1024_Unroll_Cnt(void)
+    {
+    return SKEIN_UNROLL_1024;
+    }
+#endif
+#endif
-- 
1.9.1
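
For readers following the Round512 and Round1024 macros in the patch above:
each "X##p0 += X##p1; X##p1 = RotL_64(...); X##p1 ^= X##p0;" triple is one
add-rotate-xor step on a pair of 64-bit words. Below is a minimal standalone
sketch of that step, assuming plain userspace C with uint64_t in place of the
kernel's u64. The helper names rotl64, mix and round512 are hypothetical and
exist only for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit word left by n bits; n must be in 1..63. */
static uint64_t rotl64(uint64_t x, unsigned int n)
{
	assert(n > 0 && n < 64);
	return (x << n) | (x >> (64 - n));
}

/* One add-rotate-xor step on a word pair, mirroring one line of Round512. */
static void mix(uint64_t *a, uint64_t *b, unsigned int rot)
{
	*a += *b;		/* addition wraps modulo 2**64 */
	*b = rotl64(*b, rot);
	*b ^= *a;
}

/*
 * One round over eight words: p[] lists the word indices in the same
 * order as the p0..p7 macro arguments, rot[] holds the four rotation
 * counts for that round.
 */
static void round512(uint64_t x[8], const unsigned int p[8],
		     const unsigned int rot[4])
{
	unsigned int i;

	for (i = 0; i < 4; i++)
		mix(&x[p[2 * i]], &x[p[2 * i + 1]], rot[i]);
}
```

As a usage example, the first round of the 512-bit variant corresponds to
p = {0, 1, 2, 3, 4, 5, 6, 7} with rot = {46, 36, 19, 37}, the values of
R_512_0_0 through R_512_0_3 in skein.h. The macro version keeps the words in
named locals (X0..X7) so the compiler can hold them in registers; the array
form here trades that for readability.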

* [PATCH V2 12/21] staging: crypto: skein: fix leading whitespace
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (10 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 11/21] staging: crypto: skein: dos2unix, remove executable perms Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 13/21] staging: crypto: skein: remove trailing whitespace Jason Cooper
                     ` (8 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h        |  136 +-
 drivers/staging/skein/include/skeinApi.h     |  284 +--
 drivers/staging/skein/include/skein_iv.h     |  276 +--
 drivers/staging/skein/include/threefishApi.h |  230 +--
 drivers/staging/skein/skein.c                | 1126 +++++------
 drivers/staging/skein/skeinApi.c             |  320 +--
 drivers/staging/skein/skeinBlockNo3F.c       |  286 +--
 drivers/staging/skein/skein_block.c          | 1012 +++++-----
 drivers/staging/skein/threefish1024Block.c   | 2740 +++++++++++++-------------
 drivers/staging/skein/threefish256Block.c    |  639 +++---
 drivers/staging/skein/threefish512Block.c    | 1254 ++++++------
 drivers/staging/skein/threefishApi.c         |  102 +-
 12 files changed, 4200 insertions(+), 4205 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 18bb15824e41..906bcee41c39 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -38,11 +38,11 @@
 #define Skein_Swap64(w64)  (w64)
 
 enum
-    {
-    SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
-    SKEIN_FAIL            =      1,
-    SKEIN_BAD_HASHLEN     =      2
-    };
+	{
+	SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
+	SKEIN_FAIL            =      1,
+	SKEIN_BAD_HASHLEN     =      2
+	};
 
 #define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
 
@@ -64,32 +64,32 @@ enum
 #define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
 
 struct skein_ctx_hdr
-    {
-    size_t  hashBitLen;                      /* size of hash result, in bits */
-    size_t  bCnt;                            /* current byte count in buffer b[] */
-    u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
-    };
+	{
+	size_t  hashBitLen;                      /* size of hash result, in bits */
+	size_t  bCnt;                            /* current byte count in buffer b[] */
+	u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+	};
 
 struct skein_256_ctx                               /*  256-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
+	{
+	struct skein_ctx_hdr h;                      /* common header context variables */
+	u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
+	u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	};
 
 struct skein_512_ctx                             /*  512-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
+	{
+	struct skein_ctx_hdr h;                      /* common header context variables */
+	u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
+	u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	};
 
 struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
-    {
-    struct skein_ctx_hdr h;                      /* common header context variables */
-    u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
-    u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
-    };
+	{
+	struct skein_ctx_hdr h;                      /* common header context variables */
+	u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
+	u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	};
 
 /*   Skein APIs for (incremental) "straight hashing" */
 int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
@@ -150,18 +150,18 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 
 /* tweak word T[1]: bit field starting positions */
 #define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
-                                
+
 #define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
 #define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
 #define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
 #define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
 #define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
-                                
+
 /* tweak word T[1]: flag bit definition(s) */
 #define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
 #define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
 #define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
-                                
+
 /* tweak word T[1]: tree level bit field mask */
 #define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
 #define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
@@ -213,9 +213,9 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
 
 #define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
-    ((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
-     (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
-     (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
+	((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
+	 (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
+	 (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
 
 #define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
 
@@ -233,17 +233,17 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 
 /* set both tweak words at once */
 #define Skein_Set_T0_T1(ctxPtr, T0, T1)           \
-    {                                           \
-    Skein_Set_T0(ctxPtr, (T0));                  \
-    Skein_Set_T1(ctxPtr, (T1));                  \
-    }
+	{                                           \
+	Skein_Set_T0(ctxPtr, (T0));                  \
+	Skein_Set_T1(ctxPtr, (T1));                  \
+	}
 
 #define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
-    Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
+	Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
 
 /* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
 #define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
-    { Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
+	{ Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
 
 #define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
 #define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
@@ -270,37 +270,37 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 ** Skein block function constants (shared across Ref and Opt code)
 ******************************************************************/
 enum    
-    {   
-        /* Skein_256 round rotation constants */
-    R_256_0_0 = 14, R_256_0_1 = 16,
-    R_256_1_0 = 52, R_256_1_1 = 57,
-    R_256_2_0 = 23, R_256_2_1 = 40,
-    R_256_3_0 =  5, R_256_3_1 = 37,
-    R_256_4_0 = 25, R_256_4_1 = 33,
-    R_256_5_0 = 46, R_256_5_1 = 12,
-    R_256_6_0 = 58, R_256_6_1 = 22,
-    R_256_7_0 = 32, R_256_7_1 = 32,
-
-        /* Skein_512 round rotation constants */
-    R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
-    R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
-    R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
-    R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
-    R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
-    R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
-    R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
-    R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
-
-        /* Skein1024 round rotation constants */
-    R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
-    R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
-    R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
-    R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
-    R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
-    R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
-    R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
-    R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
-    };
+	{   
+	    /* Skein_256 round rotation constants */
+	R_256_0_0 = 14, R_256_0_1 = 16,
+	R_256_1_0 = 52, R_256_1_1 = 57,
+	R_256_2_0 = 23, R_256_2_1 = 40,
+	R_256_3_0 =  5, R_256_3_1 = 37,
+	R_256_4_0 = 25, R_256_4_1 = 33,
+	R_256_5_0 = 46, R_256_5_1 = 12,
+	R_256_6_0 = 58, R_256_6_1 = 22,
+	R_256_7_0 = 32, R_256_7_1 = 32,
+
+	    /* Skein_512 round rotation constants */
+	R_512_0_0 = 46, R_512_0_1 = 36, R_512_0_2 = 19, R_512_0_3 = 37,
+	R_512_1_0 = 33, R_512_1_1 = 27, R_512_1_2 = 14, R_512_1_3 = 42,
+	R_512_2_0 = 17, R_512_2_1 = 49, R_512_2_2 = 36, R_512_2_3 = 39,
+	R_512_3_0 = 44, R_512_3_1 =  9, R_512_3_2 = 54, R_512_3_3 = 56,
+	R_512_4_0 = 39, R_512_4_1 = 30, R_512_4_2 = 34, R_512_4_3 = 24,
+	R_512_5_0 = 13, R_512_5_1 = 50, R_512_5_2 = 10, R_512_5_3 = 17,
+	R_512_6_0 = 25, R_512_6_1 = 29, R_512_6_2 = 39, R_512_6_3 = 43,
+	R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
+
+	    /* Skein1024 round rotation constants */
+	R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
+	R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
+	R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
+	R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
+	R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
+	R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
+	R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
+	R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
+	};
 
 #ifndef SKEIN_ROUNDS
 #define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 2c52797918cf..0d7d59eff460 100644
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -81,148 +81,148 @@ OTHER DEALINGS IN THE SOFTWARE.
 #include <linux/types.h>
 #include <skein.h>
 
-    /**
-     * Which Skein size to use
-     */
-    enum skein_size {
-        Skein256 = 256,     /*!< Skein with 256 bit state */
-        Skein512 = 512,     /*!< Skein with 512 bit state */
-        Skein1024 = 1024    /*!< Skein with 1024 bit state */
-    };
-
-    /**
-     * Context for Skein.
-     *
-     * This structure was setup with some know-how of the internal
-     * Skein structures, in particular ordering of header and size dependent
-     * variables. If Skein implementation changes this, then adapt these
-     * structures as well.
-     */
-    struct skein_ctx {
-        u64 skeinSize;
-        u64  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
-        union {
-            struct skein_ctx_hdr h;
-            struct skein_256_ctx s256;
-            struct skein_512_ctx s512;
-            struct skein1024_ctx s1024;
-        } m;
-    };
-
-    /**
-     * Prepare a Skein context.
-     * 
-     * An application must call this function before it can use the Skein
-     * context. The functions clears memory and initializes size dependent
-     * variables.
-     *
-     * @param ctx
-     *     Pointer to a Skein context.
-     * @param size
-     *     Which Skein size to use.
-     * @return
-     *     SKEIN_SUCESS of SKEIN_FAIL
-     */
-    int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size);
-
-    /**
-     * Initialize a Skein context.
-     *
-     * Initializes the context with this data and saves the resulting Skein 
-     * state variables for further use.
-     *
-     * @param ctx
-     *     Pointer to a Skein context.
-     * @param hashBitLen
-     *     Number of MAC hash bits to compute
-     * @return
-     *     SKEIN_SUCESS of SKEIN_FAIL
-     * @see skeinReset
-     */
-    int skeinInit(struct skein_ctx *ctx, size_t hashBitLen);
-
-    /**
-     * Resets a Skein context for further use.
-     * 
-     * Restores the saved chaining variables to reset the Skein context. 
-     * Thus applications can reuse the same setup to  process several 
-     * messages. This saves a complete Skein initialization cycle.
-     * 
-     * @param ctx
-     *     Pointer to a pre-initialized Skein MAC context
-     */
-    void skeinReset(struct skein_ctx *ctx);
-    
-    /**
-     * Initializes a Skein context for MAC usage.
-     * 
-     * Initializes the context with this data and saves the resulting Skein 
-     * state variables for further use.
-     *
-     * Applications call the normal Skein functions to update the MAC and
-     * get the final result.
-     *
-     * @param ctx
-     *     Pointer to an empty or preinitialized Skein MAC context
-     * @param key
-     *     Pointer to key bytes or NULL
-     * @param keyLen
-     *     Length of the key in bytes or zero
-     * @param hashBitLen
-     *     Number of MAC hash bits to compute
-     * @return
-     *     SKEIN_SUCESS of SKEIN_FAIL
-     */
-    int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
-                     size_t hashBitLen);
-
-    /**
-     * Update Skein with the next part of the message.
-     *
-     * @param ctx
-     *     Pointer to initialized Skein context
-     * @param msg
-     *     Pointer to the message.
-     * @param msgByteCnt
-     *     Length of the message in @b bytes
-     * @return
-     *     Success or error code.
-     */
-    int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
-                    size_t msgByteCnt);
-
-    /**
-     * Update the hash with a message bit string.
-     *
-     * Skein can handle data not only as bytes but also as bit strings of
-     * arbitrary length (up to its maximum design size).
-     *
-     * @param ctx
-     *     Pointer to initialized Skein context
-     * @param msg
-     *     Pointer to the message.
-     * @param msgBitCnt
-     *     Length of the message in @b bits.
-     */
-    int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
-                        size_t msgBitCnt);
-
-    /**
-     * Finalize Skein and return the hash.
-     * 
-     * Before an application can reuse a Skein setup the application must
-     * reset the Skein context.
-     *
-     * @param ctx
-     *     Pointer to initialized Skein context
-     * @param hash
-     *     Pointer to buffer that receives the hash. The buffer must be large
-     *     enough to store @c hashBitLen bits.
-     * @return
-     *     Success or error code.
-     * @see skeinReset
-     */
-    int skeinFinal(struct skein_ctx *ctx, u8 *hash);
+/**
+ * Which Skein size to use
+ */
+enum skein_size {
+	Skein256 = 256,     /*!< Skein with 256 bit state */
+	Skein512 = 512,     /*!< Skein with 512 bit state */
+	Skein1024 = 1024    /*!< Skein with 1024 bit state */
+};
+
+/**
+ * Context for Skein.
+ *
+ * This structure was set up with some know-how of the internal
+ * Skein structures, in particular the ordering of header and size-dependent
+ * variables. If the Skein implementation changes this, then adapt these
+ * structures as well.
+ */
+struct skein_ctx {
+	u64 skeinSize;
+	u64  XSave[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
+	union {
+		struct skein_ctx_hdr h;
+		struct skein_256_ctx s256;
+		struct skein_512_ctx s512;
+		struct skein1024_ctx s1024;
+	} m;
+};
+
+/**
+ * Prepare a Skein context.
+ * 
+ * An application must call this function before it can use the Skein
+ * context. The function clears memory and initializes size-dependent
+ * variables.
+ *
+ * @param ctx
+ *     Pointer to a Skein context.
+ * @param size
+ *     Which Skein size to use.
+ * @return
+ *     SKEIN_SUCCESS or SKEIN_FAIL
+ */
+int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size);
+
+/**
+ * Initialize a Skein context.
+ *
+ * Initializes the context with this data and saves the resulting Skein 
+ * state variables for further use.
+ *
+ * @param ctx
+ *     Pointer to a Skein context.
+ * @param hashBitLen
+ *     Number of MAC hash bits to compute
+ * @return
+ *     SKEIN_SUCCESS or SKEIN_FAIL
+ * @see skeinReset
+ */
+int skeinInit(struct skein_ctx *ctx, size_t hashBitLen);
+
+/**
+ * Resets a Skein context for further use.
+ * 
+ * Restores the saved chaining variables to reset the Skein context. 
+ * Thus applications can reuse the same setup to process several
+ * messages. This saves a complete Skein initialization cycle.
+ * 
+ * @param ctx
+ *     Pointer to a pre-initialized Skein MAC context
+ */
+void skeinReset(struct skein_ctx *ctx);
+
+/**
+ * Initializes a Skein context for MAC usage.
+ * 
+ * Initializes the context with this data and saves the resulting Skein 
+ * state variables for further use.
+ *
+ * Applications call the normal Skein functions to update the MAC and
+ * get the final result.
+ *
+ * @param ctx
+ *     Pointer to an empty or preinitialized Skein MAC context
+ * @param key
+ *     Pointer to key bytes or NULL
+ * @param keyLen
+ *     Length of the key in bytes or zero
+ * @param hashBitLen
+ *     Number of MAC hash bits to compute
+ * @return
+ *     SKEIN_SUCCESS or SKEIN_FAIL
+ */
+int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
+		 size_t hashBitLen);
+
+/**
+ * Update Skein with the next part of the message.
+ *
+ * @param ctx
+ *     Pointer to initialized Skein context
+ * @param msg
+ *     Pointer to the message.
+ * @param msgByteCnt
+ *     Length of the message in @b bytes
+ * @return
+ *     Success or error code.
+ */
+int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
+		size_t msgByteCnt);
+
+/**
+ * Update the hash with a message bit string.
+ *
+ * Skein can handle data not only as bytes but also as bit strings of
+ * arbitrary length (up to its maximum design size).
+ *
+ * @param ctx
+ *     Pointer to initialized Skein context
+ * @param msg
+ *     Pointer to the message.
+ * @param msgBitCnt
+ *     Length of the message in @b bits.
+ */
+int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
+		    size_t msgBitCnt);
+
+/**
+ * Finalize Skein and return the hash.
+ * 
+ * Before an application can reuse a Skein setup the application must
+ * reset the Skein context.
+ *
+ * @param ctx
+ *     Pointer to initialized Skein context
+ * @param hash
+ *     Pointer to buffer that receives the hash. The buffer must be large
+ *     enough to store @c hashBitLen bits.
+ * @return
+ *     Success or error code.
+ * @see skeinReset
+ */
+int skeinFinal(struct skein_ctx *ctx, u8 *hash);
 
 /**
  * @}
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index 813bad528e3c..bbbba77c44d3 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -21,179 +21,179 @@
 
 /* blkSize =  256 bits. hashSize =  128 bits */
 const u64 SKEIN_256_IV_128[] =
-    {
-    MK_64(0xE1111906, 0x964D7260),
-    MK_64(0x883DAAA7, 0x7C8D811C),
-    MK_64(0x10080DF4, 0x91960F7A),
-    MK_64(0xCCF7DDE5, 0xB45BC1C2)
-    };
+	{
+	MK_64(0xE1111906, 0x964D7260),
+	MK_64(0x883DAAA7, 0x7C8D811C),
+	MK_64(0x10080DF4, 0x91960F7A),
+	MK_64(0xCCF7DDE5, 0xB45BC1C2)
+	};
 
 /* blkSize =  256 bits. hashSize =  160 bits */
 const u64 SKEIN_256_IV_160[] =
-    {
-    MK_64(0x14202314, 0x72825E98),
-    MK_64(0x2AC4E9A2, 0x5A77E590),
-    MK_64(0xD47A5856, 0x8838D63E),
-    MK_64(0x2DD2E496, 0x8586AB7D)
-    };
+	{
+	MK_64(0x14202314, 0x72825E98),
+	MK_64(0x2AC4E9A2, 0x5A77E590),
+	MK_64(0xD47A5856, 0x8838D63E),
+	MK_64(0x2DD2E496, 0x8586AB7D)
+	};
 
 /* blkSize =  256 bits. hashSize =  224 bits */
 const u64 SKEIN_256_IV_224[] =
-    {
-    MK_64(0xC6098A8C, 0x9AE5EA0B),
-    MK_64(0x876D5686, 0x08C5191C),
-    MK_64(0x99CB88D7, 0xD7F53884),
-    MK_64(0x384BDDB1, 0xAEDDB5DE)
-    };
+	{
+	MK_64(0xC6098A8C, 0x9AE5EA0B),
+	MK_64(0x876D5686, 0x08C5191C),
+	MK_64(0x99CB88D7, 0xD7F53884),
+	MK_64(0x384BDDB1, 0xAEDDB5DE)
+	};
 
 /* blkSize =  256 bits. hashSize =  256 bits */
 const u64 SKEIN_256_IV_256[] =
-    {
-    MK_64(0xFC9DA860, 0xD048B449),
-    MK_64(0x2FCA6647, 0x9FA7D833),
-    MK_64(0xB33BC389, 0x6656840F),
-    MK_64(0x6A54E920, 0xFDE8DA69)
-    };
+	{
+	MK_64(0xFC9DA860, 0xD048B449),
+	MK_64(0x2FCA6647, 0x9FA7D833),
+	MK_64(0xB33BC389, 0x6656840F),
+	MK_64(0x6A54E920, 0xFDE8DA69)
+	};
 
 /* blkSize =  512 bits. hashSize =  128 bits */
 const u64 SKEIN_512_IV_128[] =
-    {
-    MK_64(0xA8BC7BF3, 0x6FBF9F52),
-    MK_64(0x1E9872CE, 0xBD1AF0AA),
-    MK_64(0x309B1790, 0xB32190D3),
-    MK_64(0xBCFBB854, 0x3F94805C),
-    MK_64(0x0DA61BCD, 0x6E31B11B),
-    MK_64(0x1A18EBEA, 0xD46A32E3),
-    MK_64(0xA2CC5B18, 0xCE84AA82),
-    MK_64(0x6982AB28, 0x9D46982D)
-    };
+	{
+	MK_64(0xA8BC7BF3, 0x6FBF9F52),
+	MK_64(0x1E9872CE, 0xBD1AF0AA),
+	MK_64(0x309B1790, 0xB32190D3),
+	MK_64(0xBCFBB854, 0x3F94805C),
+	MK_64(0x0DA61BCD, 0x6E31B11B),
+	MK_64(0x1A18EBEA, 0xD46A32E3),
+	MK_64(0xA2CC5B18, 0xCE84AA82),
+	MK_64(0x6982AB28, 0x9D46982D)
+	};
 
 /* blkSize =  512 bits. hashSize =  160 bits */
 const u64 SKEIN_512_IV_160[] =
-    {
-    MK_64(0x28B81A2A, 0xE013BD91),
-    MK_64(0xC2F11668, 0xB5BDF78F),
-    MK_64(0x1760D8F3, 0xF6A56F12),
-    MK_64(0x4FB74758, 0x8239904F),
-    MK_64(0x21EDE07F, 0x7EAF5056),
-    MK_64(0xD908922E, 0x63ED70B8),
-    MK_64(0xB8EC76FF, 0xECCB52FA),
-    MK_64(0x01A47BB8, 0xA3F27A6E)
-    };
+	{
+	MK_64(0x28B81A2A, 0xE013BD91),
+	MK_64(0xC2F11668, 0xB5BDF78F),
+	MK_64(0x1760D8F3, 0xF6A56F12),
+	MK_64(0x4FB74758, 0x8239904F),
+	MK_64(0x21EDE07F, 0x7EAF5056),
+	MK_64(0xD908922E, 0x63ED70B8),
+	MK_64(0xB8EC76FF, 0xECCB52FA),
+	MK_64(0x01A47BB8, 0xA3F27A6E)
+	};
 
 /* blkSize =  512 bits. hashSize =  224 bits */
 const u64 SKEIN_512_IV_224[] =
-    {
-    MK_64(0xCCD06162, 0x48677224),
-    MK_64(0xCBA65CF3, 0xA92339EF),
-    MK_64(0x8CCD69D6, 0x52FF4B64),
-    MK_64(0x398AED7B, 0x3AB890B4),
-    MK_64(0x0F59D1B1, 0x457D2BD0),
-    MK_64(0x6776FE65, 0x75D4EB3D),
-    MK_64(0x99FBC70E, 0x997413E9),
-    MK_64(0x9E2CFCCF, 0xE1C41EF7)
-    };
+	{
+	MK_64(0xCCD06162, 0x48677224),
+	MK_64(0xCBA65CF3, 0xA92339EF),
+	MK_64(0x8CCD69D6, 0x52FF4B64),
+	MK_64(0x398AED7B, 0x3AB890B4),
+	MK_64(0x0F59D1B1, 0x457D2BD0),
+	MK_64(0x6776FE65, 0x75D4EB3D),
+	MK_64(0x99FBC70E, 0x997413E9),
+	MK_64(0x9E2CFCCF, 0xE1C41EF7)
+	};
 
 /* blkSize =  512 bits. hashSize =  256 bits */
 const u64 SKEIN_512_IV_256[] =
-    {
-    MK_64(0xCCD044A1, 0x2FDB3E13),
-    MK_64(0xE8359030, 0x1A79A9EB),
-    MK_64(0x55AEA061, 0x4F816E6F),
-    MK_64(0x2A2767A4, 0xAE9B94DB),
-    MK_64(0xEC06025E, 0x74DD7683),
-    MK_64(0xE7A436CD, 0xC4746251),
-    MK_64(0xC36FBAF9, 0x393AD185),
-    MK_64(0x3EEDBA18, 0x33EDFC13)
-    };
+	{
+	MK_64(0xCCD044A1, 0x2FDB3E13),
+	MK_64(0xE8359030, 0x1A79A9EB),
+	MK_64(0x55AEA061, 0x4F816E6F),
+	MK_64(0x2A2767A4, 0xAE9B94DB),
+	MK_64(0xEC06025E, 0x74DD7683),
+	MK_64(0xE7A436CD, 0xC4746251),
+	MK_64(0xC36FBAF9, 0x393AD185),
+	MK_64(0x3EEDBA18, 0x33EDFC13)
+	};
 
 /* blkSize =  512 bits. hashSize =  384 bits */
 const u64 SKEIN_512_IV_384[] =
-    {
-    MK_64(0xA3F6C6BF, 0x3A75EF5F),
-    MK_64(0xB0FEF9CC, 0xFD84FAA4),
-    MK_64(0x9D77DD66, 0x3D770CFE),
-    MK_64(0xD798CBF3, 0xB468FDDA),
-    MK_64(0x1BC4A666, 0x8A0E4465),
-    MK_64(0x7ED7D434, 0xE5807407),
-    MK_64(0x548FC1AC, 0xD4EC44D6),
-    MK_64(0x266E1754, 0x6AA18FF8)
-    };
+	{
+	MK_64(0xA3F6C6BF, 0x3A75EF5F),
+	MK_64(0xB0FEF9CC, 0xFD84FAA4),
+	MK_64(0x9D77DD66, 0x3D770CFE),
+	MK_64(0xD798CBF3, 0xB468FDDA),
+	MK_64(0x1BC4A666, 0x8A0E4465),
+	MK_64(0x7ED7D434, 0xE5807407),
+	MK_64(0x548FC1AC, 0xD4EC44D6),
+	MK_64(0x266E1754, 0x6AA18FF8)
+	};
 
 /* blkSize =  512 bits. hashSize =  512 bits */
 const u64 SKEIN_512_IV_512[] =
-    {
-    MK_64(0x4903ADFF, 0x749C51CE),
-    MK_64(0x0D95DE39, 0x9746DF03),
-    MK_64(0x8FD19341, 0x27C79BCE),
-    MK_64(0x9A255629, 0xFF352CB1),
-    MK_64(0x5DB62599, 0xDF6CA7B0),
-    MK_64(0xEABE394C, 0xA9D5C3F4),
-    MK_64(0x991112C7, 0x1A75B523),
-    MK_64(0xAE18A40B, 0x660FCC33)
-    };
+	{
+	MK_64(0x4903ADFF, 0x749C51CE),
+	MK_64(0x0D95DE39, 0x9746DF03),
+	MK_64(0x8FD19341, 0x27C79BCE),
+	MK_64(0x9A255629, 0xFF352CB1),
+	MK_64(0x5DB62599, 0xDF6CA7B0),
+	MK_64(0xEABE394C, 0xA9D5C3F4),
+	MK_64(0x991112C7, 0x1A75B523),
+	MK_64(0xAE18A40B, 0x660FCC33)
+	};
 
 /* blkSize = 1024 bits. hashSize =  384 bits */
 const u64 SKEIN1024_IV_384[] =
-    {
-    MK_64(0x5102B6B8, 0xC1894A35),
-    MK_64(0xFEEBC9E3, 0xFE8AF11A),
-    MK_64(0x0C807F06, 0xE32BED71),
-    MK_64(0x60C13A52, 0xB41A91F6),
-    MK_64(0x9716D35D, 0xD4917C38),
-    MK_64(0xE780DF12, 0x6FD31D3A),
-    MK_64(0x797846B6, 0xC898303A),
-    MK_64(0xB172C2A8, 0xB3572A3B),
-    MK_64(0xC9BC8203, 0xA6104A6C),
-    MK_64(0x65909338, 0xD75624F4),
-    MK_64(0x94BCC568, 0x4B3F81A0),
-    MK_64(0x3EBBF51E, 0x10ECFD46),
-    MK_64(0x2DF50F0B, 0xEEB08542),
-    MK_64(0x3B5A6530, 0x0DBC6516),
-    MK_64(0x484B9CD2, 0x167BBCE1),
-    MK_64(0x2D136947, 0xD4CBAFEA)
-    };
+	{
+	MK_64(0x5102B6B8, 0xC1894A35),
+	MK_64(0xFEEBC9E3, 0xFE8AF11A),
+	MK_64(0x0C807F06, 0xE32BED71),
+	MK_64(0x60C13A52, 0xB41A91F6),
+	MK_64(0x9716D35D, 0xD4917C38),
+	MK_64(0xE780DF12, 0x6FD31D3A),
+	MK_64(0x797846B6, 0xC898303A),
+	MK_64(0xB172C2A8, 0xB3572A3B),
+	MK_64(0xC9BC8203, 0xA6104A6C),
+	MK_64(0x65909338, 0xD75624F4),
+	MK_64(0x94BCC568, 0x4B3F81A0),
+	MK_64(0x3EBBF51E, 0x10ECFD46),
+	MK_64(0x2DF50F0B, 0xEEB08542),
+	MK_64(0x3B5A6530, 0x0DBC6516),
+	MK_64(0x484B9CD2, 0x167BBCE1),
+	MK_64(0x2D136947, 0xD4CBAFEA)
+	};
 
 /* blkSize = 1024 bits. hashSize =  512 bits */
 const u64 SKEIN1024_IV_512[] =
-    {
-    MK_64(0xCAEC0E5D, 0x7C1B1B18),
-    MK_64(0xA01B0E04, 0x5F03E802),
-    MK_64(0x33840451, 0xED912885),
-    MK_64(0x374AFB04, 0xEAEC2E1C),
-    MK_64(0xDF25A0E2, 0x813581F7),
-    MK_64(0xE4004093, 0x8B12F9D2),
-    MK_64(0xA662D539, 0xC2ED39B6),
-    MK_64(0xFA8B85CF, 0x45D8C75A),
-    MK_64(0x8316ED8E, 0x29EDE796),
-    MK_64(0x053289C0, 0x2E9F91B8),
-    MK_64(0xC3F8EF1D, 0x6D518B73),
-    MK_64(0xBDCEC3C4, 0xD5EF332E),
-    MK_64(0x549A7E52, 0x22974487),
-    MK_64(0x67070872, 0x5B749816),
-    MK_64(0xB9CD28FB, 0xF0581BD1),
-    MK_64(0x0E2940B8, 0x15804974)
-    };
+	{
+	MK_64(0xCAEC0E5D, 0x7C1B1B18),
+	MK_64(0xA01B0E04, 0x5F03E802),
+	MK_64(0x33840451, 0xED912885),
+	MK_64(0x374AFB04, 0xEAEC2E1C),
+	MK_64(0xDF25A0E2, 0x813581F7),
+	MK_64(0xE4004093, 0x8B12F9D2),
+	MK_64(0xA662D539, 0xC2ED39B6),
+	MK_64(0xFA8B85CF, 0x45D8C75A),
+	MK_64(0x8316ED8E, 0x29EDE796),
+	MK_64(0x053289C0, 0x2E9F91B8),
+	MK_64(0xC3F8EF1D, 0x6D518B73),
+	MK_64(0xBDCEC3C4, 0xD5EF332E),
+	MK_64(0x549A7E52, 0x22974487),
+	MK_64(0x67070872, 0x5B749816),
+	MK_64(0xB9CD28FB, 0xF0581BD1),
+	MK_64(0x0E2940B8, 0x15804974)
+	};
 
 /* blkSize = 1024 bits. hashSize = 1024 bits */
 const u64 SKEIN1024_IV_1024[] =
-    {
-    MK_64(0xD593DA07, 0x41E72355),
-    MK_64(0x15B5E511, 0xAC73E00C),
-    MK_64(0x5180E5AE, 0xBAF2C4F0),
-    MK_64(0x03BD41D3, 0xFCBCAFAF),
-    MK_64(0x1CAEC6FD, 0x1983A898),
-    MK_64(0x6E510B8B, 0xCDD0589F),
-    MK_64(0x77E2BDFD, 0xC6394ADA),
-    MK_64(0xC11E1DB5, 0x24DCB0A3),
-    MK_64(0xD6D14AF9, 0xC6329AB5),
-    MK_64(0x6A9B0BFC, 0x6EB67E0D),
-    MK_64(0x9243C60D, 0xCCFF1332),
-    MK_64(0x1A1F1DDE, 0x743F02D4),
-    MK_64(0x0996753C, 0x10ED0BB8),
-    MK_64(0x6572DD22, 0xF2B4969A),
-    MK_64(0x61FD3062, 0xD00A579A),
-    MK_64(0x1DE0536E, 0x8682E539)
-    };
+	{
+	MK_64(0xD593DA07, 0x41E72355),
+	MK_64(0x15B5E511, 0xAC73E00C),
+	MK_64(0x5180E5AE, 0xBAF2C4F0),
+	MK_64(0x03BD41D3, 0xFCBCAFAF),
+	MK_64(0x1CAEC6FD, 0x1983A898),
+	MK_64(0x6E510B8B, 0xCDD0589F),
+	MK_64(0x77E2BDFD, 0xC6394ADA),
+	MK_64(0xC11E1DB5, 0x24DCB0A3),
+	MK_64(0xD6D14AF9, 0xC6329AB5),
+	MK_64(0x6A9B0BFC, 0x6EB67E0D),
+	MK_64(0x9243C60D, 0xCCFF1332),
+	MK_64(0x1A1F1DDE, 0x743F02D4),
+	MK_64(0x0996753C, 0x10ED0BB8),
+	MK_64(0x6572DD22, 0xF2B4969A),
+	MK_64(0x61FD3062, 0xD00A579A),
+	MK_64(0x1DE0536E, 0x8682E539)
+	};
 
 #endif /* _SKEIN_IV_H_ */
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 1f9e6e14f50b..199257e37813 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -33,125 +33,125 @@
 
 #define KeyScheduleConst 0x1BD11BDAA9FC1A22L
 
-    /**
-     * Which Threefish size to use
-     */
-    enum threefish_size {
-        Threefish256 = 256,     /*!< Skein with 256 bit state */
-        Threefish512 = 512,     /*!< Skein with 512 bit state */
-        Threefish1024 = 1024    /*!< Skein with 1024 bit state */
-    };
-    
-    /**
-     * Context for Threefish key and tweak words.
-     * 
-     * This structure was setup with some know-how of the internal
-     * Skein structures, in particular ordering of header and size dependent
-     * variables. If Skein implementation changes this, the adapt these
-     * structures as well.
-     */
-    struct threefish_key {
-        u64 stateSize;
-        u64 key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
-        u64 tweak[3];
-    };
+/**
+ * Which Threefish size to use
+ */
+enum threefish_size {
+	Threefish256 = 256,     /*!< Threefish with 256 bit state */
+	Threefish512 = 512,     /*!< Threefish with 512 bit state */
+	Threefish1024 = 1024    /*!< Threefish with 1024 bit state */
+};
+
+/**
+ * Context for Threefish key and tweak words.
+ * 
+ * This structure was set up with some know-how of the internal
+ * Skein structures, in particular the ordering of header and size-dependent
+ * variables. If the Skein implementation changes this, then adapt these
+ * structures as well.
+ */
+struct threefish_key {
+	u64 stateSize;
+	u64 key[SKEIN_MAX_STATE_WORDS+1];   /* max number of key words*/
+	u64 tweak[3];
+};
+
+/**
+ * Set Threefish key and tweak data.
+ * 
+ * This function sets the key and tweak data for the Threefish cipher of
+ * the given size. The key data must have the same length (number of bits)
+ * as the state size.
+ *
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param stateSize
+ *     Which Threefish size to use.
+ * @param keyData
+ *     Pointer to the key words (word has 64 bits).
+ * @param tweak
+ *     Pointer to the two tweak words (word has 64 bits).
+ */
+void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize, u64 *keyData, u64 *tweak);
+
+/**
+ * Encrypt Threefish block (bytes).
+ *
+ * The buffer must have at least the same length (number of bits) as the
+ * state size for this key. The function uses the first @c stateSize bits
+ * of the input buffer, encrypts them and stores the result in the output
+ * buffer.
+ * 
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param in
+ *     Pointer to plaintext data buffer.
+ * @param out
+ *     Pointer to cipher buffer.
+ */
+void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
-    /**
-     * Set Threefish key and tweak data.
-     * 
-     * This function sets the key and tweak data for the Threefish cipher of
-     * the given size. The key data must have the same length (number of bits)
-     * as the state size 
-     *
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param size
-     *     Which Skein size to use.
-     * @param keyData
-     *     Pointer to the key words (word has 64 bits).
-     * @param tweak
-     *     Pointer to the two tweak words (word has 64 bits).
-     */
-    void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize, u64 *keyData, u64 *tweak);
-    
-    /**
-     * Encrypt Threefisch block (bytes).
-     * 
-     * The buffer must have at least the same length (number of bits) aas the 
-     * state size for this key. The function uses the first @c stateSize bits
-     * of the input buffer, encrypts them and stores the result in the output
-     * buffer.
-     * 
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param in
-     *     Poionter to plaintext data buffer.
-     * @param out
-     *     Pointer to cipher buffer.
-     */
-    void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
-    
-    /**
-     * Encrypt Threefisch block (words).
-     * 
-     * The buffer must have at least the same length (number of bits) aas the 
-     * state size for this key. The function uses the first @c stateSize bits
-     * of the input buffer, encrypts them and stores the result in the output
-     * buffer.
-     * 
-     * The wordsize ist set to 64 bits.
-     * 
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param in
-     *     Poionter to plaintext data buffer.
-     * @param out
-     *     Pointer to cipher buffer.
-     */
-    void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
+/**
+ * Encrypt Threefish block (words).
+ *
+ * The buffer must have at least the same length (number of bits) as the
+ * state size for this key. The function uses the first @c stateSize bits
+ * of the input buffer, encrypts them and stores the result in the output
+ * buffer.
+ * 
+ * The word size is set to 64 bits.
+ * 
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param in
+ *     Pointer to plaintext data buffer.
+ * @param out
+ *     Pointer to cipher buffer.
+ */
+void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
 
-    /**
-     * Decrypt Threefisch block (bytes).
-     * 
-     * The buffer must have at least the same length (number of bits) aas the 
-     * state size for this key. The function uses the first @c stateSize bits
-     * of the input buffer, decrypts them and stores the result in the output
-     * buffer
-     * 
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param in
-     *     Poionter to cipher data buffer.
-     * @param out
-     *     Pointer to plaintext buffer.
-     */
-    void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
+/**
+ * Decrypt Threefish block (bytes).
+ *
+ * The buffer must have at least the same length (number of bits) as the
+ * state size for this key. The function uses the first @c stateSize bits
+ * of the input buffer, decrypts them and stores the result in the output
+ * buffer.
+ * 
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param in
+ *     Pointer to cipher data buffer.
+ * @param out
+ *     Pointer to plaintext buffer.
+ */
+void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
-    /**
-     * Decrypt Threefisch block (words).
-     * 
-     * The buffer must have at least the same length (number of bits) aas the 
-     * state size for this key. The function uses the first @c stateSize bits
-     * of the input buffer, encrypts them and stores the result in the output
-     * buffer.
-     * 
-     * The wordsize ist set to 64 bits.
-     * 
-     * @param keyCtx
-     *     Pointer to a Threefish key structure.
-     * @param in
-     *     Poionter to cipher data buffer.
-     * @param out
-     *     Pointer to plaintext buffer.
-     */
-    void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
+/**
+ * Decrypt Threefish block (words).
+ *
+ * The buffer must have at least the same length (number of bits) as the
+ * state size for this key. The function uses the first @c stateSize bits
+ * of the input buffer, decrypts them and stores the result in the output
+ * buffer.
+ * 
+ * The word size is set to 64 bits.
+ * 
+ * @param keyCtx
+ *     Pointer to a Threefish key structure.
+ * @param in
+ *     Pointer to cipher data buffer.
+ * @param out
+ *     Pointer to plaintext buffer.
+ */
+void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
 
-    void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
-    void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index e2e5685157a0..3f0f32806181 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -28,49 +28,49 @@ void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, siz
 /* init the context for a straight hashing operation  */
 int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 {
-    union
-    {
-        u8  b[SKEIN_256_STATE_BYTES];
-        u64  w[SKEIN_256_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {             /* use pre-computed values, where available */
-    case  256:
-        memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
-        break;
-    case  224:
-        memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
-        break;
-    case  160:
-        memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
-        break;
-    case  128:
-        memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN_256_STATE_BYTES];
+		u64  w[SKEIN_256_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+	switch (hashBitLen)
+	{             /* use pre-computed values, where available */
+	case  256:
+		memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
+		break;
+	case  224:
+		memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
+		break;
+	case  160:
+		memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
+		break;
+	case  128:
+		memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
+		break;
+	default:
+		/* here if there is no precomputed IV value available */
+		/* build/process the config block, type == CONFIG (could be precomputed) */
+		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+		/* compute the initial chaining values from config block */
+		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+		break;
+	}
+	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
@@ -78,133 +78,133 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 /* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
 int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-    union
-    {
-        u8  b[SKEIN_256_STATE_BYTES];
-        u64  w[SKEIN_256_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(256, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN_256_STATE_BYTES];
+		u64  w[SKEIN_256_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+	/* compute the initial chaining values ctx->X[], based on key */
+	if (keyBytes == 0)                          /* is there a key? */
+	{
+		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+	}
+	else                                        /* here to pre-process a key */
+	{
+		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		/* do a mini-Init right here */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+		Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
+		Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+	}
+	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
+	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	Skein_Start_New_Type(ctx, CFG_FINAL);
+
+	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+	Skein_Show_Key(256, &ctx->h, key, keyBytes);
+
+	/* compute the initial chaining values from config block */
+	Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+	/* The chaining vars ctx->X are now initialized */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
 int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
-            Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
-            msg        += n * SKEIN_256_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
+	size_t n;
+
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* process full blocks, if any */
+	if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
+	{
+		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		{
+			n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			if (n)
+			{
+				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+				msgByteCnt  -= n;
+				msg         += n;
+				ctx->h.bCnt += n;
+			}
+			Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
+			Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
+			ctx->h.bCnt = 0;
+		}
+		/* now process any remaining full blocks, directly from input message data */
+		if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
+		{
+			n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
+			Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
+			msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
+			msg        += n * SKEIN_256_BLOCK_BYTES;
+		}
+		Skein_assert(ctx->h.bCnt == 0);
+	}
+
+	/* copy any remaining source message data bytes into b[] */
+	if (msgByteCnt)
+	{
+		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
+		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+		ctx->h.bCnt += msgByteCnt;
+	}
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
 int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_256_BLOCK_BYTES)
-            n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN_256_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+
+	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN_256_BLOCK_BYTES)
+			n  = SKEIN_256_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /*****************************************************************/
@@ -215,50 +215,50 @@ int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 /* init the context for a straight hashing operation  */
 int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 {
-    union
-    {
-        u8  b[SKEIN_512_STATE_BYTES];
-        u64  w[SKEIN_512_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {             /* use pre-computed values, where available */
-    case  512:
-        memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
-        break;
-    case  384:
-        memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
-        break;
-    case  256:
-        memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
-        break;
-    case  224:
-        memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN_512_STATE_BYTES];
+		u64  w[SKEIN_512_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+	switch (hashBitLen)
+	{             /* use pre-computed values, where available */
+	case  512:
+		memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
+		break;
+	case  384:
+		memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
+		break;
+	case  256:
+		memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
+		break;
+	case  224:
+		memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
+		break;
+	default:
+		/* here if there is no precomputed IV value available */
+		/* build/process the config block, type == CONFIG (could be precomputed) */
+		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+		/* compute the initial chaining values from config block */
+		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+		break;
+	}
+
+	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
@@ -266,133 +266,133 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 /* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
 int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-    union
-    {
-        u8  b[SKEIN_512_STATE_BYTES];
-        u64  w[SKEIN_512_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(512, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN_512_STATE_BYTES];
+		u64  w[SKEIN_512_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+	/* compute the initial chaining values ctx->X[], based on key */
+	if (keyBytes == 0)                          /* is there a key? */
+	{
+		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+	}
+	else                                        /* here to pre-process a key */
+	{
+		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		/* do a mini-Init right here */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+		Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
+		Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+	}
+	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
+	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	Skein_Start_New_Type(ctx, CFG_FINAL);
+
+	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+	Skein_Show_Key(512, &ctx->h, key, keyBytes);
+
+	/* compute the initial chaining values from config block */
+	Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+	/* The chaining vars ctx->X are now initialized */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
 int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
-            Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
-            msg        += n * SKEIN_512_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
+	size_t n;
+
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* process full blocks, if any */
+	if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
+	{
+		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		{
+			n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			if (n)
+			{
+				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+				msgByteCnt  -= n;
+				msg         += n;
+				ctx->h.bCnt += n;
+			}
+			Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
+			Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
+			ctx->h.bCnt = 0;
+		}
+		/* now process any remaining full blocks, directly from input message data */
+		if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
+		{
+			n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
+			Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
+			msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
+			msg        += n * SKEIN_512_BLOCK_BYTES;
+		}
+		Skein_assert(ctx->h.bCnt == 0);
+	}
+
+	/* copy any remaining source message data bytes into b[] */
+	if (msgByteCnt)
+	{
+		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
+		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+		ctx->h.bCnt += msgByteCnt;
+	}
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
 int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_512_BLOCK_BYTES)
-            n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN_512_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+
+	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN_512_BLOCK_BYTES)
+			n  = SKEIN_512_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /*****************************************************************/
@@ -403,47 +403,47 @@ int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 /* init the context for a straight hashing operation  */
 int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 {
-    union
-    {
-        u8  b[SKEIN1024_STATE_BYTES];
-        u64  w[SKEIN1024_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
-
-    switch (hashBitLen)
-    {              /* use pre-computed values, where available */
-    case  512:
-        memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
-        break;
-    case  384:
-        memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
-        break;
-    case 1024:
-        memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
-        break;
-    default:
-        /* here if there is no precomputed IV value available */
-        /* build/process the config block, type == CONFIG (could be precomputed) */
-        Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-        cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-        cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-        cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-        memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
-
-        /* compute the initial chaining values from config block */
-        memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
-        Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-        break;
-    }
-
-    /* The chaining vars ctx->X are now initialized for the given hashBitLen. */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN1024_STATE_BYTES];
+		u64  w[SKEIN1024_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
+
+	switch (hashBitLen)
+	{              /* use pre-computed values, where available */
+	case  512:
+		memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
+		break;
+	case  384:
+		memcpy(ctx->X, SKEIN1024_IV_384, sizeof(ctx->X));
+		break;
+	case 1024:
+		memcpy(ctx->X, SKEIN1024_IV_1024, sizeof(ctx->X));
+		break;
+	default:
+		/* here if there is no precomputed IV value available */
+		/* build/process the config block, type == CONFIG (could be precomputed) */
+		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
+
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
+		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+
+		/* compute the initial chaining values from config block */
+		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+		break;
+	}
+
+	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
@@ -451,133 +451,133 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 /* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
 int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-    union
-    {
-        u8  b[SKEIN1024_STATE_BYTES];
-        u64  w[SKEIN1024_STATE_WORDS];
-    } cfg;                              /* config block */
-
-    Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
-    Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
-
-    /* compute the initial chaining values ctx->X[], based on key */
-    if (keyBytes == 0)                          /* is there a key? */
-    {
-        memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
-    }
-    else                                        /* here to pre-process a key */
-    {
-        Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
-        /* do a mini-Init right here */
-        ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-        Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-        memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-        Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
-        Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-        memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
-    }
-    /* build/process the config block, type == CONFIG (could be precomputed for each key) */
-    ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
-    Skein_Start_New_Type(ctx, CFG_FINAL);
-
-    memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
-    cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-    cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-    cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
-
-    Skein_Show_Key(1024, &ctx->h, key, keyBytes);
-
-    /* compute the initial chaining values from config block */
-    Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
-
-    /* The chaining vars ctx->X are now initialized */
-    /* Set up to process the data message portion of the hash (default) */
-    Skein_Start_New_Type(ctx, MSG);
-
-    return SKEIN_SUCCESS;
+	union
+	{
+		u8  b[SKEIN1024_STATE_BYTES];
+		u64  w[SKEIN1024_STATE_WORDS];
+	} cfg;                              /* config block */
+
+	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
+	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
+
+	/* compute the initial chaining values ctx->X[], based on key */
+	if (keyBytes == 0)                          /* is there a key? */
+	{
+		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+	}
+	else                                        /* here to pre-process a key */
+	{
+		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		/* do a mini-Init right here */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
+		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
+		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
+		Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
+		Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+	}
+	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
+	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	Skein_Start_New_Type(ctx, CFG_FINAL);
+
+	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+
+	Skein_Show_Key(1024, &ctx->h, key, keyBytes);
+
+	/* compute the initial chaining values from config block */
+	Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
+
+	/* The chaining vars ctx->X are now initialized */
+	/* Set up to process the data message portion of the hash (default) */
+	Skein_Start_New_Type(ctx, MSG);
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
 int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
 {
-    size_t n;
-
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* process full blocks, if any */
-    if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
-    {
-        if (ctx->h.bCnt)                              /* finish up any buffered message data */
-        {
-            n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
-            if (n)
-            {
-                Skein_assert(n < msgByteCnt);         /* check on our logic here */
-                memcpy(&ctx->b[ctx->h.bCnt], msg, n);
-                msgByteCnt  -= n;
-                msg         += n;
-                ctx->h.bCnt += n;
-            }
-            Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
-            Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
-            ctx->h.bCnt = 0;
-        }
-        /* now process any remaining full blocks, directly from input message data */
-        if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
-        {
-            n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
-            Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
-            msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
-            msg        += n * SKEIN1024_BLOCK_BYTES;
-        }
-        Skein_assert(ctx->h.bCnt == 0);
-    }
-
-    /* copy any remaining source message data bytes into b[] */
-    if (msgByteCnt)
-    {
-        Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
-        memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
-        ctx->h.bCnt += msgByteCnt;
-    }
-
-    return SKEIN_SUCCESS;
+	size_t n;
+
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* process full blocks, if any */
+	if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
+	{
+		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		{
+			n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			if (n)
+			{
+				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
+				msgByteCnt  -= n;
+				msg         += n;
+				ctx->h.bCnt += n;
+			}
+			Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
+			Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
+			ctx->h.bCnt = 0;
+		}
+		/* now process any remaining full blocks, directly from input message data */
+		if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
+		{
+			n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
+			Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
+			msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
+			msg        += n * SKEIN1024_BLOCK_BYTES;
+		}
+		Skein_assert(ctx->h.bCnt == 0);
+	}
+
+	/* copy any remaining source message data bytes into b[] */
+	if (msgByteCnt)
+	{
+		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
+		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
+		ctx->h.bCnt += msgByteCnt;
+	}
+
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the result */
 int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-
-    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN1024_BLOCK_BYTES)
-            n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN1024_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+
+	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN1024_BLOCK_BYTES)
+			n  = SKEIN1024_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /**************** Functions to support MAC/tree hashing ***************/
@@ -587,48 +587,48 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
 
-    return SKEIN_SUCCESS;
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-    Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
 
-    return SKEIN_SUCCESS;
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
 
-    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-    if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
-        memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-    Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
+	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
+		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
 
-    Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
 
-    return SKEIN_SUCCESS;
+	return SKEIN_SUCCESS;
 }
 
 #if SKEIN_TREE_HASH
@@ -636,86 +636,86 @@ int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 /* just do the OUTPUT stage                                       */
 int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_256_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_256_BLOCK_BYTES)
-            n  = SKEIN_256_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN_256_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN_256_BLOCK_BYTES)
+			n  = SKEIN_256_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
 int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN_512_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN_512_BLOCK_BYTES)
-            n  = SKEIN_512_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN_512_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN_512_BLOCK_BYTES)
+			n  = SKEIN_512_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* just do the OUTPUT stage                                       */
 int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-    size_t i, n, byteCnt;
-    u64 X[SKEIN1024_STATE_WORDS];
-    Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
-
-    /* now output the result */
-    byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
-
-    /* run Threefish in "counter mode" to generate output */
-    memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-    memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
-    for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-    {
-        ((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
-        Skein_Start_New_Type(ctx, OUT_FINAL);
-        Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-        n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
-        if (n >= SKEIN1024_BLOCK_BYTES)
-            n  = SKEIN1024_BLOCK_BYTES;
-        Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-        Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-        memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
-    }
-    return SKEIN_SUCCESS;
+	size_t i, n, byteCnt;
+	u64 X[SKEIN1024_STATE_WORDS];
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+
+	/* now output the result */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+
+	/* run Threefish in "counter mode" to generate output */
+	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
+	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
+	{
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		Skein_Start_New_Type(ctx, OUT_FINAL);
+		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
+		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+		if (n >= SKEIN1024_BLOCK_BYTES)
+			n  = SKEIN1024_BLOCK_BYTES;
+		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
+		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
+		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+	}
+	return SKEIN_SUCCESS;
 }
 #endif
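
Note on the *_Output() loops above: the output stage rounds hashBitLen up
to whole bytes and then emits at most one block's worth of bytes per
counter value. The sketch below shows only that arithmetic. It is a
minimal illustration, not the driver code: BLOCK_BYTES stands in for
SKEIN1024_BLOCK_BYTES, and out_bytes(), out_blocks() and chunk_len() are
hypothetical helper names that do not exist in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 128 mirrors SKEIN1024_BLOCK_BYTES (1024 bits / 8). */
#define BLOCK_BYTES 128

/* Total output bytes for a requested hash length in bits, rounded up. */
static size_t out_bytes(size_t hash_bit_len)
{
	return (hash_bit_len + 7) >> 3;
}

/* Number of counter-mode iterations, using the same loop condition. */
static size_t out_blocks(size_t byte_cnt)
{
	size_t i;

	for (i = 0; i * BLOCK_BYTES < byte_cnt; i++)
		;
	return i;
}

/* Bytes copied to hashVal by iteration i: a full block, or the tail. */
static size_t chunk_len(size_t byte_cnt, size_t i)
{
	size_t n;

	assert(i * BLOCK_BYTES < byte_cnt);	/* i must be a live iteration */
	n = byte_cnt - i * BLOCK_BYTES;		/* output bytes left to go */
	if (n >= BLOCK_BYTES)
		n = BLOCK_BYTES;
	return n;
}
```

For example, a 2049-bit request needs 257 bytes, which takes three
iterations emitting 128, 128 and 1 bytes.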
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index a3f471be8db3..3ebb1d60ef93 100644
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -29,191 +29,191 @@ OTHER DEALINGS IN THE SOFTWARE.
 
 int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size)
 {
-    Skein_Assert(ctx && size, SKEIN_FAIL);
+	Skein_Assert(ctx && size, SKEIN_FAIL);
 
-    memset(ctx , 0, sizeof(struct skein_ctx));
-    ctx->skeinSize = size;
+	memset(ctx , 0, sizeof(struct skein_ctx));
+	ctx->skeinSize = size;
 
-    return SKEIN_SUCCESS;
+	return SKEIN_SUCCESS;
 }
 
 int skeinInit(struct skein_ctx *ctx, size_t hashBitLen)
 {
-    int ret = SKEIN_FAIL;
-    size_t Xlen = 0;
-    u64 *X = NULL;
-    u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
-
-    Skein_Assert(ctx, SKEIN_FAIL);
-    /*
-     * The following two lines rely of the fact that the real Skein contexts are
-     * a union in out context and thus have tha maximum memory available.
-     * The beauty of C :-) .
-     */
-    X = ctx->m.s256.X;
-    Xlen = ctx->skeinSize/8;
-    /*
-     * If size is the same and hash bit length is zero then reuse
-     * the save chaining variables.
-     */
-    switch (ctx->skeinSize) {
-    case Skein256:
-        ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
-                                treeInfo, NULL, 0);
-        break;
-    case Skein512:
-        ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
-                                treeInfo, NULL, 0);
-        break;
-    case Skein1024:
-        ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
-                                treeInfo, NULL, 0);
-        break;
-    }
-
-    if (ret == SKEIN_SUCCESS) {
-        /* Save chaining variables for this combination of size and hashBitLen */
-        memcpy(ctx->XSave, X, Xlen);
-    }
-    return ret;
+	int ret = SKEIN_FAIL;
+	size_t Xlen = 0;
+	u64 *X = NULL;
+	u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+
+	Skein_Assert(ctx, SKEIN_FAIL);
+	/*
+	 * The following two lines rely of the fact that the real Skein contexts are
+	 * a union in out context and thus have tha maximum memory available.
+	 * The beauty of C :-) .
+	 */
+	X = ctx->m.s256.X;
+	Xlen = ctx->skeinSize/8;
+	/*
+	 * If size is the same and hash bit length is zero then reuse
+	 * the save chaining variables.
+	 */
+	switch (ctx->skeinSize) {
+	case Skein256:
+		ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
+					treeInfo, NULL, 0);
+		break;
+	case Skein512:
+		ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
+					treeInfo, NULL, 0);
+		break;
+	case Skein1024:
+		ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
+					treeInfo, NULL, 0);
+		break;
+	}
+
+	if (ret == SKEIN_SUCCESS) {
+		/* Save chaining variables for this combination of size and hashBitLen */
+		memcpy(ctx->XSave, X, Xlen);
+	}
+	return ret;
 }
 
 int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
-                 size_t hashBitLen)
+		size_t hashBitLen)
 {
-    int ret = SKEIN_FAIL;
-    u64 *X = NULL;
-    size_t Xlen = 0;
-    u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
-
-    Skein_Assert(ctx, SKEIN_FAIL);
-
-    X = ctx->m.s256.X;
-    Xlen = ctx->skeinSize/8;
-
-    Skein_Assert(hashBitLen, SKEIN_BAD_HASHLEN);
-
-    switch (ctx->skeinSize) {
-    case Skein256:
-        ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
-                                treeInfo,
-                                (const u8 *)key, keyLen);
-
-        break;
-    case Skein512:
-        ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
-                                treeInfo,
-                                (const u8 *)key, keyLen);
-        break;
-    case Skein1024:
-        ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
-                                treeInfo,
-                                (const u8 *)key, keyLen);
-
-        break;
-    }
-    if (ret == SKEIN_SUCCESS) {
-        /* Save chaining variables for this combination of key, keyLen, hashBitLen */
-        memcpy(ctx->XSave, X, Xlen);
-    }
-    return ret;
+	int ret = SKEIN_FAIL;
+	u64 *X = NULL;
+	size_t Xlen = 0;
+	u64 treeInfo = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
+
+	Skein_Assert(ctx, SKEIN_FAIL);
+
+	X = ctx->m.s256.X;
+	Xlen = ctx->skeinSize/8;
+
+	Skein_Assert(hashBitLen, SKEIN_BAD_HASHLEN);
+
+	switch (ctx->skeinSize) {
+	case Skein256:
+		ret = Skein_256_InitExt(&ctx->m.s256, hashBitLen,
+					treeInfo,
+					(const u8 *)key, keyLen);
+
+		break;
+	case Skein512:
+		ret = Skein_512_InitExt(&ctx->m.s512, hashBitLen,
+					treeInfo,
+					(const u8 *)key, keyLen);
+		break;
+	case Skein1024:
+		ret = Skein1024_InitExt(&ctx->m.s1024, hashBitLen,
+					treeInfo,
+					(const u8 *)key, keyLen);
+
+		break;
+	}
+	if (ret == SKEIN_SUCCESS) {
+		/* Save chaining variables for this combination of key, keyLen, hashBitLen */
+		memcpy(ctx->XSave, X, Xlen);
+	}
+	return ret;
 }
 
 void skeinReset(struct skein_ctx *ctx)
 {
-    size_t Xlen = 0;
-    u64 *X = NULL;
-
-    /*
-     * The following two lines rely of the fact that the real Skein contexts are
-     * a union in out context and thus have tha maximum memory available.
-     * The beautiy of C :-) .
-     */
-    X = ctx->m.s256.X;
-    Xlen = ctx->skeinSize/8;
-    /* Restore the chaing variable, reset byte counter */
-    memcpy(X, ctx->XSave, Xlen);
-
-    /* Setup context to process the message */
-    Skein_Start_New_Type(&ctx->m, MSG);
+	size_t Xlen = 0;
+	u64 *X = NULL;
+
+	/*
+	 * The following two lines rely of the fact that the real Skein contexts are
+	 * a union in out context and thus have tha maximum memory available.
+	 * The beautiy of C :-) .
+	 */
+	X = ctx->m.s256.X;
+	Xlen = ctx->skeinSize/8;
+	/* Restore the chaing variable, reset byte counter */
+	memcpy(X, ctx->XSave, Xlen);
+
+	/* Setup context to process the message */
+	Skein_Start_New_Type(&ctx->m, MSG);
 }
 
 int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
-                size_t msgByteCnt)
+		size_t msgByteCnt)
 {
-    int ret = SKEIN_FAIL;
-    Skein_Assert(ctx, SKEIN_FAIL);
-
-    switch (ctx->skeinSize) {
-    case Skein256:
-        ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg, msgByteCnt);
-        break;
-    case Skein512:
-        ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg, msgByteCnt);
-        break;
-    case Skein1024:
-        ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg, msgByteCnt);
-        break;
-    }
-    return ret;
+	int ret = SKEIN_FAIL;
+	Skein_Assert(ctx, SKEIN_FAIL);
+
+	switch (ctx->skeinSize) {
+	case Skein256:
+		ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg, msgByteCnt);
+		break;
+	case Skein512:
+		ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg, msgByteCnt);
+		break;
+	case Skein1024:
+		ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg, msgByteCnt);
+		break;
+	}
+	return ret;
 
 }
 
 int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
-                    size_t msgBitCnt)
+			size_t msgBitCnt)
 {
-    /*
-     * I've used the bit pad implementation from skein_test.c (see NIST CD)
-     * and modified it to use the convenience functions and added some pointer
-     * arithmetic.
-     */
-    size_t length;
-    u8 mask;
-    u8 *up;
-
-    /* only the final Update() call is allowed do partial bytes, else assert an error */
-    Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
-
-    /* if number of bits is a multiple of bytes - that's easy */
-    if ((msgBitCnt & 0x7) == 0) {
-        return skeinUpdate(ctx, msg, msgBitCnt >> 3);
-    }
-    skeinUpdate(ctx, msg, (msgBitCnt >> 3) + 1);
-
-    /*
-     * The next line rely on the fact that the real Skein contexts
-     * are a union in our context. After the addition the pointer points to
-     * Skein's real partial block buffer.
-     * If this layout ever changes we have to adapt this as well.
-     */
-    up = (u8 *)ctx->m.s256.X + ctx->skeinSize / 8;
-
-    Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
-
-    /* now "pad" the final partial byte the way NIST likes */
-    length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
-    Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
-    mask = (u8) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
-    up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
-
-    return SKEIN_SUCCESS;
+	/*
+	 * I've used the bit pad implementation from skein_test.c (see NIST CD)
+	 * and modified it to use the convenience functions and added some pointer
+	 * arithmetic.
+	 */
+	size_t length;
+	u8 mask;
+	u8 *up;
+
+	/* only the final Update() call is allowed do partial bytes, else assert an error */
+	Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
+
+	/* if number of bits is a multiple of bytes - that's easy */
+	if ((msgBitCnt & 0x7) == 0) {
+		return skeinUpdate(ctx, msg, msgBitCnt >> 3);
+	}
+	skeinUpdate(ctx, msg, (msgBitCnt >> 3) + 1);
+
+	/*
+	 * The next line rely on the fact that the real Skein contexts
+	 * are a union in our context. After the addition the pointer points to
+	 * Skein's real partial block buffer.
+	 * If this layout ever changes we have to adapt this as well.
+	 */
+	up = (u8 *)ctx->m.s256.X + ctx->skeinSize / 8;
+
+	Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
+
+	/* now "pad" the final partial byte the way NIST likes */
+	length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
+	Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
+	mask = (u8) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
+	up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
+
+	return SKEIN_SUCCESS;
 }
 
 int skeinFinal(struct skein_ctx *ctx, u8 *hash)
 {
-    int ret = SKEIN_FAIL;
-    Skein_Assert(ctx, SKEIN_FAIL);
-
-    switch (ctx->skeinSize) {
-    case Skein256:
-        ret = Skein_256_Final(&ctx->m.s256, (u8 *)hash);
-        break;
-    case Skein512:
-        ret = Skein_512_Final(&ctx->m.s512, (u8 *)hash);
-        break;
-    case Skein1024:
-        ret = Skein1024_Final(&ctx->m.s1024, (u8 *)hash);
-        break;
-    }
-    return ret;
+	int ret = SKEIN_FAIL;
+	Skein_Assert(ctx, SKEIN_FAIL);
+
+	switch (ctx->skeinSize) {
+	case Skein256:
+		ret = Skein_256_Final(&ctx->m.s256, (u8 *)hash);
+		break;
+	case Skein512:
+		ret = Skein_512_Final(&ctx->m.s512, (u8 *)hash);
+		break;
+	case Skein1024:
+		ret = Skein1024_Final(&ctx->m.s1024, (u8 *)hash);
+		break;
+	}
+	return ret;
 }
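
Note on skeinUpdateBits() above: when the message ends in a partial byte,
the code keeps the top (msgBitCnt & 7) bits of the last buffered byte,
sets the next lower bit to 1, and clears everything below it. The sketch
below isolates that one expression so it can be checked by hand. It is an
illustration under stated assumptions: pad_partial_byte() is a
hypothetical name, and uint8_t stands in for the kernel's u8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Apply the bit padding to the final, partially filled message byte.
 * msg_bit_cnt is the total bit count passed to the update call; only
 * its low three bits matter here.
 */
static uint8_t pad_partial_byte(uint8_t last, size_t msg_bit_cnt)
{
	uint8_t mask;

	/* Whole-byte messages take the plain byte-wise update path. */
	assert((msg_bit_cnt & 7) != 0);

	/* Single bit just below the last valid message bit. */
	mask = (uint8_t)(1u << (7 - (msg_bit_cnt & 7)));

	/*
	 * (0 - mask) has every bit from the mask bit upward set, so the
	 * AND clears the unused low bits and the OR sets the pad bit.
	 */
	return (uint8_t)((last & (0 - mask)) | mask);
}
```

With three valid bits in 0xA5 (binary 1010 0101), the valid prefix is 101,
the pad bit follows it, and the result is 1011 0000, i.e. 0xB0.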
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index d98933eeb0bf..3c2878c966e1 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -6,167 +6,167 @@
 
 /*****************************  Skein_256 ******************************/
 void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
-                             size_t blkCnt, size_t byteCntAdd)
+				size_t blkCnt, size_t byteCntAdd)
 {
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
-    u64 words[3];
+	struct threefish_key key;
+	u64 tweak[2];
+	int i;
+	u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+	u64 words[3];
 
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	tweak[0] = ctx->h.T[0];
+	tweak[1] = ctx->h.T[1];
 
-    do  {
-        u64 carry = byteCntAdd;
+	do  {
+		u64 carry = byteCntAdd;
 
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
+		words[0] = tweak[0] & 0xffffffffL;
+		words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+		words[2] = (tweak[1] & 0xffffffffL);
 
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
+		for (i = 0; i < 3; i++) {
+			carry += words[i];
+			words[i] = carry;
+			carry >>= 32;
+		}        
+		tweak[0] = words[0] & 0xffffffffL;
+		tweak[0] |= (words[1] & 0xffffffffL) << 32;
+		tweak[1] |= words[2] & 0xffffffffL;
 
-        threefishSetKey(&key, Threefish256, ctx->X, tweak);
+		threefishSetKey(&key, Threefish256, ctx->X, tweak);
 
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
 
-        threefishEncryptBlockWords(&key, w, ctx->X);
+		threefishEncryptBlockWords(&key, w, ctx->X);
 
-        blkPtr += SKEIN_256_BLOCK_BYTES;
+		blkPtr += SKEIN_256_BLOCK_BYTES;
 
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = ctx->X[0] ^ w[0];
-        ctx->X[1] = ctx->X[1] ^ w[1];
-        ctx->X[2] = ctx->X[2] ^ w[2];
-        ctx->X[3] = ctx->X[3] ^ w[3];
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0] = ctx->X[0] ^ w[0];
+		ctx->X[1] = ctx->X[1] ^ w[1];
+		ctx->X[2] = ctx->X[2] ^ w[2];
+		ctx->X[3] = ctx->X[3] ^ w[3];
 
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
+		tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+	} while (--blkCnt);
 
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
+	ctx->h.T[0] = tweak[0];
+	ctx->h.T[1] = tweak[1];
 }
 
 void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
-                             size_t blkCnt, size_t byteCntAdd)
+				size_t blkCnt, size_t byteCntAdd)
 {
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64 words[3];
-    u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish512, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN_512_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = ctx->X[0] ^ w[0];
-        ctx->X[1] = ctx->X[1] ^ w[1];
-        ctx->X[2] = ctx->X[2] ^ w[2];
-        ctx->X[3] = ctx->X[3] ^ w[3];
-        ctx->X[4] = ctx->X[4] ^ w[4];
-        ctx->X[5] = ctx->X[5] ^ w[5];
-        ctx->X[6] = ctx->X[6] ^ w[6];
-        ctx->X[7] = ctx->X[7] ^ w[7];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
+	struct threefish_key key;
+	u64 tweak[2];
+	int i;
+	u64 words[3];
+	u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	tweak[0] = ctx->h.T[0];
+	tweak[1] = ctx->h.T[1];
+
+	do  {
+		u64 carry = byteCntAdd;
+
+		words[0] = tweak[0] & 0xffffffffL;
+		words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+		words[2] = (tweak[1] & 0xffffffffL);
+
+		for (i = 0; i < 3; i++) {
+			carry += words[i];
+			words[i] = carry;
+			carry >>= 32;
+		}
+		tweak[0] = words[0] & 0xffffffffL;
+		tweak[0] |= (words[1] & 0xffffffffL) << 32;
+		tweak[1] |= words[2] & 0xffffffffL;
+
+		threefishSetKey(&key, Threefish512, ctx->X, tweak);
+
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
+
+		threefishEncryptBlockWords(&key, w, ctx->X);
+
+		blkPtr += SKEIN_512_BLOCK_BYTES;
+
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0] = ctx->X[0] ^ w[0];
+		ctx->X[1] = ctx->X[1] ^ w[1];
+		ctx->X[2] = ctx->X[2] ^ w[2];
+		ctx->X[3] = ctx->X[3] ^ w[3];
+		ctx->X[4] = ctx->X[4] ^ w[4];
+		ctx->X[5] = ctx->X[5] ^ w[5];
+		ctx->X[6] = ctx->X[6] ^ w[6];
+		ctx->X[7] = ctx->X[7] ^ w[7];
+
+		tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+	} while (--blkCnt);
+
+	ctx->h.T[0] = tweak[0];
+	ctx->h.T[1] = tweak[1];
 }
 
 void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
-                              size_t blkCnt, size_t byteCntAdd)
+				size_t blkCnt, size_t byteCntAdd)
 {
-    struct threefish_key key;
-    u64 tweak[2];
-    int i;
-    u64 words[3];
-    u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
-
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    tweak[0] = ctx->h.T[0];
-    tweak[1] = ctx->h.T[1];
-
-    do  {
-        u64 carry = byteCntAdd;
-
-        words[0] = tweak[0] & 0xffffffffL;
-        words[1] = ((tweak[0] >> 32) & 0xffffffffL);
-        words[2] = (tweak[1] & 0xffffffffL);
-
-        for (i = 0; i < 3; i++) {
-            carry += words[i];
-            words[i] = carry;
-            carry >>= 32;
-        }        
-        tweak[0] = words[0] & 0xffffffffL;
-        tweak[0] |= (words[1] & 0xffffffffL) << 32;
-        tweak[1] |= words[2] & 0xffffffffL;
-
-        threefishSetKey(&key, Threefish1024, ctx->X, tweak);
-
-        Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
-
-        threefishEncryptBlockWords(&key, w, ctx->X);
-
-        blkPtr += SKEIN1024_BLOCK_BYTES;
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0]  = ctx->X[0]  ^ w[0];
-        ctx->X[1]  = ctx->X[1]  ^ w[1];
-        ctx->X[2]  = ctx->X[2]  ^ w[2];
-        ctx->X[3]  = ctx->X[3]  ^ w[3];
-        ctx->X[4]  = ctx->X[4]  ^ w[4];
-        ctx->X[5]  = ctx->X[5]  ^ w[5];
-        ctx->X[6]  = ctx->X[6]  ^ w[6];
-        ctx->X[7]  = ctx->X[7]  ^ w[7];
-        ctx->X[8]  = ctx->X[8]  ^ w[8];
-        ctx->X[9]  = ctx->X[9]  ^ w[9];
-        ctx->X[10] = ctx->X[10] ^ w[10];
-        ctx->X[11] = ctx->X[11] ^ w[11];
-        ctx->X[12] = ctx->X[12] ^ w[12];
-        ctx->X[13] = ctx->X[13] ^ w[13];
-        ctx->X[14] = ctx->X[14] ^ w[14];
-        ctx->X[15] = ctx->X[15] ^ w[15];
-
-        tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
-    } while (--blkCnt);
-
-    ctx->h.T[0] = tweak[0];
-    ctx->h.T[1] = tweak[1];
+	struct threefish_key key;
+	u64 tweak[2];
+	int i;
+	u64 words[3];
+	u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	tweak[0] = ctx->h.T[0];
+	tweak[1] = ctx->h.T[1];
+
+	do  {
+		u64 carry = byteCntAdd;
+
+		words[0] = tweak[0] & 0xffffffffL;
+		words[1] = ((tweak[0] >> 32) & 0xffffffffL);
+		words[2] = (tweak[1] & 0xffffffffL);
+
+		for (i = 0; i < 3; i++) {
+			carry += words[i];
+			words[i] = carry;
+			carry >>= 32;
+		}
+		tweak[0] = words[0] & 0xffffffffL;
+		tweak[0] |= (words[1] & 0xffffffffL) << 32;
+		tweak[1] |= words[2] & 0xffffffffL;
+
+		threefishSetKey(&key, Threefish1024, ctx->X, tweak);
+
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
+
+		threefishEncryptBlockWords(&key, w, ctx->X);
+
+		blkPtr += SKEIN1024_BLOCK_BYTES;
+
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0]  = ctx->X[0]  ^ w[0];
+		ctx->X[1]  = ctx->X[1]  ^ w[1];
+		ctx->X[2]  = ctx->X[2]  ^ w[2];
+		ctx->X[3]  = ctx->X[3]  ^ w[3];
+		ctx->X[4]  = ctx->X[4]  ^ w[4];
+		ctx->X[5]  = ctx->X[5]  ^ w[5];
+		ctx->X[6]  = ctx->X[6]  ^ w[6];
+		ctx->X[7]  = ctx->X[7]  ^ w[7];
+		ctx->X[8]  = ctx->X[8]  ^ w[8];
+		ctx->X[9]  = ctx->X[9]  ^ w[9];
+		ctx->X[10] = ctx->X[10] ^ w[10];
+		ctx->X[11] = ctx->X[11] ^ w[11];
+		ctx->X[12] = ctx->X[12] ^ w[12];
+		ctx->X[13] = ctx->X[13] ^ w[13];
+		ctx->X[14] = ctx->X[14] ^ w[14];
+		ctx->X[15] = ctx->X[15] ^ w[15];
+
+		tweak[1] &= ~SKEIN_T1_FLAG_FIRST;
+	} while (--blkCnt);
+
+	ctx->h.T[0] = tweak[0];
+	ctx->h.T[1] = tweak[1];
 }
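Note for readers: the three Process_Block variants above all advance the
byte position held in the tweak by splitting it into 32-bit words and
rippling a carry through them. The sketch below shows that step in
isolation. It is not part of this patch; the helper name tweak_add_bytes
and the use of <stdint.h> types are assumptions made for illustration.
Unlike the code above, it clears the low 32 bits of tweak[1] before
merging words[2] back in, which only matters if those bits were already
non-zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): add byte_cnt_add to the 96-bit
 * position counter stored in tweak[0] (low 64 bits) and in the low 32 bits
 * of tweak[1]. The upper 32 bits of tweak[1] (the flag bits) are preserved.
 */
static void tweak_add_bytes(uint64_t tweak[2], uint64_t byte_cnt_add)
{
	uint64_t words[3];
	uint64_t carry = byte_cnt_add;
	int i;

	/* Guard: carry + a 32-bit word must not wrap around 64 bits. */
	assert(byte_cnt_add <= UINT64_MAX - 0xffffffffULL);

	/* Split the counter into three 32-bit words, least significant first. */
	words[0] = tweak[0] & 0xffffffffULL;
	words[1] = (tweak[0] >> 32) & 0xffffffffULL;
	words[2] = tweak[1] & 0xffffffffULL;

	/* Ripple the carry through the words. */
	for (i = 0; i < 3; i++) {
		carry += words[i];
		words[i] = carry & 0xffffffffULL;
		carry >>= 32;
	}

	/* Reassemble, replacing (not OR-ing into) the low half of tweak[1]. */
	tweak[0] = words[0] | (words[1] << 32);
	tweak[1] = (tweak[1] & ~0xffffffffULL) | words[2];
}
```

Calling it with tweak[0] at its maximum value and an increment of 1 should
wrap tweak[0] to zero and carry into the low bits of tweak[1], leaving the
flag bits alone.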
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index e62b6442783e..bb36860fafdf 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -40,10 +40,10 @@
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
 void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C */
-    enum {
-        WCNT = SKEIN_256_STATE_WORDS
-        };
+{ /* do it in C */
+	enum {
+		WCNT = SKEIN_256_STATE_WORDS
+	};
 #undef  RCNT
 #define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
 
@@ -57,177 +57,177 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 #if (RCNT % SKEIN_UNROLL_256)
 #error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
 #endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	size_t  r;
+	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
-    u64  w[WCNT];                           /* local copy of input block */
+	u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
+	u64  w[WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+	const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+	Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
 #endif
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	ts[0] = ctx->h.T[0];
+	ts[1] = ctx->h.T[1];
+	do  {
+		/* this implementation only supports 2**64 input bytes (no carry out here) */
+		ts[0] += byteCntAdd;                    /* update processed length */
 
-        /* precompute the key schedule for this block */
-        ks[0] = ctx->X[0];     
-        ks[1] = ctx->X[1];
-        ks[2] = ctx->X[2];
-        ks[3] = ctx->X[3];
-        ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
+		/* precompute the key schedule for this block */
+		ks[0] = ctx->X[0];
+		ks[1] = ctx->X[1];
+		ks[2] = ctx->X[2];
+		ks[3] = ctx->X[3];
+		ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
 
-        ts[2] = ts[0] ^ ts[1];
+		ts[2] = ts[0] ^ ts[1];
 
-        Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+		Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
+		DebugSaveTweak(ctx);
+		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
-        X0 = w[0] + ks[0];                      /* do the first full key injection */
-        X1 = w[1] + ks[1] + ts[0];
-        X2 = w[2] + ks[2] + ts[1];
-        X3 = w[3] + ks[3];
+		X0 = w[0] + ks[0];                      /* do the first full key injection */
+		X1 = w[1] + ks[1] + ts[0];
+		X2 = w[2] + ks[2] + ts[1];
+		X3 = w[3] + ks[3];
 
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
 
-        blkPtr += SKEIN_256_BLOCK_BYTES;
+		blkPtr += SKEIN_256_BLOCK_BYTES;
 
-        /* run the rounds */
+		/* run the rounds */
 
 #define Round256(p0, p1, p2, p3, ROT, rNum)                              \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 
 #if SKEIN_UNROLL_256 == 0                       
 #define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
-    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
 #define I256(R)                                                     \
-    X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
-    X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
-    X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
-    X3   += ks[((R)+4) % 5] +     (R)+1;                            \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
+	X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
+	X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
+	X3   += ks[((R)+4) % 5] +     (R)+1;                            \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
 #define R256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Round256(p0, p1, p2, p3, ROT, rNum)                                  \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
 #define I256(R)                                                     \
-    X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
-    X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
-    X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
-    X3   += ks[r+(R)+3] +    r+(R);                              \
-    ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
-    ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
+	X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
+	X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
+	X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
+	X3   += ks[r+(R)+3] +    r+(R);                              \
+	ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
+	ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
 #endif  
-        {    
+		{
 #define R256_8_rounds(R)                  \
-        R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
-        R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
-        R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
-        R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
-        I256(2 * (R));                      \
-        R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
-        R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
-        R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
-        R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
-        I256(2 * (R) + 1);
-
-        R256_8_rounds(0);
+		R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
+		R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
+		R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
+		R256(0, 3, 2, 1, R_256_3, 8 * (R) + 4);  \
+		I256(2 * (R));                      \
+		R256(0, 1, 2, 3, R_256_4, 8 * (R) + 5);  \
+		R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
+		R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
+		R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
+		I256(2 * (R) + 1);
+
+		R256_8_rounds(0);
 
 #define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
 
-  #if   R256_Unroll_R(1)
-        R256_8_rounds(1);
-  #endif
-  #if   R256_Unroll_R(2)
-        R256_8_rounds(2);
-  #endif
-  #if   R256_Unroll_R(3)
-        R256_8_rounds(3);
-  #endif
-  #if   R256_Unroll_R(4)
-        R256_8_rounds(4);
-  #endif
-  #if   R256_Unroll_R(5)
-        R256_8_rounds(5);
-  #endif
-  #if   R256_Unroll_R(6)
-        R256_8_rounds(6);
-  #endif
-  #if   R256_Unroll_R(7)
-        R256_8_rounds(7);
-  #endif
-  #if   R256_Unroll_R(8)
-        R256_8_rounds(8);
-  #endif
-  #if   R256_Unroll_R(9)
-        R256_8_rounds(9);
-  #endif
-  #if   R256_Unroll_R(10)
-        R256_8_rounds(10);
-  #endif
-  #if   R256_Unroll_R(11)
-        R256_8_rounds(11);
-  #endif
-  #if   R256_Unroll_R(12)
-        R256_8_rounds(12);
-  #endif
-  #if   R256_Unroll_R(13)
-        R256_8_rounds(13);
-  #endif
-  #if   R256_Unroll_R(14)
-        R256_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_256 > 14)
+	#if   R256_Unroll_R(1)
+		R256_8_rounds(1);
+	#endif
+	#if   R256_Unroll_R(2)
+		R256_8_rounds(2);
+	#endif
+	#if   R256_Unroll_R(3)
+		R256_8_rounds(3);
+	#endif
+	#if   R256_Unroll_R(4)
+		R256_8_rounds(4);
+	#endif
+	#if   R256_Unroll_R(5)
+		R256_8_rounds(5);
+	#endif
+	#if   R256_Unroll_R(6)
+		R256_8_rounds(6);
+	#endif
+	#if   R256_Unroll_R(7)
+		R256_8_rounds(7);
+	#endif
+	#if   R256_Unroll_R(8)
+		R256_8_rounds(8);
+	#endif
+	#if   R256_Unroll_R(9)
+		R256_8_rounds(9);
+	#endif
+	#if   R256_Unroll_R(10)
+		R256_8_rounds(10);
+	#endif
+	#if   R256_Unroll_R(11)
+		R256_8_rounds(11);
+	#endif
+	#if   R256_Unroll_R(12)
+		R256_8_rounds(12);
+	#endif
+	#if   R256_Unroll_R(13)
+		R256_8_rounds(13);
+	#endif
+	#if   R256_Unroll_R(14)
+		R256_8_rounds(14);
+	#endif
+	#if  (SKEIN_UNROLL_256 > 14)
 #error  "need more unrolling in Skein_256_Process_Block"
-  #endif
-        }
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = X0 ^ w[0];
-        ctx->X[1] = X1 ^ w[1];
-        ctx->X[2] = X2 ^ w[2];
-        ctx->X[3] = X3 ^ w[3];
-
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
+	#endif
+		}
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0] = X0 ^ w[0];
+		ctx->X[1] = X1 ^ w[1];
+		ctx->X[2] = X2 ^ w[2];
+		ctx->X[3] = X3 ^ w[3];
+
+		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+	}
+	while (--blkCnt);
+	ctx->h.T[0] = ts[0];
+	ctx->h.T[1] = ts[1];
+}
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein_256_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein_256_Process_Block_CodeSize) -
-           ((u8 *) Skein_256_Process_Block);
-    }
+{
+	return ((u8 *) Skein_256_Process_Block_CodeSize) -
+		((u8 *) Skein_256_Process_Block);
+}
 unsigned int Skein_256_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_256;
-    }
+{
+	return SKEIN_UNROLL_256;
+}
 #endif
 #endif
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
 void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C */
-    enum {
-        WCNT = SKEIN_512_STATE_WORDS
-        };
+{ /* do it in C */
+	enum {
+		WCNT = SKEIN_512_STATE_WORDS
+	};
 #undef  RCNT
 #define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
 
@@ -241,200 +241,200 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 #if (RCNT % SKEIN_UNROLL_512)
 #error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
 #endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	size_t  r;
+	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
-    u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
-    u64  w[WCNT];                           /* local copy of input block */
+	u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
+	u64  w[WCNT];                           /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
-    Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
+	const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+	Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
+	Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
 #endif
 
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0] = ctx->X[0];
-        ks[1] = ctx->X[1];
-        ks[2] = ctx->X[2];
-        ks[3] = ctx->X[3];
-        ks[4] = ctx->X[4];
-        ks[5] = ctx->X[5];
-        ks[6] = ctx->X[6];
-        ks[7] = ctx->X[7];
-        ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
-                ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
-
-        ts[2] = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X0   = w[0] + ks[0];                    /* do the first full key injection */
-        X1   = w[1] + ks[1];
-        X2   = w[2] + ks[2];
-        X3   = w[3] + ks[3];
-        X4   = w[4] + ks[4];
-        X5   = w[5] + ks[5] + ts[0];
-        X6   = w[6] + ks[6] + ts[1];
-        X7   = w[7] + ks[7];
-
-        blkPtr += SKEIN_512_BLOCK_BYTES;
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
-        /* run the rounds */
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	ts[0] = ctx->h.T[0];
+	ts[1] = ctx->h.T[1];
+	do  {
+		/* this implementation only supports 2**64 input bytes (no carry out here) */
+		ts[0] += byteCntAdd;                    /* update processed length */
+
+		/* precompute the key schedule for this block */
+		ks[0] = ctx->X[0];
+		ks[1] = ctx->X[1];
+		ks[2] = ctx->X[2];
+		ks[3] = ctx->X[3];
+		ks[4] = ctx->X[4];
+		ks[5] = ctx->X[5];
+		ks[6] = ctx->X[6];
+		ks[7] = ctx->X[7];
+		ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
+			ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
+
+		ts[2] = ts[0] ^ ts[1];
+
+		Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+		DebugSaveTweak(ctx);
+		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+		X0   = w[0] + ks[0];                    /* do the first full key injection */
+		X1   = w[1] + ks[1];
+		X2   = w[2] + ks[2];
+		X3   = w[3] + ks[3];
+		X4   = w[4] + ks[4];
+		X5   = w[5] + ks[5] + ts[0];
+		X6   = w[6] + ks[6] + ts[1];
+		X7   = w[7] + ks[7];
+
+		blkPtr += SKEIN_512_BLOCK_BYTES;
+
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+		/* run the rounds */
 #define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
+		X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+		X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
+		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
 
 #if SKEIN_UNROLL_512 == 0                       
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
-    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
 #define I512(R)                                                     \
-    X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
-    X1   += ks[((R) + 2) % 9];                                        \
-    X2   += ks[((R) + 3) % 9];                                        \
-    X3   += ks[((R) + 4) % 9];                                        \
-    X4   += ks[((R) + 5) % 9];                                        \
-    X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
-    X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
-    X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+		X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
+		X1   += ks[((R) + 2) % 9];                                        \
+		X2   += ks[((R) + 3) % 9];                                        \
+		X3   += ks[((R) + 4) % 9];                                        \
+		X4   += ks[((R) + 5) % 9];                                        \
+		X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
+		X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
+		X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
 #define I512(R)                                                     \
-    X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
-    X1   += ks[r + (R) + 1];                                            \
-    X2   += ks[r + (R) + 2];                                            \
-    X3   += ks[r + (R) + 3];                                            \
-    X4   += ks[r + (R) + 4];                                            \
-    X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
-    X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
-    X7   += ks[r + (R) + 7] +         r + (R);                              \
-    ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
-    ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
+		X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
+		X1   += ks[r + (R) + 1];                                            \
+		X2   += ks[r + (R) + 2];                                            \
+		X3   += ks[r + (R) + 3];                                            \
+		X4   += ks[r + (R) + 4];                                            \
+		X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
+		X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
+		X7   += ks[r + (R) + 7] +         r + (R);                              \
+		ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
+		ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+		for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
 #endif                         /* end of looped code definitions */
-        {
+		{
 #define R512_8_rounds(R)  /* do 8 full rounds */  \
-        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
-        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
-        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
-        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
-        I512(2 * (R));                              \
-        R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
-        R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
-        R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
-        R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
-        I512(2 * (R) + 1);        /* and key injection */
-
-        R512_8_rounds(0);
+			R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
+			R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
+			R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
+			R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
+			I512(2 * (R));                              \
+			R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
+			R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
+			R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
+			R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
+			I512(2 * (R) + 1);        /* and key injection */
+
+			R512_8_rounds(0);
 
 #define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
 
-  #if   R512_Unroll_R(1)
-        R512_8_rounds(1);
-  #endif
-  #if   R512_Unroll_R(2)
-        R512_8_rounds(2);
-  #endif
-  #if   R512_Unroll_R(3)
-        R512_8_rounds(3);
-  #endif
-  #if   R512_Unroll_R(4)
-        R512_8_rounds(4);
-  #endif
-  #if   R512_Unroll_R(5)
-        R512_8_rounds(5);
-  #endif
-  #if   R512_Unroll_R(6)
-        R512_8_rounds(6);
-  #endif
-  #if   R512_Unroll_R(7)
-        R512_8_rounds(7);
-  #endif
-  #if   R512_Unroll_R(8)
-        R512_8_rounds(8);
-  #endif
-  #if   R512_Unroll_R(9)
-        R512_8_rounds(9);
-  #endif
-  #if   R512_Unroll_R(10)
-        R512_8_rounds(10);
-  #endif
-  #if   R512_Unroll_R(11)
-        R512_8_rounds(11);
-  #endif
-  #if   R512_Unroll_R(12)
-        R512_8_rounds(12);
-  #endif
-  #if   R512_Unroll_R(13)
-        R512_8_rounds(13);
-  #endif
-  #if   R512_Unroll_R(14)
-        R512_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_512 > 14)
+	#if   R512_Unroll_R(1)
+			R512_8_rounds(1);
+	#endif
+	#if   R512_Unroll_R(2)
+			R512_8_rounds(2);
+	#endif
+	#if   R512_Unroll_R(3)
+			R512_8_rounds(3);
+	#endif
+	#if   R512_Unroll_R(4)
+			R512_8_rounds(4);
+	#endif
+	#if   R512_Unroll_R(5)
+			R512_8_rounds(5);
+	#endif
+	#if   R512_Unroll_R(6)
+			R512_8_rounds(6);
+	#endif
+	#if   R512_Unroll_R(7)
+			R512_8_rounds(7);
+	#endif
+	#if   R512_Unroll_R(8)
+			R512_8_rounds(8);
+	#endif
+	#if   R512_Unroll_R(9)
+			R512_8_rounds(9);
+	#endif
+	#if   R512_Unroll_R(10)
+			R512_8_rounds(10);
+	#endif
+	#if   R512_Unroll_R(11)
+			R512_8_rounds(11);
+	#endif
+	#if   R512_Unroll_R(12)
+			R512_8_rounds(12);
+	#endif
+	#if   R512_Unroll_R(13)
+			R512_8_rounds(13);
+	#endif
+	#if   R512_Unroll_R(14)
+			R512_8_rounds(14);
+	#endif
+	#if  (SKEIN_UNROLL_512 > 14)
 #error  "need more unrolling in Skein_512_Process_Block"
-  #endif
-        }
-
-        /* do the final "feedforward" xor, update context chaining vars */
-        ctx->X[0] = X0 ^ w[0];
-        ctx->X[1] = X1 ^ w[1];
-        ctx->X[2] = X2 ^ w[2];
-        ctx->X[3] = X3 ^ w[3];
-        ctx->X[4] = X4 ^ w[4];
-        ctx->X[5] = X5 ^ w[5];
-        ctx->X[6] = X6 ^ w[6];
-        ctx->X[7] = X7 ^ w[7];
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
+	#endif
+		}
+
+		/* do the final "feedforward" xor, update context chaining vars */
+		ctx->X[0] = X0 ^ w[0];
+		ctx->X[1] = X1 ^ w[1];
+		ctx->X[2] = X2 ^ w[2];
+		ctx->X[3] = X3 ^ w[3];
+		ctx->X[4] = X4 ^ w[4];
+		ctx->X[5] = X5 ^ w[5];
+		ctx->X[6] = X6 ^ w[6];
+		ctx->X[7] = X7 ^ w[7];
+		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+	}
+	while (--blkCnt);
+	ctx->h.T[0] = ts[0];
+	ctx->h.T[1] = ts[1];
+}
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein_512_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein_512_Process_Block_CodeSize) -
-           ((u8 *) Skein_512_Process_Block);
-    }
+{
+	return ((u8 *) Skein_512_Process_Block_CodeSize) -
+		((u8 *) Skein_512_Process_Block);
+}
 unsigned int Skein_512_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_512;
-    }
+{
+	return SKEIN_UNROLL_512;
+}
 #endif
 #endif
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
 void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
-    { /* do it in C, always looping (unrolled is bigger AND slower!) */
-    enum {
-        WCNT = SKEIN1024_STATE_WORDS
-        };
+{ /* do it in C, always looping (unrolled is bigger AND slower!) */
+	enum {
+		WCNT = SKEIN1024_STATE_WORDS
+	};
 #undef  RCNT
 #define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
 
@@ -448,239 +448,239 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 #if (RCNT % SKEIN_UNROLL_1024)
 #error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
 #endif
-    size_t  r;
-    u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	size_t  r;
+	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
 #else
-    u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
 #endif
 
-    u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
-            X08, X09, X10, X11, X12, X13, X14, X15;
-    u64  w[WCNT];                            /* local copy of input block */
+	u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
+		X08, X09, X10, X11, X12, X13, X14, X15;
+	u64  w[WCNT];                            /* local copy of input block */
 #ifdef SKEIN_DEBUG
-    const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
-    Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
-    Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
-    Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
-    Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
+	const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+	Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
+	Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
+	Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
+	Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
 #endif
 
-    Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
-    ts[0] = ctx->h.T[0];
-    ts[1] = ctx->h.T[1];
-    do  {
-        /* this implementation only supports 2**64 input bytes (no carry out here) */
-        ts[0] += byteCntAdd;                    /* update processed length */
-
-        /* precompute the key schedule for this block */
-        ks[0]  = ctx->X[0];
-        ks[1]  = ctx->X[1];
-        ks[2]  = ctx->X[2];
-        ks[3]  = ctx->X[3];
-        ks[4]  = ctx->X[4];
-        ks[5]  = ctx->X[5];
-        ks[6]  = ctx->X[6];
-        ks[7]  = ctx->X[7];
-        ks[8]  = ctx->X[8];
-        ks[9]  = ctx->X[9];
-        ks[10] = ctx->X[10];
-        ks[11] = ctx->X[11];
-        ks[12] = ctx->X[12];
-        ks[13] = ctx->X[13];
-        ks[14] = ctx->X[14];
-        ks[15] = ctx->X[15];
-        ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
-                  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
-                  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
-                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
-
-        ts[2]  = ts[0] ^ ts[1];
-
-        Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
-        DebugSaveTweak(ctx);
-        Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
-
-        X00    =  w[0] +  ks[0];                 /* do the first full key injection */
-        X01    =  w[1] +  ks[1];
-        X02    =  w[2] +  ks[2];
-        X03    =  w[3] +  ks[3];
-        X04    =  w[4] +  ks[4];
-        X05    =  w[5] +  ks[5];
-        X06    =  w[6] +  ks[6];
-        X07    =  w[7] +  ks[7];
-        X08    =  w[8] +  ks[8];
-        X09    =  w[9] +  ks[9];
-        X10    = w[10] + ks[10];
-        X11    = w[11] + ks[11];
-        X12    = w[12] + ks[12];
-        X13    = w[13] + ks[13] + ts[0];
-        X14    = w[14] + ks[14] + ts[1];
-        X15    = w[15] + ks[15];
-
-        Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	ts[0] = ctx->h.T[0];
+	ts[1] = ctx->h.T[1];
+	do  {
+		/* this implementation only supports 2**64 input bytes (no carry out here) */
+		ts[0] += byteCntAdd;                    /* update processed length */
+
+		/* precompute the key schedule for this block */
+		ks[0]  = ctx->X[0];
+		ks[1]  = ctx->X[1];
+		ks[2]  = ctx->X[2];
+		ks[3]  = ctx->X[3];
+		ks[4]  = ctx->X[4];
+		ks[5]  = ctx->X[5];
+		ks[6]  = ctx->X[6];
+		ks[7]  = ctx->X[7];
+		ks[8]  = ctx->X[8];
+		ks[9]  = ctx->X[9];
+		ks[10] = ctx->X[10];
+		ks[11] = ctx->X[11];
+		ks[12] = ctx->X[12];
+		ks[13] = ctx->X[13];
+		ks[14] = ctx->X[14];
+		ks[15] = ctx->X[15];
+		ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
+			  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
+			  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
+			  ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
+
+		ts[2]  = ts[0] ^ ts[1];
+
+		Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+		DebugSaveTweak(ctx);
+		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
+
+		X00    =  w[0] +  ks[0];                 /* do the first full key injection */
+		X01    =  w[1] +  ks[1];
+		X02    =  w[2] +  ks[2];
+		X03    =  w[3] +  ks[3];
+		X04    =  w[4] +  ks[4];
+		X05    =  w[5] +  ks[5];
+		X06    =  w[6] +  ks[6];
+		X07    =  w[7] +  ks[7];
+		X08    =  w[8] +  ks[8];
+		X09    =  w[9] +  ks[9];
+		X10    = w[10] + ks[10];
+		X11    = w[11] + ks[11];
+		X12    = w[12] + ks[12];
+		X13    = w[13] + ks[13] + ts[0];
+		X14    = w[14] + ks[14] + ts[1];
+		X15    = w[15] + ks[15];
+
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
 
 #define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
-    X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
-    X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
-    X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
-    X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
-    X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
-    X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
-    X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
-    X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
+		X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
+		X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
+		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
+		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
+		X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
+		X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
+		X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
+		X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
 
 #if SKEIN_UNROLL_1024 == 0                      
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
 
 #define I1024(R)                                                        \
-    X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
-    X01   += ks[((R) +  2) % 17];                                       \
-    X02   += ks[((R) +  3) % 17];                                       \
-    X03   += ks[((R) +  4) % 17];                                       \
-    X04   += ks[((R) +  5) % 17];                                       \
-    X05   += ks[((R) +  6) % 17];                                       \
-    X06   += ks[((R) +  7) % 17];                                       \
-    X07   += ks[((R) +  8) % 17];                                       \
-    X08   += ks[((R) +  9) % 17];                                       \
-    X09   += ks[((R) + 10) % 17];                                       \
-    X10   += ks[((R) + 11) % 17];                                       \
-    X11   += ks[((R) + 12) % 17];                                       \
-    X12   += ks[((R) + 13) % 17];                                       \
-    X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
-    X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
-    X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
+		X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
+		X01   += ks[((R) +  2) % 17];                                       \
+		X02   += ks[((R) +  3) % 17];                                       \
+		X03   += ks[((R) +  4) % 17];                                       \
+		X04   += ks[((R) +  5) % 17];                                       \
+		X05   += ks[((R) +  6) % 17];                                       \
+		X06   += ks[((R) +  7) % 17];                                       \
+		X07   += ks[((R) +  8) % 17];                                       \
+		X08   += ks[((R) +  9) % 17];                                       \
+		X09   += ks[((R) + 10) % 17];                                       \
+		X10   += ks[((R) + 11) % 17];                                       \
+		X11   += ks[((R) + 12) % 17];                                       \
+		X12   += ks[((R) + 13) % 17];                                       \
+		X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
+		X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
+		X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
 #else                                       /* looping version */
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-    Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
+		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
 
 #define I1024(R)                                                      \
-    X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
-    X01   += ks[r + (R) +  1];                                            \
-    X02   += ks[r + (R) +  2];                                            \
-    X03   += ks[r + (R) +  3];                                            \
-    X04   += ks[r + (R) +  4];                                            \
-    X05   += ks[r + (R) +  5];                                            \
-    X06   += ks[r + (R) +  6];                                            \
-    X07   += ks[r + (R) +  7];                                            \
-    X08   += ks[r + (R) +  8];                                            \
-    X09   += ks[r + (R) +  9];                                            \
-    X10   += ks[r + (R) + 10];                                            \
-    X11   += ks[r + (R) + 11];                                            \
-    X12   += ks[r + (R) + 12];                                            \
-    X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
-    X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
-    X15   += ks[r + (R) + 15] +         r + (R);                          \
-    ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
-    ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
-    Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-    for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
+		X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
+		X01   += ks[r + (R) +  1];                                            \
+		X02   += ks[r + (R) +  2];                                            \
+		X03   += ks[r + (R) +  3];                                            \
+		X04   += ks[r + (R) +  4];                                            \
+		X05   += ks[r + (R) +  5];                                            \
+		X06   += ks[r + (R) +  6];                                            \
+		X07   += ks[r + (R) +  7];                                            \
+		X08   += ks[r + (R) +  8];                                            \
+		X09   += ks[r + (R) +  9];                                            \
+		X10   += ks[r + (R) + 10];                                            \
+		X11   += ks[r + (R) + 11];                                            \
+		X12   += ks[r + (R) + 12];                                            \
+		X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
+		X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
+		X15   += ks[r + (R) + 15] +         r + (R);                          \
+		ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
+		ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
+		Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
 #endif  
-        {
+		{
 #define R1024_8_rounds(R)    /* do 8 full rounds */                               \
-        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
-        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
-        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
-        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
-        I1024(2*(R));                                                             \
-        R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
-        R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
-        R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
-        R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
-        I1024(2*(R)+1);
-
-        R1024_8_rounds(0);
+			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
+			R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
+			R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
+			R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
+			I1024(2*(R));                                                             \
+			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
+			R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
+			R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
+			R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
+			I1024(2*(R)+1);
+
+			R1024_8_rounds(0);
 
 #define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
 
-  #if   R1024_Unroll_R(1)
-        R1024_8_rounds(1);
-  #endif
-  #if   R1024_Unroll_R(2)
-        R1024_8_rounds(2);
-  #endif
-  #if   R1024_Unroll_R(3)
-        R1024_8_rounds(3);
-  #endif
-  #if   R1024_Unroll_R(4)
-        R1024_8_rounds(4);
-  #endif
-  #if   R1024_Unroll_R(5)
-        R1024_8_rounds(5);
-  #endif
-  #if   R1024_Unroll_R(6)
-        R1024_8_rounds(6);
-  #endif
-  #if   R1024_Unroll_R(7)
-        R1024_8_rounds(7);
-  #endif
-  #if   R1024_Unroll_R(8)
-        R1024_8_rounds(8);
-  #endif
-  #if   R1024_Unroll_R(9)
-        R1024_8_rounds(9);
-  #endif
-  #if   R1024_Unroll_R(10)
-        R1024_8_rounds(10);
-  #endif
-  #if   R1024_Unroll_R(11)
-        R1024_8_rounds(11);
-  #endif
-  #if   R1024_Unroll_R(12)
-        R1024_8_rounds(12);
-  #endif
-  #if   R1024_Unroll_R(13)
-        R1024_8_rounds(13);
-  #endif
-  #if   R1024_Unroll_R(14)
-        R1024_8_rounds(14);
-  #endif
-  #if  (SKEIN_UNROLL_1024 > 14)
+	#if   R1024_Unroll_R(1)
+			R1024_8_rounds(1);
+	#endif
+	#if   R1024_Unroll_R(2)
+			R1024_8_rounds(2);
+	#endif
+	#if   R1024_Unroll_R(3)
+			R1024_8_rounds(3);
+	#endif
+	#if   R1024_Unroll_R(4)
+			R1024_8_rounds(4);
+	#endif
+	#if   R1024_Unroll_R(5)
+			R1024_8_rounds(5);
+	#endif
+	#if   R1024_Unroll_R(6)
+			R1024_8_rounds(6);
+	#endif
+	#if   R1024_Unroll_R(7)
+			R1024_8_rounds(7);
+	#endif
+	#if   R1024_Unroll_R(8)
+			R1024_8_rounds(8);
+	#endif
+	#if   R1024_Unroll_R(9)
+			R1024_8_rounds(9);
+	#endif
+	#if   R1024_Unroll_R(10)
+			R1024_8_rounds(10);
+	#endif
+	#if   R1024_Unroll_R(11)
+			R1024_8_rounds(11);
+	#endif
+	#if   R1024_Unroll_R(12)
+			R1024_8_rounds(12);
+	#endif
+	#if   R1024_Unroll_R(13)
+			R1024_8_rounds(13);
+	#endif
+	#if   R1024_Unroll_R(14)
+			R1024_8_rounds(14);
+	#endif
+#if  (SKEIN_UNROLL_1024 > 14)
 #error  "need more unrolling in Skein_1024_Process_Block"
   #endif
-        }
-        /* do the final "feedforward" xor, update context chaining vars */
-
-        ctx->X[0] = X00 ^ w[0];
-        ctx->X[1] = X01 ^ w[1];
-        ctx->X[2] = X02 ^ w[2];
-        ctx->X[3] = X03 ^ w[3];
-        ctx->X[4] = X04 ^ w[4];
-        ctx->X[5] = X05 ^ w[5];
-        ctx->X[6] = X06 ^ w[6];
-        ctx->X[7] = X07 ^ w[7];
-        ctx->X[8] = X08 ^ w[8];
-        ctx->X[9] = X09 ^ w[9];
-        ctx->X[10] = X10 ^ w[10];
-        ctx->X[11] = X11 ^ w[11];
-        ctx->X[12] = X12 ^ w[12];
-        ctx->X[13] = X13 ^ w[13];
-        ctx->X[14] = X14 ^ w[14];
-        ctx->X[15] = X15 ^ w[15];
-
-        Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
-        
-        ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-        blkPtr += SKEIN1024_BLOCK_BYTES;
-        }
-    while (--blkCnt);
-    ctx->h.T[0] = ts[0];
-    ctx->h.T[1] = ts[1];
-    }
+		}
+		/* do the final "feedforward" xor, update context chaining vars */
+
+		ctx->X[0] = X00 ^ w[0];
+		ctx->X[1] = X01 ^ w[1];
+		ctx->X[2] = X02 ^ w[2];
+		ctx->X[3] = X03 ^ w[3];
+		ctx->X[4] = X04 ^ w[4];
+		ctx->X[5] = X05 ^ w[5];
+		ctx->X[6] = X06 ^ w[6];
+		ctx->X[7] = X07 ^ w[7];
+		ctx->X[8] = X08 ^ w[8];
+		ctx->X[9] = X09 ^ w[9];
+		ctx->X[10] = X10 ^ w[10];
+		ctx->X[11] = X11 ^ w[11];
+		ctx->X[12] = X12 ^ w[12];
+		ctx->X[13] = X13 ^ w[13];
+		ctx->X[14] = X14 ^ w[14];
+		ctx->X[15] = X15 ^ w[15];
+
+		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+
+		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
+		blkPtr += SKEIN1024_BLOCK_BYTES;
+	}
+	while (--blkCnt);
+	ctx->h.T[0] = ts[0];
+	ctx->h.T[1] = ts[1];
+}
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
 size_t Skein1024_Process_Block_CodeSize(void)
-    {
-    return ((u8 *) Skein1024_Process_Block_CodeSize) -
-           ((u8 *) Skein1024_Process_Block);
-    }
+{
+	return ((u8 *) Skein1024_Process_Block_CodeSize) -
+		((u8 *) Skein1024_Process_Block);
+}
 unsigned int Skein1024_Unroll_Cnt(void)
-    {
-    return SKEIN_UNROLL_1024;
-    }
+{
+	return SKEIN_UNROLL_1024;
+}
 #endif
 #endif
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index e3be37ea8024..1730a3120a0f 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -3,1382 +3,1380 @@
 
 
 void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
-        {
-
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3],
-      b4 = input[4], b5 = input[5],
-      b6 = input[6], b7 = input[7],
-      b8 = input[8], b9 = input[9],
-      b10 = input[10], b11 = input[11],
-      b12 = input[12], b13 = input[13],
-      b14 = input[14], b15 = input[15];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-      k8 = keyCtx->key[8], k9 = keyCtx->key[9],
-      k10 = keyCtx->key[10], k11 = keyCtx->key[11],
-      k12 = keyCtx->key[12], k13 = keyCtx->key[13],
-      k14 = keyCtx->key[14], k15 = keyCtx->key[15],
-      k16 = keyCtx->key[16];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7],
+	  b8 = input[8], b9 = input[9],
+	  b10 = input[10], b11 = input[11],
+	  b12 = input[12], b13 = input[13],
+	  b14 = input[14], b15 = input[15];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+	  k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+	  k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+	  k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+	  k16 = keyCtx->key[16];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-            b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k3; b2 += b3 + k2; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k5; b4 += b5 + k4; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k7; b6 += b7 + k6; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k9; b8 += b9 + k8; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k11; b10 += b11 + k10; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k13 + t0; b12 += b13 + k12; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k15; b14 += b15 + k14 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k2; b0 += b1 + k1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k4; b2 += b3 + k3; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k6; b4 += b5 + k5; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k8; b6 += b7 + k7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k10; b8 += b9 + k9; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k12; b10 += b11 + k11; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k14 + t1; b12 += b13 + k13; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k16 + 1; b14 += b15 + k15 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k3; b0 += b1 + k2; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k5; b2 += b3 + k4; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k7; b4 += b5 + k6; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k9; b6 += b7 + k8; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k11; b8 += b9 + k10; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k13; b10 += b11 + k12; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k15 + t2; b12 += b13 + k14; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k0 + 2; b14 += b15 + k16 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k4; b0 += b1 + k3; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k6; b2 += b3 + k5; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k8; b4 += b5 + k7; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k10; b6 += b7 + k9; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k12; b8 += b9 + k11; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k14; b10 += b11 + k13; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k16 + t0; b12 += b13 + k15; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k1 + 3; b14 += b15 + k0 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k5; b0 += b1 + k4; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k7; b2 += b3 + k6; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k9; b4 += b5 + k8; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k11; b6 += b7 + k10; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k13; b8 += b9 + k12; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k15; b10 += b11 + k14; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k0 + t1; b12 += b13 + k16; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k2 + 4; b14 += b15 + k1 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k6; b0 += b1 + k5; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k8; b2 += b3 + k7; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k10; b4 += b5 + k9; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k12; b6 += b7 + k11; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k14; b8 += b9 + k13; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k16; b10 += b11 + k15; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k1 + t2; b12 += b13 + k0; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k3 + 5; b14 += b15 + k2 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k7; b0 += b1 + k6; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k9; b2 += b3 + k8; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k11; b4 += b5 + k10; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k13; b6 += b7 + k12; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k15; b8 += b9 + k14; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k0; b10 += b11 + k16; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k2 + t0; b12 += b13 + k1; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k4 + 6; b14 += b15 + k3 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k8; b0 += b1 + k7; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k10; b2 += b3 + k9; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k12; b4 += b5 + k11; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k14; b6 += b7 + k13; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k16; b8 += b9 + k15; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k1; b10 += b11 + k0; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k3 + t1; b12 += b13 + k2; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k5 + 7; b14 += b15 + k4 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k9; b0 += b1 + k8; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k11; b2 += b3 + k10; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k13; b4 += b5 + k12; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k15; b6 += b7 + k14; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k0; b8 += b9 + k16; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k2; b10 += b11 + k1; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k4 + t2; b12 += b13 + k3; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k6 + 8; b14 += b15 + k5 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k10; b0 += b1 + k9; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k12; b2 += b3 + k11; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k14; b4 += b5 + k13; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k16; b6 += b7 + k15; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k1; b8 += b9 + k0; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k3; b10 += b11 + k2; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k5 + t0; b12 += b13 + k4; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k7 + 9; b14 += b15 + k6 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k11; b0 += b1 + k10; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k13; b2 += b3 + k12; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k15; b4 += b5 + k14; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k0; b6 += b7 + k16; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k2; b8 += b9 + k1; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k4; b10 += b11 + k3; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k6 + t1; b12 += b13 + k5; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k8 + 10; b14 += b15 + k7 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k12; b0 += b1 + k11; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k14; b2 += b3 + k13; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k16; b4 += b5 + k15; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k1; b6 += b7 + k0; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k3; b8 += b9 + k2; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k5; b10 += b11 + k4; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k7 + t2; b12 += b13 + k6; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k9 + 11; b14 += b15 + k8 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k13; b0 += b1 + k12; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k15; b2 += b3 + k14; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k0; b4 += b5 + k16; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k2; b6 += b7 + k1; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k4; b8 += b9 + k3; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k6; b10 += b11 + k5; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k8 + t0; b12 += b13 + k7; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k10 + 12; b14 += b15 + k9 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k14; b0 += b1 + k13; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k16; b2 += b3 + k15; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k1; b4 += b5 + k0; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k3; b6 += b7 + k2; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k5; b8 += b9 + k4; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k7; b10 += b11 + k6; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k9 + t1; b12 += b13 + k8; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k11 + 13; b14 += b15 + k10 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k15; b0 += b1 + k14; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k0; b2 += b3 + k16; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k2; b4 += b5 + k1; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k4; b6 += b7 + k3; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k6; b8 += b9 + k5; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k8; b10 += b11 + k7; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k10 + t2; b12 += b13 + k9; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k12 + 14; b14 += b15 + k11 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k16; b0 += b1 + k15; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k1; b2 += b3 + k0; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k3; b4 += b5 + k2; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k5; b6 += b7 + k4; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k7; b8 += b9 + k6; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k9; b10 += b11 + k8; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k11 + t0; b12 += b13 + k10; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k13 + 15; b14 += b15 + k12 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k0; b0 += b1 + k16; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k2; b2 += b3 + k1; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k4; b4 += b5 + k3; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k6; b6 += b7 + k5; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k8; b8 += b9 + k7; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k10; b10 += b11 + k9; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k12 + t1; b12 += b13 + k11; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k14 + 16; b14 += b15 + k13 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k1; b0 += b1 + k0; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k3; b2 += b3 + k2; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k5; b4 += b5 + k4; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k7; b6 += b7 + k6; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k9; b8 += b9 + k8; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k11; b10 += b11 + k10; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k13 + t2; b12 += b13 + k12; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k15 + 17; b14 += b15 + k14 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-            b1 += k2; b0 += b1 + k1; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-            b3 += k4; b2 += b3 + k3; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-            b5 += k6; b4 += b5 + k5; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-            b7 += k8; b6 += b7 + k7; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-            b9 += k10; b8 += b9 + k9; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-            b11 += k12; b10 += b11 + k11; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-            b13 += k14 + t0; b12 += b13 + k13; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-            b15 += k16 + 18; b14 += b15 + k15 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-            b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-            b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-            b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-            b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-            b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-            b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-            b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-            b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-            b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-            b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-            b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-            b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-            b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-            b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-            b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-            b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-            b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-            b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-            b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-            b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-            b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-            b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-            b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-            b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-            b1 += k3; b0 += b1 + k2; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-            b3 += k5; b2 += b3 + k4; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-            b5 += k7; b4 += b5 + k6; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-            b7 += k9; b6 += b7 + k8; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-            b9 += k11; b8 += b9 + k10; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-            b11 += k13; b10 += b11 + k12; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-            b13 += k15 + t1; b12 += b13 + k14; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-            b15 += k0 + 19; b14 += b15 + k16 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-            b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-            b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-            b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-            b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-            b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-            b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-            b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-            b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-            b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-            b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-            b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-            b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-            b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-            b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-            b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-            b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-            b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-            b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-            b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-            b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-            b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-            b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-            b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-            b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k5; b4 += b5 + k4; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k7; b6 += b7 + k6; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k9; b8 += b9 + k8; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k11; b10 += b11 + k10; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k13 + t0; b12 += b13 + k12; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k15; b14 += b15 + k14 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k6; b4 += b5 + k5; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k8; b6 += b7 + k7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k10; b8 += b9 + k9; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k12; b10 += b11 + k11; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k14 + t1; b12 += b13 + k13; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k16 + 1; b14 += b15 + k15 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k7; b4 += b5 + k6; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k9; b6 += b7 + k8; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k11; b8 += b9 + k10; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k13; b10 += b11 + k12; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k15 + t2; b12 += b13 + k14; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k0 + 2; b14 += b15 + k16 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k8; b4 += b5 + k7; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k10; b6 += b7 + k9; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k12; b8 += b9 + k11; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k14; b10 += b11 + k13; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k16 + t0; b12 += b13 + k15; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k1 + 3; b14 += b15 + k0 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k9; b4 += b5 + k8; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k11; b6 += b7 + k10; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k13; b8 += b9 + k12; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k15; b10 += b11 + k14; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k0 + t1; b12 += b13 + k16; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k2 + 4; b14 += b15 + k1 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k10; b4 += b5 + k9; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k12; b6 += b7 + k11; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k14; b8 += b9 + k13; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k16; b10 += b11 + k15; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k1 + t2; b12 += b13 + k0; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k3 + 5; b14 += b15 + k2 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k9; b2 += b3 + k8; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k11; b4 += b5 + k10; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k13; b6 += b7 + k12; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k15; b8 += b9 + k14; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k0; b10 += b11 + k16; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k2 + t0; b12 += b13 + k1; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k4 + 6; b14 += b15 + k3 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k10; b2 += b3 + k9; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k12; b4 += b5 + k11; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k14; b6 += b7 + k13; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k16; b8 += b9 + k15; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k1; b10 += b11 + k0; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k3 + t1; b12 += b13 + k2; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k5 + 7; b14 += b15 + k4 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k9; b0 += b1 + k8; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k11; b2 += b3 + k10; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k13; b4 += b5 + k12; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k15; b6 += b7 + k14; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k0; b8 += b9 + k16; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k2; b10 += b11 + k1; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k4 + t2; b12 += b13 + k3; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k6 + 8; b14 += b15 + k5 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k10; b0 += b1 + k9; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k12; b2 += b3 + k11; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k14; b4 += b5 + k13; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k16; b6 += b7 + k15; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k1; b8 += b9 + k0; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k3; b10 += b11 + k2; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k5 + t0; b12 += b13 + k4; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k7 + 9; b14 += b15 + k6 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k11; b0 += b1 + k10; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k13; b2 += b3 + k12; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k15; b4 += b5 + k14; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k0; b6 += b7 + k16; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k2; b8 += b9 + k1; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k4; b10 += b11 + k3; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k6 + t1; b12 += b13 + k5; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k8 + 10; b14 += b15 + k7 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k12; b0 += b1 + k11; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k14; b2 += b3 + k13; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k16; b4 += b5 + k15; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k1; b6 += b7 + k0; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k3; b8 += b9 + k2; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k5; b10 += b11 + k4; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k7 + t2; b12 += b13 + k6; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k9 + 11; b14 += b15 + k8 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k13; b0 += b1 + k12; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k15; b2 += b3 + k14; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k0; b4 += b5 + k16; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k2; b6 += b7 + k1; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k4; b8 += b9 + k3; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k6; b10 += b11 + k5; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k8 + t0; b12 += b13 + k7; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k10 + 12; b14 += b15 + k9 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k14; b0 += b1 + k13; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k16; b2 += b3 + k15; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k1; b4 += b5 + k0; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k3; b6 += b7 + k2; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k5; b8 += b9 + k4; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k7; b10 += b11 + k6; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k9 + t1; b12 += b13 + k8; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k11 + 13; b14 += b15 + k10 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k15; b0 += b1 + k14; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k0; b2 += b3 + k16; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k2; b4 += b5 + k1; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k4; b6 += b7 + k3; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k6; b8 += b9 + k5; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k8; b10 += b11 + k7; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k10 + t2; b12 += b13 + k9; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k12 + 14; b14 += b15 + k11 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k16; b0 += b1 + k15; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k3; b4 += b5 + k2; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k5; b6 += b7 + k4; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k7; b8 += b9 + k6; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k9; b10 += b11 + k8; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k11 + t0; b12 += b13 + k10; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k13 + 15; b14 += b15 + k12 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k0; b0 += b1 + k16; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k4; b4 += b5 + k3; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k6; b6 += b7 + k5; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k8; b8 += b9 + k7; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k10; b10 += b11 + k9; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k12 + t1; b12 += b13 + k11; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k14 + 16; b14 += b15 + k13 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k5; b4 += b5 + k4; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k7; b6 += b7 + k6; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k9; b8 += b9 + k8; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k11; b10 += b11 + k10; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k13 + t2; b12 += b13 + k12; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k15 + 17; b14 += b15 + k14 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+	b5 += k6; b4 += b5 + k5; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+	b7 += k8; b6 += b7 + k7; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+	b9 += k10; b8 += b9 + k9; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+	b11 += k12; b10 += b11 + k11; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+	b13 += k14 + t0; b12 += b13 + k13; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+	b15 += k16 + 18; b14 += b15 + k15 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+	b5 += k7; b4 += b5 + k6; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+	b7 += k9; b6 += b7 + k8; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+	b9 += k11; b8 += b9 + k10; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+	b11 += k13; b10 += b11 + k12; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+	b13 += k15 + t1; b12 += b13 + k14; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+	b15 += k0 + 19; b14 += b15 + k16 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
 
-            output[0] = b0 + k3;
-            output[1] = b1 + k4;
-            output[2] = b2 + k5;
-            output[3] = b3 + k6;
-            output[4] = b4 + k7;
-            output[5] = b5 + k8;
-            output[6] = b6 + k9;
-            output[7] = b7 + k10;
-            output[8] = b8 + k11;
-            output[9] = b9 + k12;
-            output[10] = b10 + k13;
-            output[11] = b11 + k14;
-            output[12] = b12 + k15;
-            output[13] = b13 + k16 + t2;
-            output[14] = b14 + k0 + t0;
-            output[15] = b15 + k1 + 20;
-        }
+	output[0] = b0 + k3;
+	output[1] = b1 + k4;
+	output[2] = b2 + k5;
+	output[3] = b3 + k6;
+	output[4] = b4 + k7;
+	output[5] = b5 + k8;
+	output[6] = b6 + k9;
+	output[7] = b7 + k10;
+	output[8] = b8 + k11;
+	output[9] = b9 + k12;
+	output[10] = b10 + k13;
+	output[11] = b11 + k14;
+	output[12] = b12 + k15;
+	output[13] = b13 + k16 + t2;
+	output[14] = b14 + k0 + t0;
+	output[15] = b15 + k1 + 20;
+}
 
 void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
 {
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7],
+	  b8 = input[8], b9 = input[9],
+	  b10 = input[10], b11 = input[11],
+	  b12 = input[12], b13 = input[13],
+	  b14 = input[14], b15 = input[15];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+	  k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+	  k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+	  k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+	  k16 = keyCtx->key[16];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
+	u64 tmp;
 
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3],
-      b4 = input[4], b5 = input[5],
-      b6 = input[6], b7 = input[7],
-      b8 = input[8], b9 = input[9],
-      b10 = input[10], b11 = input[11],
-      b12 = input[12], b13 = input[13],
-      b14 = input[14], b15 = input[15];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-      k8 = keyCtx->key[8], k9 = keyCtx->key[9],
-      k10 = keyCtx->key[10], k11 = keyCtx->key[11],
-      k12 = keyCtx->key[12], k13 = keyCtx->key[13],
-      k14 = keyCtx->key[14], k15 = keyCtx->key[15],
-      k16 = keyCtx->key[16];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
-    u64 tmp;
-
-            b0 -= k3;
-            b1 -= k4;
-            b2 -= k5;
-            b3 -= k6;
-            b4 -= k7;
-            b5 -= k8;
-            b6 -= k9;
-            b7 -= k10;
-            b8 -= k11;
-            b9 -= k12;
-            b10 -= k13;
-            b11 -= k14;
-            b12 -= k15;
-            b13 -= k16 + t2;
-            b14 -= k0 + t0;
-            b15 -= k1 + 20;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k16 + t2; b15 -= k0 + 19;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k14; b13 -= k15 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k12; b11 -= k13;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k10; b9 -= k11;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k8; b7 -= k9;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k6; b5 -= k7;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k4; b3 -= k5;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k2; b1 -= k3;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k15 + t1; b15 -= k16 + 18;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k13; b13 -= k14 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k11; b11 -= k12;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k9; b9 -= k10;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k7; b7 -= k8;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k5; b5 -= k6;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k3; b3 -= k4;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k1; b1 -= k2;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k14 + t0; b15 -= k15 + 17;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k12; b13 -= k13 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k10; b11 -= k11;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k8; b9 -= k9;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k6; b7 -= k7;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k4; b5 -= k5;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k2; b3 -= k3;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k0; b1 -= k1;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k13 + t2; b15 -= k14 + 16;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k11; b13 -= k12 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k9; b11 -= k10;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k7; b9 -= k8;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k5; b7 -= k6;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k3; b5 -= k4;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k1; b3 -= k2;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k16; b1 -= k0;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k12 + t1; b15 -= k13 + 15;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k10; b13 -= k11 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k8; b11 -= k9;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k6; b9 -= k7;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k4; b7 -= k5;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k2; b5 -= k3;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k0; b3 -= k1;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k15; b1 -= k16;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k11 + t0; b15 -= k12 + 14;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k9; b13 -= k10 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k7; b11 -= k8;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k5; b9 -= k6;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k3; b7 -= k4;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k1; b5 -= k2;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k16; b3 -= k0;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k14; b1 -= k15;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k10 + t2; b15 -= k11 + 13;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k8; b13 -= k9 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k6; b11 -= k7;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k4; b9 -= k5;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k2; b7 -= k3;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k0; b5 -= k1;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k15; b3 -= k16;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k13; b1 -= k14;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k9 + t1; b15 -= k10 + 12;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k7; b13 -= k8 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k5; b11 -= k6;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k3; b9 -= k4;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k1; b7 -= k2;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k16; b5 -= k0;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k14; b3 -= k15;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k12; b1 -= k13;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k8 + t0; b15 -= k9 + 11;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k6; b13 -= k7 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k4; b11 -= k5;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k2; b9 -= k3;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k0; b7 -= k1;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k15; b5 -= k16;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k13; b3 -= k14;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k11; b1 -= k12;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k7 + t2; b15 -= k8 + 10;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k5; b13 -= k6 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k3; b11 -= k4;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k1; b9 -= k2;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k16; b7 -= k0;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k14; b5 -= k15;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k12; b3 -= k13;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k10; b1 -= k11;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k6 + t1; b15 -= k7 + 9;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k4; b13 -= k5 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k2; b11 -= k3;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k0; b9 -= k1;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k15; b7 -= k16;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k13; b5 -= k14;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k11; b3 -= k12;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k9; b1 -= k10;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k5 + t0; b15 -= k6 + 8;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k3; b13 -= k4 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k1; b11 -= k2;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k16; b9 -= k0;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k14; b7 -= k15;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k12; b5 -= k13;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k10; b3 -= k11;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k8; b1 -= k9;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k4 + t2; b15 -= k5 + 7;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k2; b13 -= k3 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k0; b11 -= k1;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k15; b9 -= k16;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k13; b7 -= k14;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k11; b5 -= k12;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k9; b3 -= k10;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k7; b1 -= k8;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k3 + t1; b15 -= k4 + 6;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k1; b13 -= k2 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k16; b11 -= k0;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k14; b9 -= k15;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k12; b7 -= k13;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k10; b5 -= k11;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k8; b3 -= k9;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k6; b1 -= k7;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k2 + t0; b15 -= k3 + 5;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k0; b13 -= k1 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k15; b11 -= k16;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k13; b9 -= k14;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k11; b7 -= k12;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k9; b5 -= k10;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k7; b3 -= k8;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k5; b1 -= k6;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k1 + t2; b15 -= k2 + 4;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k16; b13 -= k0 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k14; b11 -= k15;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k12; b9 -= k13;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k10; b7 -= k11;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k8; b5 -= k9;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k6; b3 -= k7;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k4; b1 -= k5;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k0 + t1; b15 -= k1 + 3;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k15; b13 -= k16 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k13; b11 -= k14;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k11; b9 -= k12;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k9; b7 -= k10;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k7; b5 -= k8;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k5; b3 -= k6;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k3; b1 -= k4;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k16 + t0; b15 -= k0 + 2;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k14; b13 -= k15 + t2;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k12; b11 -= k13;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k10; b9 -= k11;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k8; b7 -= k9;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k6; b5 -= k7;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k4; b3 -= k5;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k2; b1 -= k3;
-            tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k15 + t2; b15 -= k16 + 1;
-            tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k13; b13 -= k14 + t1;
-            tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k11; b11 -= k12;
-            tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k9; b9 -= k10;
-            tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k7; b7 -= k8;
-            tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k5; b5 -= k6;
-            tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k3; b3 -= k4;
-            tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k1; b1 -= k2;
-            tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-            tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-            tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-            tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-            tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-            tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-            tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-            tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-            tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-            tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-            tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-            tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-            tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-            tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-            tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-            tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-            tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-            tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-            tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-            tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-            tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-            tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-            tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-            tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-            tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k14 + t1; b15 -= k15;
-            tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k12; b13 -= k13 + t0;
-            tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k10; b11 -= k11;
-            tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k8; b9 -= k9;
-            tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k6; b7 -= k7;
-            tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k4; b5 -= k5;
-            tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k2; b3 -= k3;
-            tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k0; b1 -= k1;
+	b0 -= k3;
+	b1 -= k4;
+	b2 -= k5;
+	b3 -= k6;
+	b4 -= k7;
+	b5 -= k8;
+	b6 -= k9;
+	b7 -= k10;
+	b8 -= k11;
+	b9 -= k12;
+	b10 -= k13;
+	b11 -= k14;
+	b12 -= k15;
+	b13 -= k16 + t2;
+	b14 -= k0 + t0;
+	b15 -= k1 + 20;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k16 + t2; b15 -= k0 + 19;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k14; b13 -= k15 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k12; b11 -= k13;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k10; b9 -= k11;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k8; b7 -= k9;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k6; b5 -= k7;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k4; b3 -= k5;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k2; b1 -= k3;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k15 + t1; b15 -= k16 + 18;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k13; b13 -= k14 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k11; b11 -= k12;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k9; b9 -= k10;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k7; b7 -= k8;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k5; b5 -= k6;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k3; b3 -= k4;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k1; b1 -= k2;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k14 + t0; b15 -= k15 + 17;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k12; b13 -= k13 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k10; b11 -= k11;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k8; b9 -= k9;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k6; b7 -= k7;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k4; b5 -= k5;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k2; b3 -= k3;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k0; b1 -= k1;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k13 + t2; b15 -= k14 + 16;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k11; b13 -= k12 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k9; b11 -= k10;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k7; b9 -= k8;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k5; b7 -= k6;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k3; b5 -= k4;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k1; b3 -= k2;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k16; b1 -= k0;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k12 + t1; b15 -= k13 + 15;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k10; b13 -= k11 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k8; b11 -= k9;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k6; b9 -= k7;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k4; b7 -= k5;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k2; b5 -= k3;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k0; b3 -= k1;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k15; b1 -= k16;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k11 + t0; b15 -= k12 + 14;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k9; b13 -= k10 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k7; b11 -= k8;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k5; b9 -= k6;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k3; b7 -= k4;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k1; b5 -= k2;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k16; b3 -= k0;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k14; b1 -= k15;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k10 + t2; b15 -= k11 + 13;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k8; b13 -= k9 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k6; b11 -= k7;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k4; b9 -= k5;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k2; b7 -= k3;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k0; b5 -= k1;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k15; b3 -= k16;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k13; b1 -= k14;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k9 + t1; b15 -= k10 + 12;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k7; b13 -= k8 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k5; b11 -= k6;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k3; b9 -= k4;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k1; b7 -= k2;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k16; b5 -= k0;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k14; b3 -= k15;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k12; b1 -= k13;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k8 + t0; b15 -= k9 + 11;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k6; b13 -= k7 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k4; b11 -= k5;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k2; b9 -= k3;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k0; b7 -= k1;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k15; b5 -= k16;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k13; b3 -= k14;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k11; b1 -= k12;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k7 + t2; b15 -= k8 + 10;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k5; b13 -= k6 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k3; b11 -= k4;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k1; b9 -= k2;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k16; b7 -= k0;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k14; b5 -= k15;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k12; b3 -= k13;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k10; b1 -= k11;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k6 + t1; b15 -= k7 + 9;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k4; b13 -= k5 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k2; b11 -= k3;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k0; b9 -= k1;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k15; b7 -= k16;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k13; b5 -= k14;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k11; b3 -= k12;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k9; b1 -= k10;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k5 + t0; b15 -= k6 + 8;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k3; b13 -= k4 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k1; b11 -= k2;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k16; b9 -= k0;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k14; b7 -= k15;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k12; b5 -= k13;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k10; b3 -= k11;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k8; b1 -= k9;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k4 + t2; b15 -= k5 + 7;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k2; b13 -= k3 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k0; b11 -= k1;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k15; b9 -= k16;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k13; b7 -= k14;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k11; b5 -= k12;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k9; b3 -= k10;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k7; b1 -= k8;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k3 + t1; b15 -= k4 + 6;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k1; b13 -= k2 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k16; b11 -= k0;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k14; b9 -= k15;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k12; b7 -= k13;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k10; b5 -= k11;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k8; b3 -= k9;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k6; b1 -= k7;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k2 + t0; b15 -= k3 + 5;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k0; b13 -= k1 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k15; b11 -= k16;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k13; b9 -= k14;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k11; b7 -= k12;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k9; b5 -= k10;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k7; b3 -= k8;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k5; b1 -= k6;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k1 + t2; b15 -= k2 + 4;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k16; b13 -= k0 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k14; b11 -= k15;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k12; b9 -= k13;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k10; b7 -= k11;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k8; b5 -= k9;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k6; b3 -= k7;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k4; b1 -= k5;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k0 + t1; b15 -= k1 + 3;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k15; b13 -= k16 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k13; b11 -= k14;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k11; b9 -= k12;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k9; b7 -= k10;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k7; b5 -= k8;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k5; b3 -= k6;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k3; b1 -= k4;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k16 + t0; b15 -= k0 + 2;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k14; b13 -= k15 + t2;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k12; b11 -= k13;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k10; b9 -= k11;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k8; b7 -= k9;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k6; b5 -= k7;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k4; b3 -= k5;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k2; b1 -= k3;
+	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k15 + t2; b15 -= k16 + 1;
+	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k13; b13 -= k14 + t1;
+	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k11; b11 -= k12;
+	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k9; b9 -= k10;
+	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k7; b7 -= k8;
+	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k5; b5 -= k6;
+	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k3; b3 -= k4;
+	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k1; b1 -= k2;
+	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
+	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
+	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
+	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
+	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
+	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
+	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
+	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
+	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
+	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
+	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
+	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
+	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
+	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
+	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
+	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
+	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
+	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
+	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
+	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
+	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
+	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
+	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k14 + t1; b15 -= k15;
+	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k12; b13 -= k13 + t0;
+	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k10; b11 -= k11;
+	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k8; b9 -= k9;
+	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k6; b7 -= k7;
+	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k4; b5 -= k5;
+	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k2; b3 -= k3;
+	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k0; b1 -= k1;
 
-            output[15] = b15;
-            output[14] = b14;
-            output[13] = b13;
-            output[12] = b12;
-            output[11] = b11;
-            output[10] = b10;
-            output[9] = b9;
-            output[8] = b8;
-            output[7] = b7;
-            output[6] = b6;
-            output[5] = b5;
-            output[4] = b4;
-            output[3] = b3;
-            output[2] = b2;
-            output[1] = b1;
-            output[0] = b0;
+	output[15] = b15;
+	output[14] = b14;
+	output[13] = b13;
+	output[12] = b12;
+	output[11] = b11;
+	output[10] = b10;
+	output[9] = b9;
+	output[8] = b8;
+	output[7] = b7;
+	output[6] = b6;
+	output[5] = b5;
+	output[4] = b4;
+	output[3] = b3;
+	output[2] = b2;
+	output[1] = b1;
+	output[0] = b0;
 }
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index 09ea5099bc76..da3b8357e47f 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -3,346 +3,345 @@
 
 
 void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
-  {
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+	b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k3; b2 += b3 + k2 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k4 + 1; b2 += b3 + k3 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k3; b2 += b3 + k2 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k4 + 1; b2 += b3 + k3 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k0 + 2; b2 += b3 + k4 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k4 + t0; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k1 + 3; b2 += b3 + k0 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k0 + 2; b2 += b3 + k4 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k4 + t0; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k1 + 3; b2 += b3 + k0 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k0 + t1; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k2 + 4; b2 += b3 + k1 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k1 + t2; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k3 + 5; b2 += b3 + k2 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k0 + t1; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k2 + 4; b2 += b3 + k1 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k1 + t2; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k3 + 5; b2 += b3 + k2 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k2 + t0; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k4 + 6; b2 += b3 + k3 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k3 + t1; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k0 + 7; b2 += b3 + k4 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k2 + t0; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k4 + 6; b2 += b3 + k3 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k3 + t1; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k0 + 7; b2 += b3 + k4 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k4 + t2; b0 += b1 + k3; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k1 + 8; b2 += b3 + k0 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k0 + t0; b0 += b1 + k4; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k2 + 9; b2 += b3 + k1 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k4 + t2; b0 += b1 + k3; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k1 + 8; b2 += b3 + k0 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k0 + t0; b0 += b1 + k4; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k2 + 9; b2 += b3 + k1 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k1 + t1; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k3 + 10; b2 += b3 + k2 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k2 + t2; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k4 + 11; b2 += b3 + k3 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k1 + t1; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k3 + 10; b2 += b3 + k2 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k2 + t2; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k4 + 11; b2 += b3 + k3 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k3 + t0; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k0 + 12; b2 += b3 + k4 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k4 + t1; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k1 + 13; b2 += b3 + k0 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k3 + t0; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k0 + 12; b2 += b3 + k4 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k4 + t1; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k1 + 13; b2 += b3 + k0 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k0 + t2; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k2 + 14; b2 += b3 + k1 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k3 + 15; b2 += b3 + k2 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k0 + t2; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k2 + 14; b2 += b3 + k1 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k3 + 15; b2 += b3 + k2 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+	b3 += k4 + 16; b2 += b3 + k3 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+	b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+	b3 += k0 + 17; b2 += b3 + k4 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
-    b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-    b3 += k4 + 16; b2 += b3 + k3 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-    b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-    b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-    b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-    b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-    b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-    b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-    b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-    b3 += k0 + 17; b2 += b3 + k4 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-    b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-    b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-    b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-    b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-    b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-    b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-    output[0] = b0 + k3;
-    output[1] = b1 + k4 + t0;
-    output[2] = b2 + k0 + t1;
-    output[3] = b3 + k1 + 18;
-  }
+	output[0] = b0 + k3;
+	output[1] = b1 + k4 + t0;
+	output[2] = b2 + k0 + t1;
+	output[3] = b3 + k1 + 18;
+}
 
 void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
-  {
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-    u64 tmp;
+	u64 tmp;
 
-    b0 -= k3;
-    b1 -= k4 + t0;
-    b2 -= k0 + t1;
-    b3 -= k1 + 18;
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t0; b3 -= k0 + 17;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t2; b3 -= k4 + 16;
+	b0 -= k3;
+	b1 -= k4 + t0;
+	b2 -= k0 + t1;
+	b3 -= k1 + 18;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t0; b3 -= k0 + 17;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t2; b3 -= k4 + 16;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t1; b3 -= k3 + 15;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t0; b3 -= k2 + 14;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t1; b3 -= k3 + 15;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t0; b3 -= k2 + 14;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t2; b3 -= k1 + 13;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t1; b3 -= k0 + 12;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t2; b3 -= k1 + 13;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t1; b3 -= k0 + 12;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t0; b3 -= k4 + 11;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t2; b3 -= k3 + 10;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t0; b3 -= k4 + 11;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t2; b3 -= k3 + 10;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k4; b1 -= k0 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k1 + t1; b3 -= k2 + 9;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k3; b1 -= k4 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k0 + t0; b3 -= k1 + 8;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k4; b1 -= k0 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k1 + t1; b3 -= k2 + 9;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k3; b1 -= k4 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k0 + t0; b3 -= k1 + 8;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t2; b3 -= k0 + 7;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t1; b3 -= k4 + 6;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t2; b3 -= k0 + 7;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t1; b3 -= k4 + 6;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t0; b3 -= k3 + 5;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t2; b3 -= k2 + 4;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t0; b3 -= k3 + 5;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t2; b3 -= k2 + 4;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t1; b3 -= k1 + 3;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t2;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t0; b3 -= k0 + 2;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t1; b3 -= k1 + 3;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t0; b3 -= k0 + 2;
 
-    tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t1;
-    tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t2; b3 -= k4 + 1;
-    tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-    tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-    tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-    tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-    tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t0;
-    tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t1; b3 -= k3;
+	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t2; b3 -= k4 + 1;
+	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
+	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
+	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
+	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
+	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t1; b3 -= k3;
 
-    output[0] = b0;
-    output[1] = b1;
-    output[2] = b2;
-    output[3] = b3;
-  }
+	output[0] = b0;
+	output[1] = b1;
+	output[2] = b2;
+	output[3] = b3;
+}
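
Note on the hunk above: each encrypt line of the form
"b0 += b1; b1 = ((b1 << r) | (b1 >> (64 - r))) ^ b0;" is undone by the
decrypt line "tmp = b1 ^ b0; b1 = (tmp >> r) | (tmp << (64 - r)); b0 -= b1;".
Below is a minimal standalone sketch of that pairing, not part of the patch.
The helper names (rotl64, rotr64, mix, unmix, mix_roundtrips) are
hypothetical; the patch open-codes the expressions inline. It uses uint64_t
from <stdint.h> in place of the kernel's u64 and assumes 0 < r < 64, which
holds for every rotation constant in this file.

```c
#include <stdint.h>

/* Rotate left/right by r bits; only valid for 0 < r < 64. */
static uint64_t rotl64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

static uint64_t rotr64(uint64_t x, unsigned int r)
{
	return (x >> r) | (x << (64 - r));
}

/* Forward step, same shape as "b0 += b1; b1 = rotl(b1, r) ^ b0;". */
static void mix(uint64_t *x0, uint64_t *x1, unsigned int r)
{
	*x0 += *x1;
	*x1 = rotl64(*x1, r) ^ *x0;
}

/* Inverse step, same shape as "tmp = b1 ^ b0; b1 = rotr(tmp, r); b0 -= b1;". */
static void unmix(uint64_t *x0, uint64_t *x1, unsigned int r)
{
	uint64_t tmp = *x1 ^ *x0;

	*x1 = rotr64(tmp, r);
	*x0 -= *x1;
}

/* Returns 1 if unmix() restores the words that mix() was given. */
static int mix_roundtrips(uint64_t a, uint64_t b, unsigned int r)
{
	uint64_t x0 = a, x1 = b;

	mix(&x0, &x1, r);
	unmix(&x0, &x1, r);
	return x0 == a && x1 == b;
}
```

Since the XOR cancels the added word and the right rotate cancels the left
rotate, mix_roundtrips() should return 1 for any pair of words and any r in
1..63. If the open-coded rotates are ever factored out, helpers like these
would be one way to do it.
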
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 5262f5a8f21b..dc96ba279720 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -3,640 +3,638 @@
 
 
 void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
-    {
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3],
-      b4 = input[4], b5 = input[5],
-      b6 = input[6], b7 = input[7];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-      k8 = keyCtx->key[8];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k7; b6 += b7 + k6 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k8 + 1; b6 += b7 + k7 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k0 + 2; b6 += b7 + k8 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k1 + 3; b6 += b7 + k0 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k2 + 4; b6 += b7 + k1 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k3 + 5; b6 += b7 + k2 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k0; b2 += b3 + k8; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k4 + 6; b6 += b7 + k3 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k5 + 7; b6 += b7 + k4 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k0; b0 += b1 + k8; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k6 + 8; b6 += b7 + k5 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k7 + 9; b6 += b7 + k6 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k8 + 10; b6 += b7 + k7 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k0 + 11; b6 += b7 + k8 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k1 + 12; b6 += b7 + k0 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k2 + 13; b6 += b7 + k1 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k3 + 14; b6 += b7 + k2 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k0; b2 += b3 + k8; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k4 + 15; b6 += b7 + k3 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+	b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+	b7 += k5 + 16; b6 += b7 + k4 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+	b1 += k0; b0 += b1 + k8; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+	b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+	b7 += k6 + 17; b6 += b7 + k5 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
 
-        b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k3; b2 += b3 + k2; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k7; b6 += b7 + k6 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k2; b0 += b1 + k1; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k4; b2 += b3 + k3; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k8 + 1; b6 += b7 + k7 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k3; b0 += b1 + k2; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k5; b2 += b3 + k4; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k0 + 2; b6 += b7 + k8 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k4; b0 += b1 + k3; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k6; b2 += b3 + k5; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k1 + 3; b6 += b7 + k0 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k5; b0 += b1 + k4; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k7; b2 += b3 + k6; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k2 + 4; b6 += b7 + k1 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k6; b0 += b1 + k5; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k8; b2 += b3 + k7; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k3 + 5; b6 += b7 + k2 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k7; b0 += b1 + k6; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k0; b2 += b3 + k8; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k4 + 6; b6 += b7 + k3 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k8; b0 += b1 + k7; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k1; b2 += b3 + k0; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k5 + 7; b6 += b7 + k4 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k0; b0 += b1 + k8; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k2; b2 += b3 + k1; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k6 + 8; b6 += b7 + k5 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k1; b0 += b1 + k0; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k3; b2 += b3 + k2; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k7 + 9; b6 += b7 + k6 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k2; b0 += b1 + k1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k4; b2 += b3 + k3; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k8 + 10; b6 += b7 + k7 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k3; b0 += b1 + k2; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k5; b2 += b3 + k4; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k0 + 11; b6 += b7 + k8 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k4; b0 += b1 + k3; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k6; b2 += b3 + k5; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k1 + 12; b6 += b7 + k0 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k5; b0 += b1 + k4; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k7; b2 += b3 + k6; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k2 + 13; b6 += b7 + k1 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k6; b0 += b1 + k5; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k8; b2 += b3 + k7; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k3 + 14; b6 += b7 + k2 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k7; b0 += b1 + k6; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k0; b2 += b3 + k8; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k4 + 15; b6 += b7 + k3 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-        b1 += k8; b0 += b1 + k7; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-        b3 += k1; b2 += b3 + k0; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-        b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-        b7 += k5 + 16; b6 += b7 + k4 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-        b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-        b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-        b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-        b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-        b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-        b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-        b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-        b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-        b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-        b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-        b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-        b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-        b1 += k0; b0 += b1 + k8; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-        b3 += k2; b2 += b3 + k1; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-        b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-        b7 += k6 + 17; b6 += b7 + k5 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-        b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-        b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-        b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-        b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-        b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-        b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-        b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-        b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-        b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-        b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-        b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-        b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-        output[0] = b0 + k0;
-        output[1] = b1 + k1;
-        output[2] = b2 + k2;
-        output[3] = b3 + k3;
-        output[4] = b4 + k4;
-        output[5] = b5 + k5 + t0;
-        output[6] = b6 + k6 + t1;
-        output[7] = b7 + k7 + 18;
-    }
+	output[0] = b0 + k0;
+	output[1] = b1 + k1;
+	output[2] = b2 + k2;
+	output[3] = b3 + k3;
+	output[4] = b4 + k4;
+	output[5] = b5 + k5 + t0;
+	output[6] = b6 + k6 + t1;
+	output[7] = b7 + k7 + 18;
+}
 
 void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
-    {
-
-    u64 b0 = input[0], b1 = input[1],
-      b2 = input[2], b3 = input[3],
-      b4 = input[4], b5 = input[5],
-      b6 = input[6], b7 = input[7];
-    u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-      k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-      k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-      k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-      k8 = keyCtx->key[8];
-    u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-      t2 = keyCtx->tweak[2];
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
 
-      u64 tmp;
+	u64 tmp;
 
-        b0 -= k0;
-        b1 -= k1;
-        b2 -= k2;
-        b3 -= k3;
-        b4 -= k4;
-        b5 -= k5 + t0;
-        b6 -= k6 + t1;
-        b7 -= k7 + 18;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k5 + t0; b7 -= k6 + 17;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k3; b5 -= k4 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k1; b3 -= k2;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k8; b1 -= k0;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k4 + t2; b7 -= k5 + 16;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k2; b5 -= k3 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k0; b3 -= k1;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k7; b1 -= k8;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k3 + t1; b7 -= k4 + 15;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k1; b5 -= k2 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k8; b3 -= k0;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k6; b1 -= k7;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k2 + t0; b7 -= k3 + 14;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k0; b5 -= k1 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k7; b3 -= k8;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k5; b1 -= k6;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k1 + t2; b7 -= k2 + 13;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k8; b5 -= k0 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k6; b3 -= k7;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k4; b1 -= k5;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k0 + t1; b7 -= k1 + 12;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k7; b5 -= k8 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k5; b3 -= k6;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k3; b1 -= k4;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k8 + t0; b7 -= k0 + 11;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k6; b5 -= k7 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k4; b3 -= k5;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k2; b1 -= k3;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k7 + t2; b7 -= k8 + 10;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k5; b5 -= k6 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k3; b3 -= k4;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k1; b1 -= k2;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k6 + t1; b7 -= k7 + 9;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k4; b5 -= k5 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k2; b3 -= k3;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k0; b1 -= k1;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k5 + t0; b7 -= k6 + 8;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k3; b5 -= k4 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k1; b3 -= k2;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k8; b1 -= k0;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k4 + t2; b7 -= k5 + 7;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k2; b5 -= k3 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k0; b3 -= k1;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k7; b1 -= k8;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k3 + t1; b7 -= k4 + 6;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k1; b5 -= k2 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k8; b3 -= k0;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k6; b1 -= k7;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k2 + t0; b7 -= k3 + 5;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k0; b5 -= k1 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k7; b3 -= k8;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k5; b1 -= k6;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k1 + t2; b7 -= k2 + 4;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k8; b5 -= k0 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k6; b3 -= k7;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k4; b1 -= k5;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k0 + t1; b7 -= k1 + 3;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k7; b5 -= k8 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k5; b3 -= k6;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k3; b1 -= k4;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k8 + t0; b7 -= k0 + 2;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k6; b5 -= k7 + t2;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k4; b3 -= k5;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k2; b1 -= k3;
-        tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k7 + t2; b7 -= k8 + 1;
-        tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k5; b5 -= k6 + t1;
-        tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k3; b3 -= k4;
-        tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k1; b1 -= k2;
-        tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-        tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-        tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-        tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-        tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-        tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-        tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-        tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-        tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-        tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-        tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-        tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-        tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k6 + t1; b7 -= k7;
-        tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k4; b5 -= k5 + t0;
-        tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k2; b3 -= k3;
-        tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k0; b1 -= k1;
+	b0 -= k0;
+	b1 -= k1;
+	b2 -= k2;
+	b3 -= k3;
+	b4 -= k4;
+	b5 -= k5 + t0;
+	b6 -= k6 + t1;
+	b7 -= k7 + 18;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k5 + t0; b7 -= k6 + 17;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k3; b5 -= k4 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k1; b3 -= k2;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k8; b1 -= k0;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k4 + t2; b7 -= k5 + 16;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k2; b5 -= k3 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k0; b3 -= k1;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k7; b1 -= k8;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k3 + t1; b7 -= k4 + 15;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k1; b5 -= k2 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k8; b3 -= k0;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k6; b1 -= k7;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k2 + t0; b7 -= k3 + 14;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k0; b5 -= k1 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k7; b3 -= k8;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k5; b1 -= k6;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k1 + t2; b7 -= k2 + 13;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k8; b5 -= k0 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k6; b3 -= k7;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k4; b1 -= k5;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k0 + t1; b7 -= k1 + 12;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k7; b5 -= k8 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k5; b3 -= k6;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k3; b1 -= k4;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k8 + t0; b7 -= k0 + 11;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k6; b5 -= k7 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k4; b3 -= k5;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k2; b1 -= k3;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k7 + t2; b7 -= k8 + 10;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k5; b5 -= k6 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k3; b3 -= k4;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k1; b1 -= k2;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k6 + t1; b7 -= k7 + 9;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k4; b5 -= k5 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k2; b3 -= k3;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k0; b1 -= k1;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k5 + t0; b7 -= k6 + 8;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k3; b5 -= k4 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k1; b3 -= k2;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k8; b1 -= k0;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k4 + t2; b7 -= k5 + 7;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k2; b5 -= k3 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k0; b3 -= k1;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k7; b1 -= k8;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k3 + t1; b7 -= k4 + 6;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k1; b5 -= k2 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k8; b3 -= k0;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k6; b1 -= k7;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k2 + t0; b7 -= k3 + 5;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k0; b5 -= k1 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k7; b3 -= k8;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k5; b1 -= k6;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k1 + t2; b7 -= k2 + 4;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k8; b5 -= k0 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k6; b3 -= k7;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k4; b1 -= k5;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k0 + t1; b7 -= k1 + 3;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k7; b5 -= k8 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k5; b3 -= k6;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k3; b1 -= k4;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k8 + t0; b7 -= k0 + 2;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k6; b5 -= k7 + t2;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k4; b3 -= k5;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k2; b1 -= k3;
+	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k7 + t2; b7 -= k8 + 1;
+	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k5; b5 -= k6 + t1;
+	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k3; b3 -= k4;
+	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k1; b1 -= k2;
+	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
+	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
+	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
+	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
+	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
+	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
+	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
+	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
+	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
+	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
+	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
+	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
+	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k6 + t1; b7 -= k7;
+	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k4; b5 -= k5 + t0;
+	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k2; b3 -= k3;
+	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k0; b1 -= k1;
 
-    output[0] = b0;
-    output[1] = b1;
-    output[2] = b2;
-    output[3] = b3;
+	output[0] = b0;
+	output[1] = b1;
+	output[2] = b2;
+	output[3] = b3;
 
-        output[7] = b7;
-        output[6] = b6;
-        output[5] = b5;
-        output[4] = b4;
+	output[7] = b7;
+	output[6] = b6;
+	output[5] = b5;
+	output[4] = b4;
 }
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 53f46f6cb9ca..e8ce06a9122f 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -4,75 +4,75 @@
 #include <threefishApi.h>
 
 void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize,
-                     u64 *keyData, u64 *tweak)
+		     u64 *keyData, u64 *tweak)
 {
-    int keyWords = stateSize / 64;
-    int i;
-    u64 parity = KeyScheduleConst;
+	int keyWords = stateSize / 64;
+	int i;
+	u64 parity = KeyScheduleConst;
 
-    keyCtx->tweak[0] = tweak[0];
-    keyCtx->tweak[1] = tweak[1];
-    keyCtx->tweak[2] = tweak[0] ^ tweak[1];
+	keyCtx->tweak[0] = tweak[0];
+	keyCtx->tweak[1] = tweak[1];
+	keyCtx->tweak[2] = tweak[0] ^ tweak[1];
 
-    for (i = 0; i < keyWords; i++) {
-        keyCtx->key[i] = keyData[i];
-        parity ^= keyData[i];
-    }
-    keyCtx->key[i] = parity;
-    keyCtx->stateSize = stateSize;
+	for (i = 0; i < keyWords; i++) {
+		keyCtx->key[i] = keyData[i];
+		parity ^= keyData[i];
+	}
+	keyCtx->key[i] = parity;
+	keyCtx->stateSize = stateSize;
 }
 
 void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
-                                u8 *out)
+				u8 *out)
 {
-    u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
-    u64 cipher[SKEIN_MAX_STATE_WORDS];
-    
-    Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
-    threefishEncryptBlockWords(keyCtx, plain, cipher);
-    Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
+	u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+	u64 cipher[SKEIN_MAX_STATE_WORDS];
+
+	Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
+	threefishEncryptBlockWords(keyCtx, plain, cipher);
+	Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
 }
 
 void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in,
-                                u64 *out)
+				u64 *out)
 {
-    switch (keyCtx->stateSize) {
-        case Threefish256:
-            threefishEncrypt256(keyCtx, in, out);
-            break;
-        case Threefish512:
-            threefishEncrypt512(keyCtx, in, out);
-            break;
-        case Threefish1024:
-            threefishEncrypt1024(keyCtx, in, out);
-            break;
-    }
+	switch (keyCtx->stateSize) {
+	case Threefish256:
+		threefishEncrypt256(keyCtx, in, out);
+		break;
+	case Threefish512:
+		threefishEncrypt512(keyCtx, in, out);
+		break;
+	case Threefish1024:
+		threefishEncrypt1024(keyCtx, in, out);
+		break;
+	}
 }
 
 void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
-                                u8 *out)
+				u8 *out)
 {
-    u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
-    u64 cipher[SKEIN_MAX_STATE_WORDS];
-    
-    Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
-    threefishDecryptBlockWords(keyCtx, cipher, plain);
-    Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
+	u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
+	u64 cipher[SKEIN_MAX_STATE_WORDS];
+
+	Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
+	threefishDecryptBlockWords(keyCtx, cipher, plain);
+	Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
 }
 
 void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in,
-                                u64 *out)
+				u64 *out)
 {
-    switch (keyCtx->stateSize) {
-        case Threefish256:
-            threefishDecrypt256(keyCtx, in, out);
-            break;
-        case Threefish512:
-            threefishDecrypt512(keyCtx, in, out);
-            break;
-        case Threefish1024:
-            threefishDecrypt1024(keyCtx, in, out);
-            break;
-    }
+	switch (keyCtx->stateSize) {
+	case Threefish256:
+		threefishDecrypt256(keyCtx, in, out);
+		break;
+	case Threefish512:
+		threefishDecrypt512(keyCtx, in, out);
+		break;
+	case Threefish1024:
+		threefishDecrypt1024(keyCtx, in, out);
+		break;
+	}
 }
 
-- 
1.9.1
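
Reader's note on the unrolled decrypt rounds above: every line of the
form "tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;" is
one inverse MIX step (XOR, rotate right, subtract). The sketch below is
not part of the patch. It assumes the standard Threefish forward MIX
(add, rotate left, XOR), which this hunk does not show, and the names
tf_mix, tf_unmix and tf_mix_roundtrip_ok are hypothetical helpers
introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit word by r bits; valid for 0 < r < 64. */
static uint64_t tf_rol64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

static uint64_t tf_ror64(uint64_t x, unsigned int r)
{
	return (x >> r) | (x << (64 - r));
}

/* Assumed forward MIX: a += b; b = rol(b, r) ^ a. */
static void tf_mix(uint64_t *a, uint64_t *b, unsigned int r)
{
	*a += *b;
	*b = tf_rol64(*b, r) ^ *a;
}

/* Inverse MIX, same shape as the unrolled decrypt lines in the patch. */
static void tf_unmix(uint64_t *a, uint64_t *b, unsigned int r)
{
	uint64_t tmp = *b ^ *a;		/* undo the XOR */

	*b = tf_ror64(tmp, r);		/* undo the left rotation */
	*a -= *b;			/* undo the addition (mod 2^64) */
}

/* Returns 1 if tf_unmix() restores the inputs that tf_mix() consumed. */
static int tf_mix_roundtrip_ok(uint64_t a, uint64_t b, unsigned int r)
{
	uint64_t a0 = a, b0 = b;

	tf_mix(&a, &b, r);
	tf_unmix(&a, &b, r);
	assert(a == a0 && b == b0);
	return a == a0 && b == b0;
}
```

Calling tf_mix_roundtrip_ok() with any pair of words and a rotation
constant taken from the rounds above (8, 43, 39, ...) should return 1,
since unsigned 64-bit addition and subtraction wrap modulo 2^64.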

^ permalink raw reply related	[flat|nested] 51+ messages in thread
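
Reader's note on threefishSetKey() in the patch above: it stores one
extra key word (the XOR of KeyScheduleConst with every key word) and a
third tweak word (tweak[0] ^ tweak[1]). That is why the 512-bit rounds
reference k0..k8 and t0..t2. The sketch below restates that logic as a
standalone program fragment. It is not the driver code: the struct, the
function names and the numeric value of the constant are assumptions
made for illustration (KeyScheduleConst is defined elsewhere in the
series, outside this hunk).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of KeyScheduleConst; it is not defined in this hunk. */
#define TF_KEY_SCHEDULE_CONST 0x1BD11BDAA9FC1A22ULL
#define TF_MAX_WORDS 16	/* enough for a 1024-bit state */

/* Hypothetical stand-in for struct threefish_key. */
struct tf_key_sketch {
	uint64_t key[TF_MAX_WORDS + 1];	/* key words plus one parity word */
	uint64_t tweak[3];		/* two tweak words plus their XOR */
	unsigned int state_bits;
};

/* Mirrors the body of threefishSetKey() shown in the diff. */
static void tf_set_key_sketch(struct tf_key_sketch *ctx,
			      unsigned int state_bits,
			      const uint64_t *key, const uint64_t *tweak)
{
	unsigned int words = state_bits / 64;
	unsigned int i;
	uint64_t parity = TF_KEY_SCHEDULE_CONST;

	ctx->tweak[0] = tweak[0];
	ctx->tweak[1] = tweak[1];
	ctx->tweak[2] = tweak[0] ^ tweak[1];

	for (i = 0; i < words; i++) {
		ctx->key[i] = key[i];
		parity ^= key[i];
	}
	ctx->key[words] = parity;	/* extra word, e.g. k8 for 512 bits */
	ctx->state_bits = state_bits;
}

/* XOR of all stored key words, parity word included. */
static uint64_t tf_xor_all_key_words(const struct tf_key_sketch *ctx)
{
	unsigned int words = ctx->state_bits / 64;
	unsigned int i;
	uint64_t acc = 0;

	for (i = 0; i <= words; i++)
		acc ^= ctx->key[i];
	return acc;
}
```

By construction, tf_xor_all_key_words() returns the schedule constant
for any key, and the three tweak words XOR to zero.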

* [PATCH V2 13/21] staging: crypto: skein: remove trailing whitespace
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (11 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 12/21] staging: crypto: skein: fix leading whitespace Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 14/21] staging: crypto: skein: cleanup >80 character lines Jason Cooper
                     ` (7 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h        | 16 +++++-----
 drivers/staging/skein/include/skeinApi.h     | 44 ++++++++++++++--------------
 drivers/staging/skein/include/threefishApi.h | 40 ++++++++++++-------------
 drivers/staging/skein/skeinBlockNo3F.c       |  6 ++--
 drivers/staging/skein/skein_block.c          | 20 ++++++-------
 5 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index 906bcee41c39..dd9a210cf5dd 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -9,7 +9,7 @@
 ** This algorithm and source code is released to the public domain.
 **
 ***************************************************************************
-** 
+**
 ** The following compile-time switches may be defined to control some
 ** tradeoffs between speed, code size, error checking, and security.
 **
@@ -20,8 +20,8 @@
 **                            [default: no callouts (no overhead)]
 **
 **  SKEIN_ERR_CHECK        -- how error checking is handled inside Skein
-**                            code. If not defined, most error checking 
-**                            is disabled (for performance). Otherwise, 
+**                            code. If not defined, most error checking
+**                            is disabled (for performance). Otherwise,
 **                            the switch value is interpreted as:
 **                                0: use assert()      to flag errors
 **                                1: return SKEIN_FAIL to flag errors
@@ -109,12 +109,12 @@ int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
 **   After an InitExt() call, just use Update/Final calls as with Init().
 **
 **   Notes: Same parameters as _Init() calls, plus treeInfo/key/keyBytes.
-**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL, 
+**          When keyBytes == 0 and treeInfo == SKEIN_SEQUENTIAL,
 **              the results of InitExt() are identical to calling Init().
 **          The function Init() may be called once to "precompute" the IV for
 **              a given hashBitLen value, then by saving a copy of the context
 **              the IV computation may be avoided in later calls.
-**          Similarly, the function InitExt() may be called once per MAC key 
+**          Similarly, the function InitExt() may be called once per MAC key
 **              to precompute the MAC IV, then a copy of the context saved and
 **              reused for each new MAC computation.
 **/
@@ -142,7 +142,7 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 
 /*****************************************************************
 ** "Internal" Skein definitions
-**    -- not needed for sequential hashing API, but will be 
+**    -- not needed for sequential hashing API, but will be
 **           helpful for other uses of Skein (e.g., tree hash mode).
 **    -- included here so that they can be shared between
 **           reference and optimized code.
@@ -269,8 +269,8 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 /*****************************************************************
 ** Skein block function constants (shared across Ref and Opt code)
 ******************************************************************/
-enum    
-	{   
+enum
+	{
 	    /* Skein_256 round rotation constants */
 	R_256_0_0 = 14, R_256_0_1 = 16,
 	R_256_1_0 = 52, R_256_1_1 = 57,
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 0d7d59eff460..ace931a67c23 100644
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -36,46 +36,46 @@ OTHER DEALINGS IN THE SOFTWARE.
  * of Skein. The design and the way to use the functions follow the openSSL
  * design but at the same time take care of some Skein specific behaviour
  * and possibilities.
- * 
+ *
  * The functions enable applications to create a normal Skein hashes and
  * message authentication codes (MAC).
- * 
+ *
  * Using these functions is simple and straight forward:
- * 
+ *
  * @code
- * 
+ *
  * #include <skeinApi.h>
- * 
+ *
  * ...
  * struct skein_ctx ctx;             // a Skein hash or MAC context
- * 
+ *
  * // prepare context, here for a Skein with a state size of 512 bits.
  * skeinCtxPrepare(&ctx, Skein512);
- * 
+ *
  * // Initialize the context to set the requested hash length in bits
  * // here request a output hash size of 31 bits (Skein supports variable
  * // output sizes even very strange sizes)
  * skeinInit(&ctx, 31);
- * 
+ *
  * // Now update Skein with any number of message bits. A function that
  * // takes a number of bytes is also available.
  * skeinUpdateBits(&ctx, message, msgLength);
- * 
+ *
  * // Now get the result of the Skein hash. The output buffer must be
  * // large enough to hold the request number of output bits. The application
  * // may now extract the bits.
  * skeinFinal(&ctx, result);
  * ...
  * @endcode
- * 
+ *
  * An application may use @c skeinReset to reset a Skein context and use
  * it for creation of another hash with the same Skein state size and output
  * bit length. In this case the API implementation restores some internal
  * internal state data and saves a full Skein initialization round.
- * 
- * To create a MAC the application just uses @c skeinMacInit instead of 
+ *
+ * To create a MAC the application just uses @c skeinMacInit instead of
  * @c skeinInit. All other functions calls remain the same.
- * 
+ *
  */
 
 #include <linux/types.h>
@@ -111,7 +111,7 @@ struct skein_ctx {
 
 /**
  * Prepare a Skein context.
- * 
+ *
  * An application must call this function before it can use the Skein
  * context. The functions clears memory and initializes size dependent
  * variables.
@@ -128,7 +128,7 @@ int skeinCtxPrepare(struct skein_ctx *ctx, enum skein_size size);
 /**
  * Initialize a Skein context.
  *
- * Initializes the context with this data and saves the resulting Skein 
+ * Initializes the context with this data and saves the resulting Skein
  * state variables for further use.
  *
  * @param ctx
@@ -143,11 +143,11 @@ int skeinInit(struct skein_ctx *ctx, size_t hashBitLen);
 
 /**
  * Resets a Skein context for further use.
- * 
- * Restores the saved chaining variables to reset the Skein context. 
- * Thus applications can reuse the same setup to  process several 
+ *
+ * Restores the saved chaining variables to reset the Skein context.
+ * Thus applications can reuse the same setup to  process several
  * messages. This saves a complete Skein initialization cycle.
- * 
+ *
  * @param ctx
  *     Pointer to a pre-initialized Skein MAC context
  */
@@ -155,8 +155,8 @@ void skeinReset(struct skein_ctx *ctx);
 
 /**
  * Initializes a Skein context for MAC usage.
- * 
- * Initializes the context with this data and saves the resulting Skein 
+ *
+ * Initializes the context with this data and saves the resulting Skein
  * state variables for further use.
  *
  * Applications call the normal Skein functions to update the MAC and
@@ -209,7 +209,7 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
 
 /**
  * Finalize Skein and return the hash.
- * 
+ *
  * Before an application can reuse a Skein setup the application must
  * reset the Skein context.
  *
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 199257e37813..5d92bbff8c9f 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -8,14 +8,14 @@
  * @{
  *
  * This API and the functions that implement this API simplify the usage
- * of the Threefish cipher. The design and the way to use the functions 
+ * of the Threefish cipher. The design and the way to use the functions
  * follow the openSSL design but at the same time take care of some Threefish
  * specific behaviour and possibilities.
  *
  * These are the low level functions that deal with Threefisch blocks only.
- * Implementations for cipher modes such as ECB, CFB, or CBC may use these 
+ * Implementations for cipher modes such as ECB, CFB, or CBC may use these
  * functions.
- * 
+ *
 @code
     // Threefish cipher context data
     struct threefish_key keyCtx;
@@ -44,7 +44,7 @@ enum threefish_size {
 
 /**
  * Context for Threefish key and tweak words.
- * 
+ *
  * This structure was setup with some know-how of the internal
  * Skein structures, in particular ordering of header and size dependent
  * variables. If Skein implementation changes this, the adapt these
@@ -58,10 +58,10 @@ struct threefish_key {
 
 /**
  * Set Threefish key and tweak data.
- * 
+ *
  * This function sets the key and tweak data for the Threefish cipher of
  * the given size. The key data must have the same length (number of bits)
- * as the state size 
+ * as the state size
  *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
@@ -76,12 +76,12 @@ void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize
 
 /**
  * Encrypt Threefisch block (bytes).
- * 
- * The buffer must have at least the same length (number of bits) aas the 
+ *
+ * The buffer must have at least the same length (number of bits) aas the
  * state size for this key. The function uses the first @c stateSize bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
- * 
+ *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
  * @param in
@@ -93,14 +93,14 @@ void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
 /**
  * Encrypt Threefisch block (words).
- * 
- * The buffer must have at least the same length (number of bits) aas the 
+ *
+ * The buffer must have at least the same length (number of bits) aas the
  * state size for this key. The function uses the first @c stateSize bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
- * 
+ *
  * The wordsize ist set to 64 bits.
- * 
+ *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
  * @param in
@@ -112,12 +112,12 @@ void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out)
 
 /**
  * Decrypt Threefisch block (bytes).
- * 
- * The buffer must have at least the same length (number of bits) aas the 
+ *
+ * The buffer must have at least the same length (number of bits) aas the
  * state size for this key. The function uses the first @c stateSize bits
  * of the input buffer, decrypts them and stores the result in the output
  * buffer
- * 
+ *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
  * @param in
@@ -129,14 +129,14 @@ void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
 
 /**
  * Decrypt Threefisch block (words).
- * 
- * The buffer must have at least the same length (number of bits) aas the 
+ *
+ * The buffer must have at least the same length (number of bits) aas the
  * state size for this key. The function uses the first @c stateSize bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
- * 
+ *
  * The wordsize ist set to 64 bits.
- * 
+ *
  * @param keyCtx
  *     Pointer to a Threefish key structure.
  * @param in
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 3c2878c966e1..376cd63d8f83 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -29,7 +29,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 			carry += words[i];
 			words[i] = carry;
 			carry >>= 32;
-		}        
+		}
 		tweak[0] = words[0] & 0xffffffffL;
 		tweak[0] |= (words[1] & 0xffffffffL) << 32;
 		tweak[1] |= words[2] & 0xffffffffL;
@@ -79,7 +79,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 			carry += words[i];
 			words[i] = carry;
 			carry >>= 32;
-		}        
+		}
 		tweak[0] = words[0] & 0xffffffffL;
 		tweak[0] |= (words[1] & 0xffffffffL) << 32;
 		tweak[1] |= words[2] & 0xffffffffL;
@@ -133,7 +133,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 			carry += words[i];
 			words[i] = carry;
 			carry >>= 32;
-		}        
+		}
 		tweak[0] = words[0] & 0xffffffffL;
 		tweak[0] |= (words[1] & 0xffffffffL) << 32;
 		tweak[1] |= words[2] & 0xffffffffL;
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index bb36860fafdf..d315f547feae 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -28,7 +28,7 @@
 #define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
 #define KW_TWK_BASE     (0)
 #define KW_KEY_BASE     (3)
-#define ks              (kw + KW_KEY_BASE)                
+#define ks              (kw + KW_KEY_BASE)
 #define ts              (kw + KW_TWK_BASE)
 
 #ifdef SKEIN_DEBUG
@@ -76,7 +76,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 		ts[0] += byteCntAdd;                    /* update processed length */
 
 		/* precompute the key schedule for this block */
-		ks[0] = ctx->X[0];     
+		ks[0] = ctx->X[0];
 		ks[1] = ctx->X[1];
 		ks[2] = ctx->X[2];
 		ks[3] = ctx->X[3];
@@ -103,7 +103,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 
-#if SKEIN_UNROLL_256 == 0                       
+#if SKEIN_UNROLL_256 == 0
 #define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
 	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
@@ -129,8 +129,8 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 
 	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
-#endif  
-		{    
+#endif
+		{
 #define R256_8_rounds(R)                  \
 		R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
 		R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
@@ -270,7 +270,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 		ks[5] = ctx->X[5];
 		ks[6] = ctx->X[6];
 		ks[7] = ctx->X[7];
-		ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ 
+		ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
 			ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
 
 		ts[2] = ts[0] ^ ts[1];
@@ -298,7 +298,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
 		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
 
-#if SKEIN_UNROLL_512 == 0                       
+#if SKEIN_UNROLL_512 == 0
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
 		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
 		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
@@ -529,7 +529,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 		X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
 		X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
 
-#if SKEIN_UNROLL_1024 == 0                      
+#if SKEIN_UNROLL_1024 == 0
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
 		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
 		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
@@ -551,7 +551,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 		X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
 		X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
 		X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); 
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 #else                                       /* looping version */
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
 		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
@@ -579,7 +579,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 		Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 
 		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
-#endif  
+#endif
 		{
 #define R1024_8_rounds(R)    /* do 8 full rounds */                               \
 			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
-- 
1.9.1


* [PATCH V2 14/21] staging: crypto: skein: cleanup >80 character lines
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (12 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 13/21] staging: crypto: skein: remove trailing whitespace Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 15/21] staging: crypto: skein: fix do/while brace formatting Jason Cooper
                     ` (6 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
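Note for reviewers (not part of the commit message): the shortened comments
on the SKEIN_T1_POS_* defines are terse, so here is a minimal userspace
sketch of the layout they describe. T[1] carries tweak bits 64..127, which
is why SKEIN_T1_BIT() subtracts 64. The macro values mirror skein.h; the
helper t1_make() and the use of uint64_t in place of u64 are assumptions
made only for this illustration.

```c
#include <stdint.h>

/* Same arithmetic as skein.h: T[1] is the second 64-bit tweak word. */
#define T1_BIT(bit)      ((bit) - 64)
#define T1_POS_BLK_TYPE  T1_BIT(120)	/* bits 120..125 -> shift 56 */
#define T1_POS_FIRST     T1_BIT(126)	/* bit  126      -> shift 62 */
#define T1_POS_FINAL     T1_BIT(127)	/* bit  127      -> shift 63 */

#define BLK_TYPE_CFG     4		/* configuration block */
#define BLK_TYPE_MSG     48		/* message processing */

/* Hypothetical helper, not in the patch: assemble a T[1] value. */
static inline uint64_t t1_make(unsigned int type, int first, int final)
{
	uint64_t t1 = (uint64_t)type << T1_POS_BLK_TYPE;

	if (first)
		t1 |= (uint64_t)1 << T1_POS_FIRST;
	if (final)
		t1 |= (uint64_t)1 << T1_POS_FINAL;
	return t1;
}
```

With these definitions, t1_make(BLK_TYPE_MSG, 1, 0) is 0x7000000000000000
and t1_make(BLK_TYPE_CFG, 1, 1) is 0xC400000000000000, which is what
Skein_Start_New_Type(ctx, MSG) and Skein_Start_New_Type(ctx, CFG_FINAL)
should store in h.T[1] if the sketch matches the header.
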
 drivers/staging/skein/include/skein.h        |  175 +-
 drivers/staging/skein/include/threefishApi.h |   16 +-
 drivers/staging/skein/skein.c                |  586 ++-
 drivers/staging/skein/skeinApi.c             |   58 +-
 drivers/staging/skein/skeinBlockNo3F.c       |   27 +-
 drivers/staging/skein/skein_block.c          |  427 +-
 drivers/staging/skein/threefish1024Block.c   | 6152 ++++++++++++++++++++------
 drivers/staging/skein/threefish256Block.c    | 1398 ++++--
 drivers/staging/skein/threefish512Block.c    | 2775 +++++++++---
 drivers/staging/skein/threefishApi.c         |   13 +-
 10 files changed, 8919 insertions(+), 2708 deletions(-)

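Note for reviewers (not part of the commit message): the reflowed comment
"allow command-line define in range 8*(5..14)" in skein.h is easy to
misread, so below is a small userspace sketch of how the three
SKEIN_*_ROUNDS_TOTAL macros appear to decode SKEIN_ROUNDS. Each decimal
digit selects the round count for one block size. The function names are
hypothetical; only the arithmetic is taken from the header.

```c
/*
 * Hypothetical function form of the per-size macro body:
 *   8 * (((field + 5) % 10) + 5)
 * The last digit of 'field' 0 -> 80, 4 -> 112, 5 -> 40, 9 -> 72 rounds.
 */
static inline unsigned int rounds_from_field(unsigned int field)
{
	return 8 * (((field + 5) % 10) + 5);
}

/* Hundreds digit selects the Skein-256 round count. */
static inline unsigned int skein_256_rounds(unsigned int skein_rounds)
{
	return rounds_from_field(skein_rounds / 100);
}

/* Tens digit selects the Skein-512 round count. */
static inline unsigned int skein_512_rounds(unsigned int skein_rounds)
{
	return rounds_from_field(skein_rounds / 10);
}

/* Units digit selects the Skein-1024 round count. */
static inline unsigned int skein1024_rounds(unsigned int skein_rounds)
{
	return rounds_from_field(skein_rounds);
}
```

Under this reading, building with SKEIN_ROUNDS=999 would give 72 rounds for
all three sizes, and a digit of 0 gives 80. When SKEIN_ROUNDS is not
defined, the header uses its fixed defaults of 72, 72 and 80.
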
diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index dd9a210cf5dd..f92dc40711d1 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -39,12 +39,12 @@
 
 enum
 	{
-	SKEIN_SUCCESS         =      0,          /* return codes from Skein calls */
+	SKEIN_SUCCESS         =      0, /* return codes from Skein calls */
 	SKEIN_FAIL            =      1,
 	SKEIN_BAD_HASHLEN     =      2
 	};
 
-#define  SKEIN_MODIFIER_WORDS   (2)          /* number of modifier (tweak) words */
+#define  SKEIN_MODIFIER_WORDS   (2) /* number of modifier (tweak) words */
 
 #define  SKEIN_256_STATE_WORDS  (4)
 #define  SKEIN_512_STATE_WORDS  (8)
@@ -65,30 +65,30 @@ enum
 
 struct skein_ctx_hdr
 	{
-	size_t  hashBitLen;                      /* size of hash result, in bits */
-	size_t  bCnt;                            /* current byte count in buffer b[] */
-	u64  T[SKEIN_MODIFIER_WORDS];         /* tweak words: T[0]=byte cnt, T[1]=flags */
+	size_t  hashBitLen;		/* size of hash result, in bits */
+	size_t  bCnt;			/* current byte count in buffer b[] */
+	u64  T[SKEIN_MODIFIER_WORDS];	/* tweak: T[0]=byte cnt, T[1]=flags */
 	};
 
-struct skein_256_ctx                               /*  256-bit Skein hash context structure */
+struct skein_256_ctx /* 256-bit Skein hash context structure */
 	{
-	struct skein_ctx_hdr h;                      /* common header context variables */
-	u64  X[SKEIN_256_STATE_WORDS];        /* chaining variables */
-	u8  b[SKEIN_256_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	struct skein_ctx_hdr h;		/* common header context variables */
+	u64  X[SKEIN_256_STATE_WORDS];	/* chaining variables */
+	u8  b[SKEIN_256_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 	};
 
-struct skein_512_ctx                             /*  512-bit Skein hash context structure */
+struct skein_512_ctx /* 512-bit Skein hash context structure */
 	{
-	struct skein_ctx_hdr h;                      /* common header context variables */
-	u64  X[SKEIN_512_STATE_WORDS];        /* chaining variables */
-	u8  b[SKEIN_512_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	struct skein_ctx_hdr h;		/* common header context variables */
+	u64  X[SKEIN_512_STATE_WORDS];	/* chaining variables */
+	u8  b[SKEIN_512_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 	};
 
-struct skein1024_ctx                              /* 1024-bit Skein hash context structure */
+struct skein1024_ctx /* 1024-bit Skein hash context structure */
 	{
-	struct skein_ctx_hdr h;                      /* common header context variables */
-	u64  X[SKEIN1024_STATE_WORDS];        /* chaining variables */
-	u8  b[SKEIN1024_BLOCK_BYTES];        /* partial block buffer (8-byte aligned) */
+	struct skein_ctx_hdr h;		/* common header context variables */
+	u64  X[SKEIN1024_STATE_WORDS];	/* chaining variables */
+	u8  b[SKEIN1024_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 	};
 
 /*   Skein APIs for (incremental) "straight hashing" */
@@ -96,9 +96,12 @@ int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
 int  Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen);
 int  Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen);
 
-int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt);
-int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt);
+int  Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt);
+int  Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt);
+int  Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt);
 
 int  Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal);
 int  Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal);
@@ -118,9 +121,12 @@ int  Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal);
 **              to precompute the MAC IV, then a copy of the context saved and
 **              reused for each new MAC computation.
 **/
-int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
-int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes);
+int  Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes);
 
 /*
 **   Skein APIs for MAC and tree hash:
@@ -149,13 +155,13 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 ******************************************************************/
 
 /* tweak word T[1]: bit field starting positions */
-#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
+#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)      /* second word  */
 
-#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
-#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
-#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
-#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bits 126     : first block flag         */
-#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
+#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112) /* 112..118 hash tree level */
+#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119) /* 119 part. final in byte */
+#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120) /* 120..125 type field */
+#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126) /* 126      first blk flag */
+#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127) /* 127      final blk flag */
 
 /* tweak word T[1]: flag bit definition(s) */
 #define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
@@ -167,34 +173,37 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
 
 /* tweak word T[1]: block type field */
-#define SKEIN_BLK_TYPE_KEY       (0)                    /* key, for MAC and KDF */
-#define SKEIN_BLK_TYPE_CFG       (4)                    /* configuration block */
-#define SKEIN_BLK_TYPE_PERS      (8)                    /* personalization string */
-#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
-#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
-#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
-#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
-#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
-#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */
-
-#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
-#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
-#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
-#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
-#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
-#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
+#define SKEIN_BLK_TYPE_KEY       (0) /* key, for MAC and KDF */
+#define SKEIN_BLK_TYPE_CFG       (4) /* configuration block */
+#define SKEIN_BLK_TYPE_PERS      (8) /* personalization string */
+#define SKEIN_BLK_TYPE_PK       (12) /* pubkey (for digital sigs) */
+#define SKEIN_BLK_TYPE_KDF      (16) /* key identifier for KDF */
+#define SKEIN_BLK_TYPE_NONCE    (20) /* nonce for PRNG */
+#define SKEIN_BLK_TYPE_MSG      (48) /* message processing */
+#define SKEIN_BLK_TYPE_OUT      (63) /* output stage */
+#define SKEIN_BLK_TYPE_MASK     (63) /* bit field mask */
+
+#define SKEIN_T1_BLK_TYPE(T)   (((u64) (SKEIN_BLK_TYPE_##T)) << \
+					SKEIN_T1_POS_BLK_TYPE)
+#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* for MAC and KDF */
+#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* config block */
+#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization */
+#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* pubkey (for sigs) */
+#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key ident for KDF */
 #define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
 #define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
 #define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
 #define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */
 
-#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
-#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)
+#define SKEIN_T1_BLK_TYPE_CFG_FINAL    (SKEIN_T1_BLK_TYPE_CFG | \
+					SKEIN_T1_FLAG_FINAL)
+#define SKEIN_T1_BLK_TYPE_OUT_FINAL    (SKEIN_T1_BLK_TYPE_OUT | \
+					SKEIN_T1_FLAG_FINAL)
 
 #define SKEIN_VERSION           (1)
 
 #ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
-#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
+#define SKEIN_ID_STRING_LE      (0x33414853) /* "SHA3" (little-endian)*/
 #endif
 
 #define SKEIN_MK_64(hi32, lo32)  ((lo32) + (((u64) (hi32)) << 32))
@@ -208,23 +217,29 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define SKEIN_CFG_TREE_NODE_SIZE_POS  (8)
 #define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)
 
-#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
-#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
-#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)
+#define SKEIN_CFG_TREE_LEAF_SIZE_MSK (((u64)0xFF) << \
+					SKEIN_CFG_TREE_LEAF_SIZE_POS)
+#define SKEIN_CFG_TREE_NODE_SIZE_MSK (((u64)0xFF) << \
+					SKEIN_CFG_TREE_NODE_SIZE_POS)
+#define SKEIN_CFG_TREE_MAX_LEVEL_MSK (((u64)0xFF) << \
+					SKEIN_CFG_TREE_MAX_LEVEL_POS)
 
 #define SKEIN_CFG_TREE_INFO(leaf, node, maxLvl)                   \
 	((((u64)(leaf))   << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
 	 (((u64)(node))   << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
 	 (((u64)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS))
 
-#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0) /* use as treeInfo in InitExt() call for sequential processing */
+/* use as treeInfo in InitExt() call for sequential processing */
+#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0, 0, 0)
 
 /*
 **   Skein macros for getting/setting tweak words, etc.
 **   These are useful for partial input bytes, hash tree init/update, etc.
 **/
 #define Skein_Get_Tweak(ctxPtr, TWK_NUM)          ((ctxPtr)->h.T[TWK_NUM])
-#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal); }
+#define Skein_Set_Tweak(ctxPtr, TWK_NUM, tVal) { \
+		(ctxPtr)->h.T[TWK_NUM] = (tVal); \
+	}
 
 #define Skein_Get_T0(ctxPtr)     Skein_Get_Tweak(ctxPtr, 0)
 #define Skein_Get_T1(ctxPtr)     Skein_Get_Tweak(ctxPtr, 1)
@@ -241,14 +256,26 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define Skein_Set_Type(ctxPtr, BLK_TYPE)         \
 	Skein_Set_T1(ctxPtr, SKEIN_T1_BLK_TYPE_##BLK_TYPE)
 
-/* set up for starting with a new type: h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0; */
-#define Skein_Start_New_Type(ctxPtr, BLK_TYPE)   \
-	{ Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt = 0; }
+/*
+ * set up for starting with a new type:
+ * h.T[0]=0; h.T[1] = NEW_TYPE; h.bCnt=0;
+ */
+#define Skein_Start_New_Type(ctxPtr, BLK_TYPE) { \
+		Skein_Set_T0_T1(ctxPtr, 0, SKEIN_T1_FLAG_FIRST | \
+				SKEIN_T1_BLK_TYPE_##BLK_TYPE); \
+		(ctxPtr)->h.bCnt = 0; \
+	}
 
-#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
-#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }
+#define Skein_Clear_First_Flag(hdr) { \
+		(hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST; \
+	}
+#define Skein_Set_Bit_Pad_Flag(hdr) { \
+		(hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD; \
+	}
 
-#define Skein_Set_Tree_Level(hdr, height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); }
+#define Skein_Set_Tree_Level(hdr, height) { \
+		(hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); \
+	}
 
 /*****************************************************************
 ** "Internal" Skein definitions for debugging and error checking
@@ -263,7 +290,7 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 #define Skein_Show_Key(bits, ctx, key, keyBytes)
 #endif
 
-#define Skein_Assert(x, retCode)/* default: ignore all Asserts, for performance */
+#define Skein_Assert(x, retCode)/* ignore all Asserts, for performance */
 #define Skein_assert(x)
 
 /*****************************************************************
@@ -292,21 +319,29 @@ enum
 	R_512_7_0 =  8, R_512_7_1 = 35, R_512_7_2 = 56, R_512_7_3 = 22,
 
 	    /* Skein1024 round rotation constants */
-	R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47, R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
-	R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55, R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
-	R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13, R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
-	R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41, R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
-	R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31, R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
-	R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51, R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
-	R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46, R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
-	R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52, R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
+	R1024_0_0 = 24, R1024_0_1 = 13, R1024_0_2 =  8, R1024_0_3 = 47,
+	R1024_0_4 =  8, R1024_0_5 = 17, R1024_0_6 = 22, R1024_0_7 = 37,
+	R1024_1_0 = 38, R1024_1_1 = 19, R1024_1_2 = 10, R1024_1_3 = 55,
+	R1024_1_4 = 49, R1024_1_5 = 18, R1024_1_6 = 23, R1024_1_7 = 52,
+	R1024_2_0 = 33, R1024_2_1 =  4, R1024_2_2 = 51, R1024_2_3 = 13,
+	R1024_2_4 = 34, R1024_2_5 = 41, R1024_2_6 = 59, R1024_2_7 = 17,
+	R1024_3_0 =  5, R1024_3_1 = 20, R1024_3_2 = 48, R1024_3_3 = 41,
+	R1024_3_4 = 47, R1024_3_5 = 28, R1024_3_6 = 16, R1024_3_7 = 25,
+	R1024_4_0 = 41, R1024_4_1 =  9, R1024_4_2 = 37, R1024_4_3 = 31,
+	R1024_4_4 = 12, R1024_4_5 = 47, R1024_4_6 = 44, R1024_4_7 = 30,
+	R1024_5_0 = 16, R1024_5_1 = 34, R1024_5_2 = 56, R1024_5_3 = 51,
+	R1024_5_4 =  4, R1024_5_5 = 53, R1024_5_6 = 42, R1024_5_7 = 41,
+	R1024_6_0 = 31, R1024_6_1 = 44, R1024_6_2 = 47, R1024_6_3 = 46,
+	R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
+	R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52,
+	R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
 	};
 
 #ifndef SKEIN_ROUNDS
-#define SKEIN_256_ROUNDS_TOTAL (72)          /* number of rounds for the different block sizes */
+#define SKEIN_256_ROUNDS_TOTAL (72)	/* # rounds for diff block sizes */
 #define SKEIN_512_ROUNDS_TOTAL (72)
 #define SKEIN1024_ROUNDS_TOTAL (80)
-#else                                        /* allow command-line define in range 8*(5..14)   */
+#else			/* allow command-line define in range 8*(5..14)   */
 #define SKEIN_256_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/100) + 5) % 10) + 5))
 #define SKEIN_512_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS/10)  + 5) % 10) + 5))
 #define SKEIN1024_ROUNDS_TOTAL (8*((((SKEIN_ROUNDS)     + 5) % 10) + 5))
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 5d92bbff8c9f..e81675d7eac9 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -72,7 +72,9 @@ struct threefish_key {
  * @param tweak
  *     Pointer to the two tweak words (word has 64 bits).
  */
-void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize, u64 *keyData, u64 *tweak);
+void threefishSetKey(struct threefish_key *keyCtx,
+			enum threefish_size stateSize,
+			u64 *keyData, u64 *tweak);
 
 /**
  * Encrypt Threefisch block (bytes).
@@ -108,7 +110,8 @@ void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
  * @param out
  *     Pointer to cipher buffer.
  */
-void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
+void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in,
+				u64 *out);
 
 /**
  * Decrypt Threefisch block (bytes).
@@ -144,14 +147,17 @@ void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in, u8 *out);
  * @param out
  *     Pointer to plaintext buffer.
  */
-void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in, u64 *out);
+void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in,
+				u64 *out);
 
 void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
 void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
-void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input,
+			u64 *output);
 void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output);
 void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output);
-void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output);
+void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input,
+			u64 *output);
 /**
  * @}
  */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 3f0f32806181..ed603ee7b170 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -16,9 +16,12 @@
 
 /*****************************************************************/
 /* External function to process blkCnt (nonzero) full block(s) of data. */
-void    Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-void    Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
-void    Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd);
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
@@ -53,20 +56,28 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
-		/* build/process the config block, type == CONFIG (could be precomputed) */
-		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		/*
+		 * build/process the config block, type == CONFIG (could be
+		 * precomputed)
+		 */
+		/* set tweaks: T0=0; T1=CFG | FINAL */
+		Skein_Start_New_Type(ctx, CFG_FINAL);
+
+		/* set the schema, version */
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+		/* hash result length in bits */
+		cfg.w[1] = Skein_Swap64(hashBitLen);
 		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+		/* zero pad config block */
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0]));
 
 		/* compute the initial chaining values from config block */
-		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		/* zero the chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
 		Skein_256_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
-	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* The chaining vars ctx->X are now initialized for hashBitLen. */
 	/* Set up to process the data message portion of the hash (default) */
 	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
@@ -75,42 +86,58 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein_256_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+/* [identical to Skein_256_Init() when keyBytes == 0 &&
+ *	treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
 	union
 	{
 		u8  b[SKEIN_256_STATE_BYTES];
 		u64  w[SKEIN_256_STATE_WORDS];
-	} cfg;                              /* config block */
+	} cfg; /* config block */
 
 	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
 	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
 	/* compute the initial chaining values ctx->X[], based on key */
-	if (keyBytes == 0)                          /* is there a key? */
+	if (keyBytes == 0) /* is there a key? */
 	{
-		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+		/* no key: use all zeroes as key for config block */
+		memset(ctx->X, 0, sizeof(ctx->X));
 	}
-	else                                        /* here to pre-process a key */
+	else /* here to pre-process a key */
 	{
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
-		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-		Skein_256_Update(ctx, key, keyBytes);     /* hash the key */
-		Skein_256_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+		/* set output hash bit count = state size */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);
+		/* set tweaks: T0 = 0; T1 = KEY type */
+		Skein_Start_New_Type(ctx, KEY);
+		/* zero the initial chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
+		/* hash the key */
+		Skein_256_Update(ctx, key, keyBytes);
+		/* put result into cfg.b[] */
+		Skein_256_Final_Pad(ctx, cfg.b);
+		/* copy over into ctx->X[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
 	}
-	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
-	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	/*
+	 * build/process the config block, type == CONFIG (could be
+	 * precomputed for each key)
+	 */
+	/* output hash bit count */
+	ctx->h.hashBitLen = hashBitLen;
 	Skein_Start_New_Type(ctx, CFG_FINAL);
 
-	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	/* pre-pad cfg.w[] with zeroes */
+	memset(&cfg.w, 0, sizeof(cfg.w));
 	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	/* hash result length in bits */
+	cfg.w[1] = Skein_Swap64(hashBitLen);
+	/* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	cfg.w[2] = Skein_Swap64(treeInfo);
 
 	Skein_Show_Key(256, &ctx->h, key, keyBytes);
 
@@ -126,35 +153,46 @@ int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen, u64 treeInfo
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt)
 {
 	size_t n;
 
-	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
 	if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
 	{
-		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		/* finish up any buffered message data */
+		if (ctx->h.bCnt)
 		{
-			n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			/* # bytes free in buffer b[] */
+			n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;
 			if (n)
 			{
-				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				/* check on our logic here */
+				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
 				msgByteCnt  -= n;
 				msg         += n;
 				ctx->h.bCnt += n;
 			}
 			Skein_assert(ctx->h.bCnt == SKEIN_256_BLOCK_BYTES);
-			Skein_256_Process_Block(ctx, ctx->b, 1, SKEIN_256_BLOCK_BYTES);
+			Skein_256_Process_Block(ctx, ctx->b, 1,
+						SKEIN_256_BLOCK_BYTES);
 			ctx->h.bCnt = 0;
 		}
-		/* now process any remaining full blocks, directly from input message data */
+		/*
+		 * now process any remaining full blocks, directly from input
+		 * message data
+		 */
 		if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
 		{
-			n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;   /* number of full blocks to process */
-			Skein_256_Process_Block(ctx, msg, n, SKEIN_256_BLOCK_BYTES);
+			/* number of full blocks to process */
+			n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;
+			Skein_256_Process_Block(ctx, msg, n,
+						SKEIN_256_BLOCK_BYTES);
 			msgByteCnt -= n * SKEIN_256_BLOCK_BYTES;
 			msg        += n * SKEIN_256_BLOCK_BYTES;
 		}
@@ -178,31 +216,46 @@ int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN_256_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)            /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
 
-	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+	/* process the final block */
+	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;
 		if (n >= SKEIN_256_BLOCK_BYTES)
 			n  = SKEIN_256_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(256, &ctx->h, n,
+				 hashVal+i*SKEIN_256_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -240,21 +293,32 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
-		/* build/process the config block, type == CONFIG (could be precomputed) */
-		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		/*
+		 * build/process the config block, type == CONFIG (could be
+		 * precomputed)
+		 */
+		/* set tweaks: T0=0; T1=CFG | FINAL */
+		Skein_Start_New_Type(ctx, CFG_FINAL);
+
+		/* set the schema, version */
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+		/* hash result length in bits */
+		cfg.w[1] = Skein_Swap64(hashBitLen);
 		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+		/* zero pad config block */
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0]));
 
 		/* compute the initial chaining values from config block */
-		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		/* zero the chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
 		Skein_512_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
 
-	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/*
+	 * The chaining vars ctx->X are now initialized for the given
+	 * hashBitLen.
+	 */
 	/* Set up to process the data message portion of the hash (default) */
 	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
@@ -263,8 +327,10 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein_512_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+/* [identical to Skein_512_Init() when keyBytes == 0 &&
+ *	treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
 	union
 	{
@@ -278,27 +344,40 @@ int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo
 	/* compute the initial chaining values ctx->X[], based on key */
 	if (keyBytes == 0)                          /* is there a key? */
 	{
-		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+		/* no key: use all zeroes as key for config block */
+		memset(ctx->X, 0, sizeof(ctx->X));
 	}
-	else                                        /* here to pre-process a key */
+	else /* here to pre-process a key */
 	{
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
-		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-		Skein_512_Update(ctx, key, keyBytes);     /* hash the key */
-		Skein_512_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+		/* set output hash bit count = state size */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);
+		/* set tweaks: T0 = 0; T1 = KEY type */
+		Skein_Start_New_Type(ctx, KEY);
+		/* zero the initial chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
+		/* hash the key */
+		Skein_512_Update(ctx, key, keyBytes);
+		/* put result into cfg.b[] */
+		Skein_512_Final_Pad(ctx, cfg.b);
+		/* copy over into ctx->X[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
 	}
-	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
+	/*
+	 * build/process the config block, type == CONFIG (could be
+	 * precomputed for each key)
+	 */
 	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
 	Skein_Start_New_Type(ctx, CFG_FINAL);
 
-	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	/* pre-pad cfg.w[] with zeroes */
+	memset(&cfg.w, 0, sizeof(cfg.w));
 	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	/* hash result length in bits */
+	cfg.w[1] = Skein_Swap64(hashBitLen);
+	/* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	cfg.w[2] = Skein_Swap64(treeInfo);
 
 	Skein_Show_Key(512, &ctx->h, key, keyBytes);
 
@@ -314,35 +393,46 @@ int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen, u64 treeInfo
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt)
 {
 	size_t n;
 
-	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
 	if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
 	{
-		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		/* finish up any buffered message data */
+		if (ctx->h.bCnt)
 		{
-			n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			/* # bytes free in buffer b[] */
+			n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;
 			if (n)
 			{
-				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				/* check on our logic here */
+				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
 				msgByteCnt  -= n;
 				msg         += n;
 				ctx->h.bCnt += n;
 			}
 			Skein_assert(ctx->h.bCnt == SKEIN_512_BLOCK_BYTES);
-			Skein_512_Process_Block(ctx, ctx->b, 1, SKEIN_512_BLOCK_BYTES);
+			Skein_512_Process_Block(ctx, ctx->b, 1,
+						SKEIN_512_BLOCK_BYTES);
 			ctx->h.bCnt = 0;
 		}
-		/* now process any remaining full blocks, directly from input message data */
+		/*
+		 * now process any remaining full blocks, directly from input
+		 * message data
+		 */
 		if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
 		{
-			n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;   /* number of full blocks to process */
-			Skein_512_Process_Block(ctx, msg, n, SKEIN_512_BLOCK_BYTES);
+			/* number of full blocks to process */
+			n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;
+			Skein_512_Process_Block(ctx, msg, n,
+						SKEIN_512_BLOCK_BYTES);
 			msgByteCnt -= n * SKEIN_512_BLOCK_BYTES;
 			msg        += n * SKEIN_512_BLOCK_BYTES;
 		}
@@ -366,31 +456,46 @@ int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN_512_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)            /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
 
-	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+	/* process the final block */
+	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;
 		if (n >= SKEIN_512_BLOCK_BYTES)
 			n  = SKEIN_512_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(512, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(512, &ctx->h, n,
+				 hashVal+i*SKEIN_512_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -425,21 +530,29 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
-		/* build/process the config block, type == CONFIG (could be precomputed) */
-		Skein_Start_New_Type(ctx, CFG_FINAL);        /* set tweaks: T0=0; T1=CFG | FINAL */
-
-		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);  /* set the schema, version */
-		cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
+		/*
+		 * build/process the config block, type == CONFIG
+		 * (could be precomputed)
+		 */
+		/* set tweaks: T0=0; T1=CFG | FINAL */
+		Skein_Start_New_Type(ctx, CFG_FINAL);
+
+		/* set the schema, version */
+		cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
+		/* hash result length in bits */
+		cfg.w[1] = Skein_Swap64(hashBitLen);
 		cfg.w[2] = Skein_Swap64(SKEIN_CFG_TREE_INFO_SEQUENTIAL);
-		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0])); /* zero pad config block */
+		/* zero pad config block */
+		memset(&cfg.w[3], 0, sizeof(cfg) - 3*sizeof(cfg.w[0]));
 
 		/* compute the initial chaining values from config block */
-		memset(ctx->X, 0, sizeof(ctx->X));            /* zero the chaining variables */
+		/* zero the chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
 		Skein1024_Process_Block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
 
-	/* The chaining vars ctx->X are now initialized for the given hashBitLen. */
+	/* The chaining vars ctx->X are now initialized for the hashBitLen. */
 	/* Set up to process the data message portion of the hash (default) */
 	Skein_Start_New_Type(ctx, MSG);              /* T0=0, T1= MSG type */
 
@@ -448,8 +561,10 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* init the context for a MAC and/or tree hash operation */
-/* [identical to Skein1024_Init() when keyBytes == 0 && treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
-int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo, const u8 *key, size_t keyBytes)
+/* [identical to Skein1024_Init() when keyBytes == 0 &&
+ *	treeInfo == SKEIN_CFG_TREE_INFO_SEQUENTIAL] */
+int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen,
+			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
 	union
 	{
@@ -463,27 +578,41 @@ int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo
 	/* compute the initial chaining values ctx->X[], based on key */
 	if (keyBytes == 0)                          /* is there a key? */
 	{
-		memset(ctx->X, 0, sizeof(ctx->X));        /* no key: use all zeroes as key for config block */
+		/* no key: use all zeroes as key for config block */
+		memset(ctx->X, 0, sizeof(ctx->X));
 	}
-	else                                        /* here to pre-process a key */
+	else /* here to pre-process a key */
 	{
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
-		ctx->h.hashBitLen = 8*sizeof(ctx->X);     /* set output hash bit count = state size */
-		Skein_Start_New_Type(ctx, KEY);          /* set tweaks: T0 = 0; T1 = KEY type */
-		memset(ctx->X, 0, sizeof(ctx->X));        /* zero the initial chaining variables */
-		Skein1024_Update(ctx, key, keyBytes);     /* hash the key */
-		Skein1024_Final_Pad(ctx, cfg.b);         /* put result into cfg.b[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));     /* copy over into ctx->X[] */
+		/* set output hash bit count = state size */
+		ctx->h.hashBitLen = 8*sizeof(ctx->X);
+		/* set tweaks: T0 = 0; T1 = KEY type */
+		Skein_Start_New_Type(ctx, KEY);
+		/* zero the initial chaining variables */
+		memset(ctx->X, 0, sizeof(ctx->X));
+		/* hash the key */
+		Skein1024_Update(ctx, key, keyBytes);
+		/* put result into cfg.b[] */
+		Skein1024_Final_Pad(ctx, cfg.b);
+		/* copy over into ctx->X[] */
+		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
 	}
-	/* build/process the config block, type == CONFIG (could be precomputed for each key) */
-	ctx->h.hashBitLen = hashBitLen;             /* output hash bit count */
+	/*
+	 * build/process the config block, type == CONFIG (could be
+	 * precomputed for each key)
+	 */
+	/* output hash bit count */
+	ctx->h.hashBitLen = hashBitLen;
 	Skein_Start_New_Type(ctx, CFG_FINAL);
 
-	memset(&cfg.w, 0, sizeof(cfg.w));             /* pre-pad cfg.w[] with zeroes */
+	/* pre-pad cfg.w[] with zeroes */
+	memset(&cfg.w, 0, sizeof(cfg.w));
 	cfg.w[0] = Skein_Swap64(SKEIN_SCHEMA_VER);
-	cfg.w[1] = Skein_Swap64(hashBitLen);        /* hash result length in bits */
-	cfg.w[2] = Skein_Swap64(treeInfo);          /* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	/* hash result length in bits */
+	cfg.w[1] = Skein_Swap64(hashBitLen);
+	/* tree hash config info (or SKEIN_CFG_TREE_INFO_SEQUENTIAL) */
+	cfg.w[2] = Skein_Swap64(treeInfo);
 
 	Skein_Show_Key(1024, &ctx->h, key, keyBytes);
 
@@ -499,35 +628,46 @@ int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen, u64 treeInfo
 
 /*++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*/
 /* process the input bytes */
-int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg, size_t msgByteCnt)
+int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
+			size_t msgByteCnt)
 {
 	size_t n;
 
-	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
 	if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
 	{
-		if (ctx->h.bCnt)                              /* finish up any buffered message data */
+		/* finish up any buffered message data */
+		if (ctx->h.bCnt)
 		{
-			n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;  /* # bytes free in buffer b[] */
+			/* # bytes free in buffer b[] */
+			n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;
 			if (n)
 			{
-				Skein_assert(n < msgByteCnt);         /* check on our logic here */
+				/* check on our logic here */
+				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
 				msgByteCnt  -= n;
 				msg         += n;
 				ctx->h.bCnt += n;
 			}
 			Skein_assert(ctx->h.bCnt == SKEIN1024_BLOCK_BYTES);
-			Skein1024_Process_Block(ctx, ctx->b, 1, SKEIN1024_BLOCK_BYTES);
+			Skein1024_Process_Block(ctx, ctx->b, 1,
+						SKEIN1024_BLOCK_BYTES);
 			ctx->h.bCnt = 0;
 		}
-		/* now process any remaining full blocks, directly from input message data */
+		/*
+		 * now process any remaining full blocks, directly from input
+		 * message data
+		 */
 		if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
 		{
-			n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;   /* number of full blocks to process */
-			Skein1024_Process_Block(ctx, msg, n, SKEIN1024_BLOCK_BYTES);
+			/* number of full blocks to process */
+			n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;
+			Skein1024_Process_Block(ctx, msg, n,
+						SKEIN1024_BLOCK_BYTES);
 			msgByteCnt -= n * SKEIN1024_BLOCK_BYTES;
 			msg        += n * SKEIN1024_BLOCK_BYTES;
 		}
@@ -551,31 +691,46 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN1024_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;                 /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)            /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
 
-	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);  /* process the final block */
+	/* process the final block */
+	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;             /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;
 		if (n >= SKEIN1024_BLOCK_BYTES)
 			n  = SKEIN1024_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(1024, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(1024, &ctx->h, n,
+				 hashVal+i*SKEIN1024_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -587,14 +742,20 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 {
-	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)   /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
-	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN_256_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN_256_BLOCK_BYTES - ctx->h.bCnt);
+	/* process the final block */
+	Skein_256_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
-	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);   /* "output" the state bytes */
+	/* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_256_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -603,14 +764,20 @@ int Skein_256_Final_Pad(struct skein_256_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 {
-	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)   /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
-	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN_512_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN_512_BLOCK_BYTES - ctx->h.bCnt);
+	/* process the final block */
+	Skein_512_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
-	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);   /* "output" the state bytes */
+	/* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN_512_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -619,14 +786,20 @@ int Skein_512_Final_Pad(struct skein_512_ctx *ctx, u8 *hashVal)
 /* finalize the hash computation and output the block, no OUTPUT stage */
 int Skein1024_Final_Pad(struct skein1024_ctx *ctx, u8 *hashVal)
 {
-	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;        /* tag as the final block */
-	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)   /* zero pad b[] if necessary */
-		memset(&ctx->b[ctx->h.bCnt], 0, SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
-	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);    /* process the final block */
+	/* tag as the final block */
+	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	/* zero pad b[] if necessary */
+	if (ctx->h.bCnt < SKEIN1024_BLOCK_BYTES)
+		memset(&ctx->b[ctx->h.bCnt], 0,
+			SKEIN1024_BLOCK_BYTES - ctx->h.bCnt);
+	/* process the final block */
+	Skein1024_Process_Block(ctx, ctx->b, 1, ctx->h.bCnt);
 
-	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);   /* "output" the state bytes */
+	/* "output" the state bytes */
+	Skein_Put64_LSB_First(hashVal, ctx->X, SKEIN1024_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -638,25 +811,36 @@ int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN_256_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein_256_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN_256_BLOCK_BYTES;
 		if (n >= SKEIN_256_BLOCK_BYTES)
 			n  = SKEIN_256_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_256_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_256_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(256, &ctx->h, n,
+				 hashVal+i*SKEIN_256_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -667,25 +851,36 @@ int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN_512_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein_512_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN_512_BLOCK_BYTES;
 		if (n >= SKEIN_512_BLOCK_BYTES)
 			n  = SKEIN_512_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN_512_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN_512_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(256, &ctx->h, n,
+				 hashVal+i*SKEIN_512_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -696,25 +891,36 @@ int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 {
 	size_t i, n, byteCnt;
 	u64 X[SKEIN1024_STATE_WORDS];
-	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);    /* catch uninitialized context */
+	/* catch uninitialized context */
+	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* now output the result */
-	byteCnt = (ctx->h.hashBitLen + 7) >> 3;    /* total number of output bytes */
+	/* total number of output bytes */
+	byteCnt = (ctx->h.hashBitLen + 7) >> 3;
 
 	/* run Threefish in "counter mode" to generate output */
-	memset(ctx->b, 0, sizeof(ctx->b));  /* zero out b[], so it can hold the counter */
-	memcpy(X, ctx->X, sizeof(X));       /* keep a local copy of counter mode "key" */
+	/* zero out b[], so it can hold the counter */
+	memset(ctx->b, 0, sizeof(ctx->b));
+	/* keep a local copy of counter mode "key" */
+	memcpy(X, ctx->X, sizeof(X));
 	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
 	{
-		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i); /* build the counter block */
+		/* build the counter block */
+		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64)); /* run "counter mode" */
-		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;   /* number of output bytes left to go */
+		/* run "counter mode" */
+		Skein1024_Process_Block(ctx, ctx->b, 1, sizeof(u64));
+		/* number of output bytes left to go */
+		n = byteCnt - i*SKEIN1024_BLOCK_BYTES;
 		if (n >= SKEIN1024_BLOCK_BYTES)
 			n  = SKEIN1024_BLOCK_BYTES;
-		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X, n);   /* "output" the ctr mode bytes */
-		Skein_Show_Final(256, &ctx->h, n, hashVal+i*SKEIN1024_BLOCK_BYTES);
-		memcpy(ctx->X, X, sizeof(X));   /* restore the counter mode key for next time */
+		/* "output" the ctr mode bytes */
+		Skein_Put64_LSB_First(hashVal+i*SKEIN1024_BLOCK_BYTES, ctx->X,
+				      n);
+		Skein_Show_Final(256, &ctx->h, n,
+				 hashVal+i*SKEIN1024_BLOCK_BYTES);
+		/* restore the counter mode key for next time */
+		memcpy(ctx->X, X, sizeof(X));
 	}
 	return SKEIN_SUCCESS;
 }
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 3ebb1d60ef93..f0015d5b10f5 100644
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -46,9 +46,9 @@ int skeinInit(struct skein_ctx *ctx, size_t hashBitLen)
 
 	Skein_Assert(ctx, SKEIN_FAIL);
 	/*
-	 * The following two lines rely of the fact that the real Skein contexts are
-	 * a union in out context and thus have tha maximum memory available.
-	 * The beauty of C :-) .
+	 * The following two lines rely on the fact that the real Skein
+	 * contexts are a union in our context and thus have the maximum
+	 * memory available.  The beauty of C :-) .
 	 */
 	X = ctx->m.s256.X;
 	Xlen = ctx->skeinSize/8;
@@ -72,7 +72,10 @@ int skeinInit(struct skein_ctx *ctx, size_t hashBitLen)
 	}
 
 	if (ret == SKEIN_SUCCESS) {
-		/* Save chaining variables for this combination of size and hashBitLen */
+		/*
+		 * Save chaining variables for this combination of size and
+		 * hashBitLen
+		 */
 		memcpy(ctx->XSave, X, Xlen);
 	}
 	return ret;
@@ -113,7 +116,10 @@ int skeinMacInit(struct skein_ctx *ctx, const u8 *key, size_t keyLen,
 		break;
 	}
 	if (ret == SKEIN_SUCCESS) {
-		/* Save chaining variables for this combination of key, keyLen, hashBitLen */
+		/*
+		 * Save chaining variables for this combination of key,
+		 * keyLen, hashBitLen
+		 */
 		memcpy(ctx->XSave, X, Xlen);
 	}
 	return ret;
@@ -125,9 +131,9 @@ void skeinReset(struct skein_ctx *ctx)
 	u64 *X = NULL;
 
 	/*
-	 * The following two lines rely of the fact that the real Skein contexts are
-	 * a union in out context and thus have tha maximum memory available.
-	 * The beautiy of C :-) .
+	 * The following two lines rely on the fact that the real Skein
+	 * contexts are a union in our context and thus have the maximum
+	 * memory available.  The beauty of C :-) .
 	 */
 	X = ctx->m.s256.X;
 	Xlen = ctx->skeinSize/8;
@@ -146,13 +152,16 @@ int skeinUpdate(struct skein_ctx *ctx, const u8 *msg,
 
 	switch (ctx->skeinSize) {
 	case Skein256:
-		ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg, msgByteCnt);
+		ret = Skein_256_Update(&ctx->m.s256, (const u8 *)msg,
+					msgByteCnt);
 		break;
 	case Skein512:
-		ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg, msgByteCnt);
+		ret = Skein_512_Update(&ctx->m.s512, (const u8 *)msg,
+					msgByteCnt);
 		break;
 	case Skein1024:
-		ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg, msgByteCnt);
+		ret = Skein1024_Update(&ctx->m.s1024, (const u8 *)msg,
+					msgByteCnt);
 		break;
 	}
 	return ret;
@@ -164,15 +173,19 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
 {
 	/*
 	 * I've used the bit pad implementation from skein_test.c (see NIST CD)
-	 * and modified it to use the convenience functions and added some pointer
-	 * arithmetic.
+	 * and modified it to use the convenience functions and added some
+	 * pointer arithmetic.
 	 */
 	size_t length;
 	u8 mask;
 	u8 *up;
 
-	/* only the final Update() call is allowed do partial bytes, else assert an error */
-	Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 || msgBitCnt == 0, SKEIN_FAIL);
+	/*
+	 * only the final Update() call is allowed to do partial bytes, else
+	 * assert an error
+	 */
+	Skein_Assert((ctx->m.h.T[1] & SKEIN_T1_FLAG_BIT_PAD) == 0 ||
+			msgBitCnt == 0, SKEIN_FAIL);
 
 	/* if number of bits is a multiple of bytes - that's easy */
 	if ((msgBitCnt & 0x7) == 0) {
@@ -188,13 +201,18 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
 	 */
 	up = (u8 *)ctx->m.s256.X + ctx->skeinSize / 8;
 
-	Skein_Set_Bit_Pad_Flag(ctx->m.h);                       /* set tweak flag for the skeinFinal call */
+	/* set tweak flag for the skeinFinal call */
+	Skein_Set_Bit_Pad_Flag(ctx->m.h);
 
 	/* now "pad" the final partial byte the way NIST likes */
-	length = ctx->m.h.bCnt;                                 /* get the bCnt value (same location for all block sizes) */
-	Skein_assert(length != 0);                              /* internal sanity check: there IS a partial byte in the buffer! */
-	mask = (u8) (1u << (7 - (msgBitCnt & 7)));         /* partial byte bit mask */
-	up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);   /* apply bit padding on final byte (in the buffer) */
+	/* get the bCnt value (same location for all block sizes) */
+	length = ctx->m.h.bCnt;
+	/* internal sanity check: there IS a partial byte in the buffer! */
+	Skein_assert(length != 0);
+	/* partial byte bit mask */
+	mask = (u8) (1u << (7 - (msgBitCnt & 7)));
+	/* apply bit padding on final byte (in the buffer) */
+	up[length-1]  = (u8)((up[length-1] & (0-mask))|mask);
 
 	return SKEIN_SUCCESS;
 }
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 376cd63d8f83..69176389fef9 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -11,10 +11,10 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 	struct threefish_key key;
 	u64 tweak[2];
 	int i;
-	u64  w[SKEIN_256_STATE_WORDS];           /* local copy of input block */
+	u64  w[SKEIN_256_STATE_WORDS]; /* local copy of input block */
 	u64 words[3];
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	tweak[0] = ctx->h.T[0];
 	tweak[1] = ctx->h.T[1];
 
@@ -36,13 +36,14 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 
 		threefishSetKey(&key, Threefish256, ctx->X, tweak);
 
-		Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);   /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN_256_STATE_WORDS);
 
 		threefishEncryptBlockWords(&key, w, ctx->X);
 
 		blkPtr += SKEIN_256_BLOCK_BYTES;
 
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0] = ctx->X[0] ^ w[0];
 		ctx->X[1] = ctx->X[1] ^ w[1];
 		ctx->X[2] = ctx->X[2] ^ w[2];
@@ -62,9 +63,9 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 	u64 tweak[2];
 	int i;
 	u64 words[3];
-	u64  w[SKEIN_512_STATE_WORDS];           /* local copy of input block */
+	u64  w[SKEIN_512_STATE_WORDS]; /* local copy of input block */
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	tweak[0] = ctx->h.T[0];
 	tweak[1] = ctx->h.T[1];
 
@@ -86,13 +87,14 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 
 		threefishSetKey(&key, Threefish512, ctx->X, tweak);
 
-		Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);   /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN_512_STATE_WORDS);
 
 		threefishEncryptBlockWords(&key, w, ctx->X);
 
 		blkPtr += SKEIN_512_BLOCK_BYTES;
 
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0] = ctx->X[0] ^ w[0];
 		ctx->X[1] = ctx->X[1] ^ w[1];
 		ctx->X[2] = ctx->X[2] ^ w[2];
@@ -116,9 +118,9 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 	u64 tweak[2];
 	int i;
 	u64 words[3];
-	u64  w[SKEIN1024_STATE_WORDS];           /* local copy of input block */
+	u64  w[SKEIN1024_STATE_WORDS]; /* local copy of input block */
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	tweak[0] = ctx->h.T[0];
 	tweak[1] = ctx->h.T[1];
 
@@ -140,13 +142,14 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 
 		threefishSetKey(&key, Threefish1024, ctx->X, tweak);
 
-		Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);   /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, SKEIN1024_STATE_WORDS);
 
 		threefishEncryptBlockWords(&key, w, ctx->X);
 
 		blkPtr += SKEIN1024_BLOCK_BYTES;
 
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0]  = ctx->X[0]  ^ w[0];
 		ctx->X[1]  = ctx->X[1]  ^ w[1];
 		ctx->X[2]  = ctx->X[2]  ^ w[2];
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index d315f547feae..780b4936f783 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -18,14 +18,14 @@
 #include <skein.h>
 
 #ifndef SKEIN_USE_ASM
-#define SKEIN_USE_ASM   (0)                     /* default is all C code (no ASM) */
+#define SKEIN_USE_ASM   (0) /* default is all C code (no ASM) */
 #endif
 
 #ifndef SKEIN_LOOP
-#define SKEIN_LOOP 001                          /* default: unroll 256 and 512, but not 1024 */
+#define SKEIN_LOOP 001 /* default: unroll 256 and 512, but not 1024 */
 #endif
 
-#define BLK_BITS        (WCNT*64)               /* some useful definitions for code here */
+#define BLK_BITS        (WCNT*64) /* some useful definitions for code here */
 #define KW_TWK_BASE     (0)
 #define KW_KEY_BASE     (3)
 #define ks              (kw + KW_KEY_BASE)
@@ -39,7 +39,8 @@
 
 /*****************************  Skein_256 ******************************/
 #if !(SKEIN_USE_ASM & 256)
-void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd)
 	{ /* do it in C */
 	enum {
 		WCNT = SKEIN_256_STATE_WORDS
@@ -47,7 +48,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 #undef  RCNT
 #define RCNT  (SKEIN_256_ROUNDS_TOTAL/8)
 
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_256 (((SKEIN_LOOP)/100)%10)
 #else
 #define SKEIN_UNROLL_256 (0)
@@ -55,25 +56,28 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 
 #if SKEIN_UNROLL_256
 #if (RCNT % SKEIN_UNROLL_256)
-#error "Invalid SKEIN_UNROLL_256"               /* sanity check on unroll count */
+#error "Invalid SKEIN_UNROLL_256" /* sanity check on unroll count */
 #endif
 	size_t  r;
-	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	u64  kw[WCNT+4+RCNT*2]; /* key sched: chaining vars + tweak + "rot" */
 #else
-	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4]; /* key schedule words : chaining vars + tweak */
 #endif
-	u64  X0, X1, X2, X3;                        /* local copy of context vars, for speed */
-	u64  w[WCNT];                           /* local copy of input block */
+	u64  X0, X1, X2, X3; /* local copy of context vars, for speed */
+	u64  w[WCNT]; /* local copy of input block */
 #ifdef SKEIN_DEBUG
-	const u64 *Xptr[4];                      /* use for debugging (help compiler put Xn in registers) */
+	const u64 *Xptr[4]; /* use for debugging (help cc put Xn in regs) */
 	Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
 #endif
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	ts[0] = ctx->h.T[0];
 	ts[1] = ctx->h.T[1];
 	do  {
-		/* this implementation only supports 2**64 input bytes (no carry out here) */
-		ts[0] += byteCntAdd;                    /* update processed length */
+		/*
+		 * this implementation only supports 2**64 input bytes
+		 * (no carry out here)
+		 */
+		ts[0] += byteCntAdd; /* update processed length */
 
 		/* precompute the key schedule for this block */
 		ks[0] = ctx->X[0];
@@ -84,16 +88,19 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 
 		ts[2] = ts[0] ^ ts[1];
 
-		Skein_Get64_LSB_First(w, blkPtr, WCNT);   /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, WCNT);
 		DebugSaveTweak(ctx);
 		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
-		X0 = w[0] + ks[0];                      /* do the first full key injection */
+		X0 = w[0] + ks[0]; /* do the first full key injection */
 		X1 = w[1] + ks[1] + ts[0];
 		X2 = w[2] + ks[2] + ts[1];
 		X3 = w[3] + ks[3];
 
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);    /* show starting state values */
+		/* show starting state values */
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL,
+				 Xptr);
 
 		blkPtr += SKEIN_256_BLOCK_BYTES;
 
@@ -104,31 +111,34 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 
 #if SKEIN_UNROLL_256 == 0
-#define R256(p0, p1, p2, p3, ROT, rNum)           /* fully unrolled */   \
-	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+#define R256(p0, p1, p2, p3, ROT, rNum) /* fully unrolled */ \
+	Round256(p0, p1, p2, p3, ROT, rNum) \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
 
-#define I256(R)                                                     \
-	X0   += ks[((R)+1) % 5];    /* inject the key schedule value */ \
-	X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3];                      \
-	X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3];                      \
-	X3   += ks[((R)+4) % 5] +     (R)+1;                            \
+#define I256(R) \
+	/* inject the key schedule value */ \
+	X0   += ks[((R)+1) % 5]; \
+	X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3]; \
+	X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3]; \
+	X3   += ks[((R)+4) % 5] +     (R)+1;       \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R256(p0, p1, p2, p3, ROT, rNum)                                  \
-	Round256(p0, p1, p2, p3, ROT, rNum)                                  \
+#else /* looping version */
+#define R256(p0, p1, p2, p3, ROT, rNum) \
+	Round256(p0, p1, p2, p3, ROT, rNum) \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
 
-#define I256(R)                                                     \
-	X0   += ks[r+(R)+0];        /* inject the key schedule value */ \
-	X1   += ks[r+(R)+1] + ts[r+(R)+0];                              \
-	X2   += ks[r+(R)+2] + ts[r+(R)+1];                              \
-	X3   += ks[r+(R)+3] +    r+(R);                              \
-	ks[r + (R) + 4]   = ks[r + (R) - 1];     /* rotate key schedule */\
-	ts[r + (R) + 2]   = ts[r + (R) - 1];                              \
+#define I256(R) \
+	/* inject the key schedule value */ \
+	X0   += ks[r+(R)+0]; \
+	X1   += ks[r+(R)+1] + ts[r+(R)+0]; \
+	X2   += ks[r+(R)+2] + ts[r+(R)+1]; \
+	X3   += ks[r+(R)+3] +    r+(R);    \
+	/* rotate key schedule */ \
+	ks[r + (R) + 4]   = ks[r + (R) - 1]; \
+	ts[r + (R) + 2]   = ts[r + (R) - 1]; \
 	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
 
-	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)  /* loop thru it */
+	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)
 #endif
 		{
 #define R256_8_rounds(R)                  \
@@ -145,7 +155,10 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 
 		R256_8_rounds(0);
 
-#define R256_Unroll_R(NN) ((SKEIN_UNROLL_256 == 0 && SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_256 > (NN)))
+#define R256_Unroll_R(NN) \
+	((SKEIN_UNROLL_256 == 0 && \
+	  SKEIN_256_ROUNDS_TOTAL/8 > (NN)) || \
+	 (SKEIN_UNROLL_256 > (NN)))
 
 	#if   R256_Unroll_R(1)
 		R256_8_rounds(1);
@@ -193,7 +206,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr, size_t
 #error  "need more unrolling in Skein_256_Process_Block"
 	#endif
 		}
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0] = X0 ^ w[0];
 		ctx->X[1] = X1 ^ w[1];
 		ctx->X[2] = X2 ^ w[2];
@@ -223,7 +236,8 @@ unsigned int Skein_256_Unroll_Cnt(void)
 
 /*****************************  Skein_512 ******************************/
 #if !(SKEIN_USE_ASM & 512)
-void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd)
 { /* do it in C */
 	enum {
 		WCNT = SKEIN_512_STATE_WORDS
@@ -231,7 +245,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 #undef  RCNT
 #define RCNT  (SKEIN_512_ROUNDS_TOTAL/8)
 
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_512 (((SKEIN_LOOP)/10)%10)
 #else
 #define SKEIN_UNROLL_512 (0)
@@ -239,27 +253,30 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 
 #if SKEIN_UNROLL_512
 #if (RCNT % SKEIN_UNROLL_512)
-#error "Invalid SKEIN_UNROLL_512"               /* sanity check on unroll count */
+#error "Invalid SKEIN_UNROLL_512" /* sanity check on unroll count */
 #endif
 	size_t  r;
-	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	u64  kw[WCNT+4+RCNT*2]; /* key sched: chaining vars + tweak + "rot" */
 #else
-	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4]; /* key schedule words : chaining vars + tweak */
 #endif
-	u64  X0, X1, X2, X3, X4, X5, X6, X7;            /* local copy of vars,  for speed */
-	u64  w[WCNT];                           /* local copy of input block */
+	u64  X0, X1, X2, X3, X4, X5, X6, X7; /* local copies, for speed */
+	u64  w[WCNT]; /* local copy of input block */
 #ifdef SKEIN_DEBUG
-	const u64 *Xptr[8];                      /* use for debugging (help compiler put Xn in registers) */
+	const u64 *Xptr[8]; /* use for debugging (help cc put Xn in regs) */
 	Xptr[0] = &X0;  Xptr[1] = &X1;  Xptr[2] = &X2;  Xptr[3] = &X3;
 	Xptr[4] = &X4;  Xptr[5] = &X5;  Xptr[6] = &X6;  Xptr[7] = &X7;
 #endif
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	ts[0] = ctx->h.T[0];
 	ts[1] = ctx->h.T[1];
 	do  {
-		/* this implementation only supports 2**64 input bytes (no carry out here) */
-		ts[0] += byteCntAdd;                    /* update processed length */
+		/*
+		 * this implementation only supports 2**64 input bytes
+		 * (no carry out here)
+		 */
+		ts[0] += byteCntAdd; /* update processed length */
 
 		/* precompute the key schedule for this block */
 		ks[0] = ctx->X[0];
@@ -275,11 +292,12 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 
 		ts[2] = ts[0] ^ ts[1];
 
-		Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, WCNT);
 		DebugSaveTweak(ctx);
 		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
-		X0   = w[0] + ks[0];                    /* do the first full key injection */
+		X0   = w[0] + ks[0]; /* do the first full key injection */
 		X1   = w[1] + ks[1];
 		X2   = w[2] + ks[2];
 		X3   = w[3] + ks[3];
@@ -290,65 +308,72 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 
 		blkPtr += SKEIN_512_BLOCK_BYTES;
 
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL,
+				 Xptr);
 		/* run the rounds */
-#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                  \
-		X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
-		X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
-		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
-		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
+#define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
+	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+	X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
+	X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
 
 #if SKEIN_UNROLL_512 == 0
-#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)      /* unrolled */  \
-		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
-
-#define I512(R)                                                     \
-		X0   += ks[((R) + 1) % 9];   /* inject the key schedule value */  \
-		X1   += ks[((R) + 2) % 9];                                        \
-		X2   += ks[((R) + 3) % 9];                                        \
-		X3   += ks[((R) + 4) % 9];                                        \
-		X4   += ks[((R) + 5) % 9];                                        \
-		X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3];                      \
-		X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3];                      \
-		X7   += ks[((R) + 8) % 9] +     (R) + 1;                            \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-		Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum)                      \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
-
-#define I512(R)                                                     \
-		X0   += ks[r + (R) + 0];        /* inject the key schedule value */ \
-		X1   += ks[r + (R) + 1];                                            \
-		X2   += ks[r + (R) + 2];                                            \
-		X3   += ks[r + (R) + 3];                                            \
-		X4   += ks[r + (R) + 4];                                            \
-		X5   += ks[r + (R) + 5] + ts[r + (R) + 0];                              \
-		X6   += ks[r + (R) + 6] + ts[r + (R) + 1];                              \
-		X7   += ks[r + (R) + 7] +         r + (R);                              \
-		ks[r +         (R) + 8] = ks[r + (R) - 1];  /* rotate key schedule */   \
-		ts[r +         (R) + 2] = ts[r + (R) - 1];                              \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-		for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)   /* loop thru it */
-#endif                         /* end of looped code definitions */
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) /* unrolled */ \
+	Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+
+#define I512(R) \
+	/* inject the key schedule value */ \
+	X0   += ks[((R) + 1) % 9]; \
+	X1   += ks[((R) + 2) % 9]; \
+	X2   += ks[((R) + 3) % 9]; \
+	X3   += ks[((R) + 4) % 9]; \
+	X4   += ks[((R) + 5) % 9]; \
+	X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3]; \
+	X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3]; \
+	X7   += ks[((R) + 8) % 9] +     (R) + 1;       \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+#else /* looping version */
+#define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+	Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+
+#define I512(R) \
+	/* inject the key schedule value */ \
+	X0   += ks[r + (R) + 0]; \
+	X1   += ks[r + (R) + 1]; \
+	X2   += ks[r + (R) + 2]; \
+	X3   += ks[r + (R) + 3]; \
+	X4   += ks[r + (R) + 4]; \
+	X5   += ks[r + (R) + 5] + ts[r + (R) + 0]; \
+	X6   += ks[r + (R) + 6] + ts[r + (R) + 1]; \
+	X7   += ks[r + (R) + 7] +         r + (R); \
+	/* rotate key schedule */ \
+	ks[r +         (R) + 8] = ks[r + (R) - 1]; \
+	ts[r +         (R) + 2] = ts[r + (R) - 1]; \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+		for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)
+#endif /* end of looped code definitions */
 		{
 #define R512_8_rounds(R)  /* do 8 full rounds */  \
-			R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
-			R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
-			R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
-			R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
-			I512(2 * (R));                              \
-			R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
-			R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
-			R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
-			R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
-			I512(2 * (R) + 1);        /* and key injection */
+		R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
+		R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
+		R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
+		R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_3, 8 * (R) + 4);   \
+		I512(2 * (R));                              \
+		R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_4, 8 * (R) + 5);   \
+		R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
+		R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
+		R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
+		I512(2 * (R) + 1);        /* and key injection */
 
 			R512_8_rounds(0);
 
-#define R512_Unroll_R(NN) ((SKEIN_UNROLL_512 == 0 && SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_512 > (NN)))
+#define R512_Unroll_R(NN) \
+		((SKEIN_UNROLL_512 == 0 && \
+		  SKEIN_512_ROUNDS_TOTAL/8 > (NN)) || \
+		 (SKEIN_UNROLL_512 > (NN)))
 
 	#if   R512_Unroll_R(1)
 			R512_8_rounds(1);
@@ -397,7 +422,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr, size_t
 	#endif
 		}
 
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update ctx chaining vars */
 		ctx->X[0] = X0 ^ w[0];
 		ctx->X[1] = X1 ^ w[1];
 		ctx->X[2] = X2 ^ w[2];
@@ -430,7 +455,8 @@ unsigned int Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t blkCnt, size_t byteCntAdd)
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd)
 { /* do it in C, always looping (unrolled is bigger AND slower!) */
 	enum {
 		WCNT = SKEIN1024_STATE_WORDS
@@ -438,7 +464,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 #undef  RCNT
 #define RCNT  (SKEIN1024_ROUNDS_TOTAL/8)
 
-#ifdef SKEIN_LOOP                              /* configure how much to unroll the loop */
+#ifdef SKEIN_LOOP /* configure how much to unroll the loop */
 #define SKEIN_UNROLL_1024 ((SKEIN_LOOP)%10)
 #else
 #define SKEIN_UNROLL_1024 (0)
@@ -446,31 +472,35 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 
 #if (SKEIN_UNROLL_1024 != 0)
 #if (RCNT % SKEIN_UNROLL_1024)
-#error "Invalid SKEIN_UNROLL_1024"              /* sanity check on unroll count */
+#error "Invalid SKEIN_UNROLL_1024" /* sanity check on unroll count */
 #endif
 	size_t  r;
-	u64  kw[WCNT+4+RCNT*2];                  /* key schedule words : chaining vars + tweak + "rotation"*/
+	u64  kw[WCNT+4+RCNT*2]; /* key sched: chaining vars + tweak + "rot" */
 #else
-	u64  kw[WCNT+4];                         /* key schedule words : chaining vars + tweak */
+	u64  kw[WCNT+4]; /* key schedule words : chaining vars + tweak */
 #endif
 
-	u64  X00, X01, X02, X03, X04, X05, X06, X07,     /* local copy of vars, for speed */
-		X08, X09, X10, X11, X12, X13, X14, X15;
-	u64  w[WCNT];                            /* local copy of input block */
+	/* local copy of vars, for speed */
+	u64  X00, X01, X02, X03, X04, X05, X06, X07,
+	     X08, X09, X10, X11, X12, X13, X14, X15;
+	u64  w[WCNT]; /* local copy of input block */
 #ifdef SKEIN_DEBUG
-	const u64 *Xptr[16];                     /* use for debugging (help compiler put Xn in registers) */
+	const u64 *Xptr[16]; /* use for debugging (help cc put Xn in regs) */
 	Xptr[0]  = &X00;  Xptr[1]  = &X01;  Xptr[2]  = &X02;  Xptr[3]  = &X03;
 	Xptr[4]  = &X04;  Xptr[5]  = &X05;  Xptr[6]  = &X06;  Xptr[7]  = &X07;
 	Xptr[8]  = &X08;  Xptr[9]  = &X09;  Xptr[10] = &X10;  Xptr[11] = &X11;
 	Xptr[12] = &X12;  Xptr[13] = &X13;  Xptr[14] = &X14;  Xptr[15] = &X15;
 #endif
 
-	Skein_assert(blkCnt != 0);                  /* never call with blkCnt == 0! */
+	Skein_assert(blkCnt != 0); /* never call with blkCnt == 0! */
 	ts[0] = ctx->h.T[0];
 	ts[1] = ctx->h.T[1];
 	do  {
-		/* this implementation only supports 2**64 input bytes (no carry out here) */
-		ts[0] += byteCntAdd;                    /* update processed length */
+		/*
+		 * this implementation only supports 2**64 input bytes
+		 * (no carry out here)
+		 */
+		ts[0] += byteCntAdd; /* update processed length */
 
 		/* precompute the key schedule for this block */
 		ks[0]  = ctx->X[0];
@@ -496,11 +526,12 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 
 		ts[2]  = ts[0] ^ ts[1];
 
-		Skein_Get64_LSB_First(w, blkPtr, WCNT); /* get input block in little-endian format */
+		/* get input block in little-endian format */
+		Skein_Get64_LSB_First(w, blkPtr, WCNT);
 		DebugSaveTweak(ctx);
 		Skein_Show_Block(BLK_BITS, &ctx->h, ctx->X, blkPtr, w, ks, ts);
 
-		X00    =  w[0] +  ks[0];                 /* do the first full key injection */
+		X00    =  w[0] +  ks[0]; /* do the first full key injection */
 		X01    =  w[1] +  ks[1];
 		X02    =  w[2] +  ks[2];
 		X03    =  w[3] +  ks[3];
@@ -517,85 +548,105 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 		X14    = w[14] + ks[14] + ts[1];
 		X15    = w[15] + ks[15];
 
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL, Xptr);
+		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL,
+				 Xptr);
 
-#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rNum) \
-		X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
-		X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
-		X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
-		X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
-		X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
-		X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
-		X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
-		X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
+#define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
+			pF, ROT, rNum) \
+	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
+	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
+	X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
+	X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6;   \
+	X##p8 += X##p9; X##p9 = RotL_64(X##p9, ROT##_4); X##p9 ^= X##p8;   \
+	X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
+	X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
+	X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
 
 #if SKEIN_UNROLL_1024 == 0
-#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
-
-#define I1024(R)                                                        \
-		X00   += ks[((R) +  1) % 17]; /* inject the key schedule value */   \
-		X01   += ks[((R) +  2) % 17];                                       \
-		X02   += ks[((R) +  3) % 17];                                       \
-		X03   += ks[((R) +  4) % 17];                                       \
-		X04   += ks[((R) +  5) % 17];                                       \
-		X05   += ks[((R) +  6) % 17];                                       \
-		X06   += ks[((R) +  7) % 17];                                       \
-		X07   += ks[((R) +  8) % 17];                                       \
-		X08   += ks[((R) +  9) % 17];                                       \
-		X09   += ks[((R) + 10) % 17];                                       \
-		X10   += ks[((R) + 11) % 17];                                       \
-		X11   += ks[((R) + 12) % 17];                                       \
-		X12   += ks[((R) + 13) % 17];                                       \
-		X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3];                   \
-		X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3];                   \
-		X15   += ks[((R) + 16) % 17] +     (R) + 1;                         \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-#else                                       /* looping version */
-#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-		Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, ROT, rn) \
-		Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
-
-#define I1024(R)                                                      \
-		X00   += ks[r + (R) +  0];    /* inject the key schedule value */     \
-		X01   += ks[r + (R) +  1];                                            \
-		X02   += ks[r + (R) +  2];                                            \
-		X03   += ks[r + (R) +  3];                                            \
-		X04   += ks[r + (R) +  4];                                            \
-		X05   += ks[r + (R) +  5];                                            \
-		X06   += ks[r + (R) +  6];                                            \
-		X07   += ks[r + (R) +  7];                                            \
-		X08   += ks[r + (R) +  8];                                            \
-		X09   += ks[r + (R) +  9];                                            \
-		X10   += ks[r + (R) + 10];                                            \
-		X11   += ks[r + (R) + 11];                                            \
-		X12   += ks[r + (R) + 12];                                            \
-		X13   += ks[r + (R) + 13] + ts[r + (R) + 0];                          \
-		X14   += ks[r + (R) + 14] + ts[r + (R) + 1];                          \
-		X15   += ks[r + (R) + 15] +         r + (R);                          \
-		ks[r  +         (R) + 16] = ks[r + (R) - 1]; /* rotate key schedule */\
-		ts[r  +         (R) +  2] = ts[r + (R) - 1];                          \
-		Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
-
-		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)    /* loop thru it */
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, \
+		ROT, rn) \
+	Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
+			pF, ROT, rn) \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+
+#define I1024(R) \
+	/* inject the key schedule value */ \
+	X00   += ks[((R) +  1) % 17]; \
+	X01   += ks[((R) +  2) % 17]; \
+	X02   += ks[((R) +  3) % 17]; \
+	X03   += ks[((R) +  4) % 17]; \
+	X04   += ks[((R) +  5) % 17]; \
+	X05   += ks[((R) +  6) % 17]; \
+	X06   += ks[((R) +  7) % 17]; \
+	X07   += ks[((R) +  8) % 17]; \
+	X08   += ks[((R) +  9) % 17]; \
+	X09   += ks[((R) + 10) % 17]; \
+	X10   += ks[((R) + 11) % 17]; \
+	X11   += ks[((R) + 12) % 17]; \
+	X12   += ks[((R) + 13) % 17]; \
+	X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3]; \
+	X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3]; \
+	X15   += ks[((R) + 16) % 17] +     (R) + 1;       \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+#else /* looping version */
+#define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, \
+		ROT, rn) \
+	Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
+			pF, ROT, rn) \
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
+
+#define I1024(R) \
+	/* inject the key schedule value */ \
+	X00   += ks[r + (R) +  0]; \
+	X01   += ks[r + (R) +  1]; \
+	X02   += ks[r + (R) +  2]; \
+	X03   += ks[r + (R) +  3]; \
+	X04   += ks[r + (R) +  4]; \
+	X05   += ks[r + (R) +  5]; \
+	X06   += ks[r + (R) +  6]; \
+	X07   += ks[r + (R) +  7]; \
+	X08   += ks[r + (R) +  8]; \
+	X09   += ks[r + (R) +  9]; \
+	X10   += ks[r + (R) + 10]; \
+	X11   += ks[r + (R) + 11]; \
+	X12   += ks[r + (R) + 12]; \
+	X13   += ks[r + (R) + 13] + ts[r + (R) + 0]; \
+	X14   += ks[r + (R) + 14] + ts[r + (R) + 1]; \
+	X15   += ks[r + (R) + 15] +         r + (R); \
+	/* rotate key schedule */ \
+	ks[r  +         (R) + 16] = ks[r + (R) - 1]; \
+	ts[r  +         (R) +  2] = ts[r + (R) - 1]; \
+	Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+
+		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)
 #endif
 		{
-#define R1024_8_rounds(R)    /* do 8 full rounds */                               \
-			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_0, 8*(R) + 1); \
-			R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_1, 8*(R) + 2); \
-			R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_2, 8*(R) + 3); \
-			R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_3, 8*(R) + 4); \
-			I1024(2*(R));                                                             \
-			R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, R1024_4, 8*(R) + 5); \
-			R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, R1024_5, 8*(R) + 6); \
-			R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, R1024_6, 8*(R) + 7); \
-			R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, R1024_7, 8*(R) + 8); \
-			I1024(2*(R)+1);
+#define R1024_8_rounds(R) \
+	R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, \
+		R1024_0, 8*(R) + 1); \
+	R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, \
+		R1024_1, 8*(R) + 2); \
+	R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, \
+		R1024_2, 8*(R) + 3); \
+	R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, \
+		R1024_3, 8*(R) + 4); \
+	I1024(2*(R)); \
+	R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, \
+		R1024_4, 8*(R) + 5); \
+	R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, \
+		R1024_5, 8*(R) + 6); \
+	R1024(00, 07, 02, 05, 04, 03, 06, 01, 12, 15, 14, 13, 08, 11, 10, 09, \
+		R1024_6, 8*(R) + 7); \
+	R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, \
+		R1024_7, 8*(R) + 8); \
+	I1024(2*(R)+1);
 
 			R1024_8_rounds(0);
 
-#define R1024_Unroll_R(NN) ((SKEIN_UNROLL_1024 == 0 && SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || (SKEIN_UNROLL_1024 > (NN)))
+#define R1024_Unroll_R(NN) \
+		((SKEIN_UNROLL_1024 == 0 && \
+		  SKEIN1024_ROUNDS_TOTAL/8 > (NN)) || \
+		 (SKEIN_UNROLL_1024 > (NN)))
 
 	#if   R1024_Unroll_R(1)
 			R1024_8_rounds(1);
@@ -643,7 +694,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, size_t
 #error  "need more unrolling in Skein_1024_Process_Block"
   #endif
 		}
-		/* do the final "feedforward" xor, update context chaining vars */
+		/* do the final "feedforward" xor, update context chaining */
 
 		ctx->X[0] = X00 ^ w[0];
 		ctx->X[1] = X01 ^ w[1];
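
Reader's note, not part of the patch: the Round1024() macro above and the
open-coded lines in threefish1024Block.c below appear to spell the same
add/rotate/xor step in two different ways. The sketch below is only an
illustration of that equivalence. The helper names (rotl64, mix_skein,
mix_threefish, mix_forms_agree) are hypothetical and do not exist in the
driver; RotL_64 is assumed to be a plain 64-bit left rotate.

```c
#include <stdint.h>

/* Rotate a 64-bit word left by n bits; only valid for 0 < n < 64. */
static uint64_t rotl64(uint64_t x, unsigned int n)
{
	return (x << n) | (x >> (64 - n));
}

/* The step as Round1024() writes it: add, rotate, then xor. */
static void mix_skein(uint64_t *x0, uint64_t *x1, unsigned int rot)
{
	*x0 += *x1;
	*x1 = rotl64(*x1, rot);
	*x1 ^= *x0;
}

/* The step as the unrolled threefish code writes it: rotate and xor at once. */
static void mix_threefish(uint64_t *b0, uint64_t *b1, unsigned int rot)
{
	*b0 += *b1;
	*b1 = ((*b1 << rot) | (*b1 >> (64 - rot))) ^ *b0;
}

/* Returns 1 when both spellings give the same pair of output words. */
static int mix_forms_agree(uint64_t w0, uint64_t w1, unsigned int rot)
{
	uint64_t a0 = w0, a1 = w1;
	uint64_t c0 = w0, c1 = w1;

	mix_skein(&a0, &a1, rot);
	mix_threefish(&c0, &c1, rot);
	return a0 == c0 && a1 == c1;
}
```

If that reading is right, the open-coded rotates could later be expressed
through a single rotate helper without changing results, which is the sort
of change the objdiff script from patch 1 is meant to check.
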
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 1730a3120a0f..fe7517b2008c 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -24,646 +24,2085 @@ void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
 	  t2 = keyCtx->tweak[2];
 
-	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k5; b4 += b5 + k4; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k7; b6 += b7 + k6; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k9; b8 += b9 + k8; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k11; b10 += b11 + k10; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k13 + t0; b12 += b13 + k12; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k15; b14 += b15 + k14 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k6; b4 += b5 + k5; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k8; b6 += b7 + k7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k10; b8 += b9 + k9; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k12; b10 += b11 + k11; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k14 + t1; b12 += b13 + k13; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k16 + 1; b14 += b15 + k15 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k7; b4 += b5 + k6; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k9; b6 += b7 + k8; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k11; b8 += b9 + k10; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k13; b10 += b11 + k12; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k15 + t2; b12 += b13 + k14; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k0 + 2; b14 += b15 + k16 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k8; b4 += b5 + k7; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k10; b6 += b7 + k9; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k12; b8 += b9 + k11; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k14; b10 += b11 + k13; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k16 + t0; b12 += b13 + k15; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k1 + 3; b14 += b15 + k0 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k9; b4 += b5 + k8; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k11; b6 += b7 + k10; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k13; b8 += b9 + k12; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k15; b10 += b11 + k14; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k0 + t1; b12 += b13 + k16; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k2 + 4; b14 += b15 + k1 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k10; b4 += b5 + k9; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k12; b6 += b7 + k11; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k14; b8 += b9 + k13; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k16; b10 += b11 + k15; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k1 + t2; b12 += b13 + k0; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k3 + 5; b14 += b15 + k2 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k9; b2 += b3 + k8; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k11; b4 += b5 + k10; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k13; b6 += b7 + k12; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k15; b8 += b9 + k14; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k0; b10 += b11 + k16; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k2 + t0; b12 += b13 + k1; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k4 + 6; b14 += b15 + k3 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k10; b2 += b3 + k9; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k12; b4 += b5 + k11; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k14; b6 += b7 + k13; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k16; b8 += b9 + k15; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k1; b10 += b11 + k0; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k3 + t1; b12 += b13 + k2; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k5 + 7; b14 += b15 + k4 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k9; b0 += b1 + k8; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k11; b2 += b3 + k10; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k13; b4 += b5 + k12; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k15; b6 += b7 + k14; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k0; b8 += b9 + k16; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k2; b10 += b11 + k1; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k4 + t2; b12 += b13 + k3; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k6 + 8; b14 += b15 + k5 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k10; b0 += b1 + k9; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k12; b2 += b3 + k11; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k14; b4 += b5 + k13; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k16; b6 += b7 + k15; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k1; b8 += b9 + k0; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k3; b10 += b11 + k2; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k5 + t0; b12 += b13 + k4; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k7 + 9; b14 += b15 + k6 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k11; b0 += b1 + k10; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k13; b2 += b3 + k12; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k15; b4 += b5 + k14; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k0; b6 += b7 + k16; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k2; b8 += b9 + k1; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k4; b10 += b11 + k3; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k6 + t1; b12 += b13 + k5; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k8 + 10; b14 += b15 + k7 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k12; b0 += b1 + k11; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k14; b2 += b3 + k13; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k16; b4 += b5 + k15; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k1; b6 += b7 + k0; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k3; b8 += b9 + k2; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k5; b10 += b11 + k4; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k7 + t2; b12 += b13 + k6; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k9 + 11; b14 += b15 + k8 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k13; b0 += b1 + k12; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k15; b2 += b3 + k14; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k0; b4 += b5 + k16; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k2; b6 += b7 + k1; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k4; b8 += b9 + k3; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k6; b10 += b11 + k5; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k8 + t0; b12 += b13 + k7; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k10 + 12; b14 += b15 + k9 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k14; b0 += b1 + k13; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k16; b2 += b3 + k15; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k1; b4 += b5 + k0; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k3; b6 += b7 + k2; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k5; b8 += b9 + k4; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k7; b10 += b11 + k6; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k9 + t1; b12 += b13 + k8; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k11 + 13; b14 += b15 + k10 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k15; b0 += b1 + k14; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k0; b2 += b3 + k16; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k2; b4 += b5 + k1; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k4; b6 += b7 + k3; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k6; b8 += b9 + k5; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k8; b10 += b11 + k7; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k10 + t2; b12 += b13 + k9; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k12 + 14; b14 += b15 + k11 + t0; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k16; b0 += b1 + k15; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k3; b4 += b5 + k2; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k5; b6 += b7 + k4; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k7; b8 += b9 + k6; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k9; b10 += b11 + k8; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k11 + t0; b12 += b13 + k10; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k13 + 15; b14 += b15 + k12 + t1; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k0; b0 += b1 + k16; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k4; b4 += b5 + k3; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k6; b6 += b7 + k5; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k8; b8 += b9 + k7; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k10; b10 += b11 + k9; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k12 + t1; b12 += b13 + k11; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k14 + 16; b14 += b15 + k13 + t2; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k5; b4 += b5 + k4; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k7; b6 += b7 + k6; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k9; b8 += b9 + k8; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k11; b10 += b11 + k10; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k13 + t2; b12 += b13 + k12; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k15 + 17; b14 += b15 + k14 + t0; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-	b5 += k6; b4 += b5 + k5; b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-	b7 += k8; b6 += b7 + k7; b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-	b9 += k10; b8 += b9 + k9; b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-	b11 += k12; b10 += b11 + k11; b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-	b13 += k14 + t0; b12 += b13 + k13; b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-	b15 += k16 + 18; b14 += b15 + k15 + t1; b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-	b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-	b2 += b13; b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-	b6 += b11; b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-	b4 += b15; b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-	b10 += b7; b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-	b12 += b3; b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-	b14 += b5; b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-	b8 += b1; b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-	b0 += b7; b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-	b2 += b5; b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-	b4 += b3; b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-	b6 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-	b12 += b15; b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-	b14 += b13; b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-	b8 += b11; b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-	b10 += b9; b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-	b0 += b15; b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-	b2 += b11; b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-	b6 += b13; b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-	b4 += b9; b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-	b14 += b1; b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-	b8 += b5; b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-	b10 += b3; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-	b12 += b7; b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-	b5 += k7; b4 += b5 + k6; b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-	b7 += k9; b6 += b7 + k8; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-	b9 += k11; b8 += b9 + k10; b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-	b11 += k13; b10 += b11 + k12; b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-	b13 += k15 + t1; b12 += b13 + k14; b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-	b15 += k0 + 19; b14 += b15 + k16 + t2; b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-	b0 += b9; b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-	b2 += b13; b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-	b6 += b11; b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-	b4 += b15; b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-	b10 += b7; b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-	b12 += b3; b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-	b14 += b5; b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-	b8 += b1; b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-	b0 += b7; b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-	b2 += b5; b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-	b4 += b3; b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-	b6 += b1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-	b12 += b15; b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-	b14 += b13; b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-	b8 += b11; b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-	b10 += b9; b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-	b0 += b15; b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-	b2 += b11; b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-	b6 += b13; b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-	b4 += b9; b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-	b14 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-	b8 += b5; b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-	b10 += b3; b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-	b12 += b7; b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k5;
+	b4 += b5 + k4;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k9;
+	b8 += b9 + k8;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k11;
+	b10 += b11 + k10;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k13 + t0;
+	b12 += b13 + k12;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k15;
+	b14 += b15 + k14 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k6;
+	b4 += b5 + k5;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k8;
+	b6 += b7 + k7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k10;
+	b8 += b9 + k9;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k12;
+	b10 += b11 + k11;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k14 + t1;
+	b12 += b13 + k13;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k16 + 1;
+	b14 += b15 + k15 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k7;
+	b4 += b5 + k6;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k9;
+	b6 += b7 + k8;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k11;
+	b8 += b9 + k10;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k13;
+	b10 += b11 + k12;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k15 + t2;
+	b12 += b13 + k14;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k0 + 2;
+	b14 += b15 + k16 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k8;
+	b4 += b5 + k7;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k10;
+	b6 += b7 + k9;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k12;
+	b8 += b9 + k11;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k14;
+	b10 += b11 + k13;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k16 + t0;
+	b12 += b13 + k15;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k1 + 3;
+	b14 += b15 + k0 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k9;
+	b4 += b5 + k8;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k11;
+	b6 += b7 + k10;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k13;
+	b8 += b9 + k12;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k15;
+	b10 += b11 + k14;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k0 + t1;
+	b12 += b13 + k16;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k2 + 4;
+	b14 += b15 + k1 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k10;
+	b4 += b5 + k9;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k12;
+	b6 += b7 + k11;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k14;
+	b8 += b9 + k13;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k16;
+	b10 += b11 + k15;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k1 + t2;
+	b12 += b13 + k0;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k3 + 5;
+	b14 += b15 + k2 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k9;
+	b2 += b3 + k8;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k11;
+	b4 += b5 + k10;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k13;
+	b6 += b7 + k12;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k15;
+	b8 += b9 + k14;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k0;
+	b10 += b11 + k16;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k2 + t0;
+	b12 += b13 + k1;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k4 + 6;
+	b14 += b15 + k3 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k10;
+	b2 += b3 + k9;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k12;
+	b4 += b5 + k11;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k14;
+	b6 += b7 + k13;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k16;
+	b8 += b9 + k15;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k1;
+	b10 += b11 + k0;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k3 + t1;
+	b12 += b13 + k2;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k5 + 7;
+	b14 += b15 + k4 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k9;
+	b0 += b1 + k8;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k11;
+	b2 += b3 + k10;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k13;
+	b4 += b5 + k12;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k15;
+	b6 += b7 + k14;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k0;
+	b8 += b9 + k16;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k2;
+	b10 += b11 + k1;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k4 + t2;
+	b12 += b13 + k3;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k6 + 8;
+	b14 += b15 + k5 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k10;
+	b0 += b1 + k9;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k12;
+	b2 += b3 + k11;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k14;
+	b4 += b5 + k13;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k16;
+	b6 += b7 + k15;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k1;
+	b8 += b9 + k0;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k3;
+	b10 += b11 + k2;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k5 + t0;
+	b12 += b13 + k4;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k7 + 9;
+	b14 += b15 + k6 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k11;
+	b0 += b1 + k10;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k13;
+	b2 += b3 + k12;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k15;
+	b4 += b5 + k14;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k0;
+	b6 += b7 + k16;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k2;
+	b8 += b9 + k1;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k4;
+	b10 += b11 + k3;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k6 + t1;
+	b12 += b13 + k5;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k8 + 10;
+	b14 += b15 + k7 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k12;
+	b0 += b1 + k11;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k14;
+	b2 += b3 + k13;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k16;
+	b4 += b5 + k15;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k1;
+	b6 += b7 + k0;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k3;
+	b8 += b9 + k2;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k5;
+	b10 += b11 + k4;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k7 + t2;
+	b12 += b13 + k6;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k9 + 11;
+	b14 += b15 + k8 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k13;
+	b0 += b1 + k12;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k15;
+	b2 += b3 + k14;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k0;
+	b4 += b5 + k16;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k2;
+	b6 += b7 + k1;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k4;
+	b8 += b9 + k3;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k6;
+	b10 += b11 + k5;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k8 + t0;
+	b12 += b13 + k7;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k10 + 12;
+	b14 += b15 + k9 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k14;
+	b0 += b1 + k13;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k16;
+	b2 += b3 + k15;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k1;
+	b4 += b5 + k0;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k3;
+	b6 += b7 + k2;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k5;
+	b8 += b9 + k4;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k7;
+	b10 += b11 + k6;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k9 + t1;
+	b12 += b13 + k8;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k11 + 13;
+	b14 += b15 + k10 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k15;
+	b0 += b1 + k14;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k16;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k2;
+	b4 += b5 + k1;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k4;
+	b6 += b7 + k3;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k6;
+	b8 += b9 + k5;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k8;
+	b10 += b11 + k7;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k10 + t2;
+	b12 += b13 + k9;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k12 + 14;
+	b14 += b15 + k11 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k16;
+	b0 += b1 + k15;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k3;
+	b4 += b5 + k2;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k5;
+	b6 += b7 + k4;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k7;
+	b8 += b9 + k6;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k9;
+	b10 += b11 + k8;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k11 + t0;
+	b12 += b13 + k10;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k13 + 15;
+	b14 += b15 + k12 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k0;
+	b0 += b1 + k16;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k4;
+	b4 += b5 + k3;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k6;
+	b6 += b7 + k5;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k8;
+	b8 += b9 + k7;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k10;
+	b10 += b11 + k9;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k12 + t1;
+	b12 += b13 + k11;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k14 + 16;
+	b14 += b15 + k13 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k5;
+	b4 += b5 + k4;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k9;
+	b8 += b9 + k8;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k11;
+	b10 += b11 + k10;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k13 + t2;
+	b12 += b13 + k12;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k15 + 17;
+	b14 += b15 + k14 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k6;
+	b4 += b5 + k5;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k8;
+	b6 += b7 + k7;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k10;
+	b8 += b9 + k9;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k12;
+	b10 += b11 + k11;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k14 + t0;
+	b12 += b13 + k13;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k16 + 18;
+	b14 += b15 + k15 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k7;
+	b4 += b5 + k6;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k9;
+	b6 += b7 + k8;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k11;
+	b8 += b9 + k10;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k13;
+	b10 += b11 + k12;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k15 + t1;
+	b12 += b13 + k14;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k0 + 19;
+	b14 += b15 + k16 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
 
 	output[0] = b0 + k3;
 	output[1] = b1 + k4;
@@ -683,685 +2122,2764 @@ void threefishEncrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	output[15] = b15 + k1 + 20;
 }
 
-void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	  b2 = input[2], b3 = input[3],
-	  b4 = input[4], b5 = input[5],
-	  b6 = input[6], b7 = input[7],
-	  b8 = input[8], b9 = input[9],
-	  b10 = input[10], b11 = input[11],
-	  b12 = input[12], b13 = input[13],
-	  b14 = input[14], b15 = input[15];
-	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-	  k8 = keyCtx->key[8], k9 = keyCtx->key[9],
-	  k10 = keyCtx->key[10], k11 = keyCtx->key[11],
-	  k12 = keyCtx->key[12], k13 = keyCtx->key[13],
-	  k14 = keyCtx->key[14], k15 = keyCtx->key[15],
-	  k16 = keyCtx->key[16];
-	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-	  t2 = keyCtx->tweak[2];
-	u64 tmp;
+void threefishDecrypt1024(struct threefish_key *keyCtx, u64 *input, u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7],
+	  b8 = input[8], b9 = input[9],
+	  b10 = input[10], b11 = input[11],
+	  b12 = input[12], b13 = input[13],
+	  b14 = input[14], b15 = input[15];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8], k9 = keyCtx->key[9],
+	  k10 = keyCtx->key[10], k11 = keyCtx->key[11],
+	  k12 = keyCtx->key[12], k13 = keyCtx->key[13],
+	  k14 = keyCtx->key[14], k15 = keyCtx->key[15],
+	  k16 = keyCtx->key[16];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
+	u64 tmp;
+
+	b0 -= k3;
+	b1 -= k4;
+	b2 -= k5;
+	b3 -= k6;
+	b4 -= k7;
+	b5 -= k8;
+	b6 -= k9;
+	b7 -= k10;
+	b8 -= k11;
+	b9 -= k12;
+	b10 -= k13;
+	b11 -= k14;
+	b12 -= k15;
+	b13 -= k16 + t2;
+	b14 -= k0 + t0;
+	b15 -= k1 + 20;
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k16 + t2;
+	b15 -= k0 + 19;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k14;
+	b13 -= k15 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k12;
+	b11 -= k13;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k10;
+	b9 -= k11;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k8;
+	b7 -= k9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k6;
+	b5 -= k7;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k15 + t1;
+	b15 -= k16 + 18;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k13;
+	b13 -= k14 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k11;
+	b11 -= k12;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k9;
+	b9 -= k10;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k7;
+	b7 -= k8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k5;
+	b5 -= k6;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k14 + t0;
+	b15 -= k15 + 17;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k12;
+	b13 -= k13 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k10;
+	b11 -= k11;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k8;
+	b9 -= k9;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k6;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k4;
+	b5 -= k5;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k0;
+	b1 -= k1;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k13 + t2;
+	b15 -= k14 + 16;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k11;
+	b13 -= k12 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k9;
+	b11 -= k10;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k7;
+	b9 -= k8;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k5;
+	b7 -= k6;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k3;
+	b5 -= k4;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k16;
+	b1 -= k0;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k12 + t1;
+	b15 -= k13 + 15;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k10;
+	b13 -= k11 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k8;
+	b11 -= k9;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k6;
+	b9 -= k7;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k4;
+	b7 -= k5;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k2;
+	b5 -= k3;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k15;
+	b1 -= k16;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k11 + t0;
+	b15 -= k12 + 14;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k9;
+	b13 -= k10 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k7;
+	b11 -= k8;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k5;
+	b9 -= k6;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k3;
+	b7 -= k4;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k1;
+	b5 -= k2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k16;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k14;
+	b1 -= k15;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k10 + t2;
+	b15 -= k11 + 13;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k8;
+	b13 -= k9 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k6;
+	b11 -= k7;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k4;
+	b9 -= k5;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k2;
+	b7 -= k3;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k0;
+	b5 -= k1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k15;
+	b3 -= k16;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k13;
+	b1 -= k14;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k9 + t1;
+	b15 -= k10 + 12;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k7;
+	b13 -= k8 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k5;
+	b11 -= k6;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k3;
+	b9 -= k4;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k1;
+	b7 -= k2;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k16;
+	b5 -= k0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k14;
+	b3 -= k15;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k12;
+	b1 -= k13;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k8 + t0;
+	b15 -= k9 + 11;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k6;
+	b13 -= k7 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k4;
+	b11 -= k5;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k2;
+	b9 -= k3;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k0;
+	b7 -= k1;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k15;
+	b5 -= k16;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k13;
+	b3 -= k14;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k11;
+	b1 -= k12;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k7 + t2;
+	b15 -= k8 + 10;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k5;
+	b13 -= k6 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k3;
+	b11 -= k4;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k1;
+	b9 -= k2;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k16;
+	b7 -= k0;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k14;
+	b5 -= k15;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k12;
+	b3 -= k13;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k10;
+	b1 -= k11;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k6 + t1;
+	b15 -= k7 + 9;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k4;
+	b13 -= k5 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k2;
+	b11 -= k3;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k0;
+	b9 -= k1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k15;
+	b7 -= k16;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k13;
+	b5 -= k14;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k11;
+	b3 -= k12;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k9;
+	b1 -= k10;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k5 + t0;
+	b15 -= k6 + 8;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k3;
+	b13 -= k4 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k1;
+	b11 -= k2;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k16;
+	b9 -= k0;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k14;
+	b7 -= k15;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k12;
+	b5 -= k13;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k10;
+	b3 -= k11;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k8;
+	b1 -= k9;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k4 + t2;
+	b15 -= k5 + 7;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k2;
+	b13 -= k3 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k0;
+	b11 -= k1;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k15;
+	b9 -= k16;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k13;
+	b7 -= k14;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k11;
+	b5 -= k12;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k9;
+	b3 -= k10;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
 
-	b0 -= k3;
-	b1 -= k4;
-	b2 -= k5;
-	b3 -= k6;
-	b4 -= k7;
-	b5 -= k8;
-	b6 -= k9;
-	b7 -= k10;
-	b8 -= k11;
-	b9 -= k12;
-	b10 -= k13;
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k3 + t1;
+	b15 -= k4 + 6;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k1;
+	b13 -= k2 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k16;
+	b11 -= k0;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k14;
+	b9 -= k15;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k12;
+	b7 -= k13;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k10;
+	b5 -= k11;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k8;
+	b3 -= k9;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k2 + t0;
+	b15 -= k3 + 5;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k0;
+	b13 -= k1 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k15;
+	b11 -= k16;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k13;
+	b9 -= k14;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k11;
+	b7 -= k12;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k9;
+	b5 -= k10;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k1 + t2;
+	b15 -= k2 + 4;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k16;
+	b13 -= k0 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k14;
+	b11 -= k15;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k12;
+	b9 -= k13;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k10;
+	b7 -= k11;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k8;
+	b5 -= k9;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k0 + t1;
+	b15 -= k1 + 3;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k15;
+	b13 -= k16 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k13;
 	b11 -= k14;
-	b12 -= k15;
-	b13 -= k16 + t2;
-	b14 -= k0 + t0;
-	b15 -= k1 + 20;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k16 + t2; b15 -= k0 + 19;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k14; b13 -= k15 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k12; b11 -= k13;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k10; b9 -= k11;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k8; b7 -= k9;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k6; b5 -= k7;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k4; b3 -= k5;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k2; b1 -= k3;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k15 + t1; b15 -= k16 + 18;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k13; b13 -= k14 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k11; b11 -= k12;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k9; b9 -= k10;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k7; b7 -= k8;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k5; b5 -= k6;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k3; b3 -= k4;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k1; b1 -= k2;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k14 + t0; b15 -= k15 + 17;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k12; b13 -= k13 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k10; b11 -= k11;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k8; b9 -= k9;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k6; b7 -= k7;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k4; b5 -= k5;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k2; b3 -= k3;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k0; b1 -= k1;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k13 + t2; b15 -= k14 + 16;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k11; b13 -= k12 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k9; b11 -= k10;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k7; b9 -= k8;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k5; b7 -= k6;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k3; b5 -= k4;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k1; b3 -= k2;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k16; b1 -= k0;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k12 + t1; b15 -= k13 + 15;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k10; b13 -= k11 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k8; b11 -= k9;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k6; b9 -= k7;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k4; b7 -= k5;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k2; b5 -= k3;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k0; b3 -= k1;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k15; b1 -= k16;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k11 + t0; b15 -= k12 + 14;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k9; b13 -= k10 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k7; b11 -= k8;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k5; b9 -= k6;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k3; b7 -= k4;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k1; b5 -= k2;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k16; b3 -= k0;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k14; b1 -= k15;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k10 + t2; b15 -= k11 + 13;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k8; b13 -= k9 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k6; b11 -= k7;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k4; b9 -= k5;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k2; b7 -= k3;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k0; b5 -= k1;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k15; b3 -= k16;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k13; b1 -= k14;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k9 + t1; b15 -= k10 + 12;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k7; b13 -= k8 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k5; b11 -= k6;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k3; b9 -= k4;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k1; b7 -= k2;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k16; b5 -= k0;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k14; b3 -= k15;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k12; b1 -= k13;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k8 + t0; b15 -= k9 + 11;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k6; b13 -= k7 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k4; b11 -= k5;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k2; b9 -= k3;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k0; b7 -= k1;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k15; b5 -= k16;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k13; b3 -= k14;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k11; b1 -= k12;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k7 + t2; b15 -= k8 + 10;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k5; b13 -= k6 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k3; b11 -= k4;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k1; b9 -= k2;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k16; b7 -= k0;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k14; b5 -= k15;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k12; b3 -= k13;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k10; b1 -= k11;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k6 + t1; b15 -= k7 + 9;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k4; b13 -= k5 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k2; b11 -= k3;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k0; b9 -= k1;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k15; b7 -= k16;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k13; b5 -= k14;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k11; b3 -= k12;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k9; b1 -= k10;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k5 + t0; b15 -= k6 + 8;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k3; b13 -= k4 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k1; b11 -= k2;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k16; b9 -= k0;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k14; b7 -= k15;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k12; b5 -= k13;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k10; b3 -= k11;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k8; b1 -= k9;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k4 + t2; b15 -= k5 + 7;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k2; b13 -= k3 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k0; b11 -= k1;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k15; b9 -= k16;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k13; b7 -= k14;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k11; b5 -= k12;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k9; b3 -= k10;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k7; b1 -= k8;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k3 + t1; b15 -= k4 + 6;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k1; b13 -= k2 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k16; b11 -= k0;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k14; b9 -= k15;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k12; b7 -= k13;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k10; b5 -= k11;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k8; b3 -= k9;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k6; b1 -= k7;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k2 + t0; b15 -= k3 + 5;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k0; b13 -= k1 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k15; b11 -= k16;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k13; b9 -= k14;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k11; b7 -= k12;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k9; b5 -= k10;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k7; b3 -= k8;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k5; b1 -= k6;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k1 + t2; b15 -= k2 + 4;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k16; b13 -= k0 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k14; b11 -= k15;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k12; b9 -= k13;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k10; b7 -= k11;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k8; b5 -= k9;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k6; b3 -= k7;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k4; b1 -= k5;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k0 + t1; b15 -= k1 + 3;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k15; b13 -= k16 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k13; b11 -= k14;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k11; b9 -= k12;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k9; b7 -= k10;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k7; b5 -= k8;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k5; b3 -= k6;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k3; b1 -= k4;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k16 + t0; b15 -= k0 + 2;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k14; b13 -= k15 + t2;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k12; b11 -= k13;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k10; b9 -= k11;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k8; b7 -= k9;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k6; b5 -= k7;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k4; b3 -= k5;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k2; b1 -= k3;
-	tmp = b7 ^ b12; b7 = (tmp >> 20) | (tmp << (64 - 20)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 37) | (tmp << (64 - 37)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 31) | (tmp << (64 - 31)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 52) | (tmp << (64 - 52)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 35) | (tmp << (64 - 35)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 48) | (tmp << (64 - 48)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 25) | (tmp << (64 - 25)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 44) | (tmp << (64 - 44)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 19) | (tmp << (64 - 19)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 46) | (tmp << (64 - 46)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 47) | (tmp << (64 - 47)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 44) | (tmp << (64 - 44)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 31) | (tmp << (64 - 31)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 41) | (tmp << (64 - 41)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 42) | (tmp << (64 - 42)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 53) | (tmp << (64 - 53)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 4) | (tmp << (64 - 4)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 56) | (tmp << (64 - 56)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 34) | (tmp << (64 - 34)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 16) | (tmp << (64 - 16)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 30) | (tmp << (64 - 30)); b14 -= b15 + k15 + t2; b15 -= k16 + 1;
-	tmp = b13 ^ b12; b13 = (tmp >> 44) | (tmp << (64 - 44)); b12 -= b13 + k13; b13 -= k14 + t1;
-	tmp = b11 ^ b10; b11 = (tmp >> 47) | (tmp << (64 - 47)); b10 -= b11 + k11; b11 -= k12;
-	tmp = b9 ^ b8; b9 = (tmp >> 12) | (tmp << (64 - 12)); b8 -= b9 + k9; b9 -= k10;
-	tmp = b7 ^ b6; b7 = (tmp >> 31) | (tmp << (64 - 31)); b6 -= b7 + k7; b7 -= k8;
-	tmp = b5 ^ b4; b5 = (tmp >> 37) | (tmp << (64 - 37)); b4 -= b5 + k5; b5 -= k6;
-	tmp = b3 ^ b2; b3 = (tmp >> 9) | (tmp << (64 - 9)); b2 -= b3 + k3; b3 -= k4;
-	tmp = b1 ^ b0; b1 = (tmp >> 41) | (tmp << (64 - 41)); b0 -= b1 + k1; b1 -= k2;
-	tmp = b7 ^ b12; b7 = (tmp >> 25) | (tmp << (64 - 25)); b12 -= b7;
-	tmp = b3 ^ b10; b3 = (tmp >> 16) | (tmp << (64 - 16)); b10 -= b3;
-	tmp = b5 ^ b8; b5 = (tmp >> 28) | (tmp << (64 - 28)); b8 -= b5;
-	tmp = b1 ^ b14; b1 = (tmp >> 47) | (tmp << (64 - 47)); b14 -= b1;
-	tmp = b9 ^ b4; b9 = (tmp >> 41) | (tmp << (64 - 41)); b4 -= b9;
-	tmp = b13 ^ b6; b13 = (tmp >> 48) | (tmp << (64 - 48)); b6 -= b13;
-	tmp = b11 ^ b2; b11 = (tmp >> 20) | (tmp << (64 - 20)); b2 -= b11;
-	tmp = b15 ^ b0; b15 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b15;
-	tmp = b9 ^ b10; b9 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b9;
-	tmp = b11 ^ b8; b11 = (tmp >> 59) | (tmp << (64 - 59)); b8 -= b11;
-	tmp = b13 ^ b14; b13 = (tmp >> 41) | (tmp << (64 - 41)); b14 -= b13;
-	tmp = b15 ^ b12; b15 = (tmp >> 34) | (tmp << (64 - 34)); b12 -= b15;
-	tmp = b1 ^ b6; b1 = (tmp >> 13) | (tmp << (64 - 13)); b6 -= b1;
-	tmp = b3 ^ b4; b3 = (tmp >> 51) | (tmp << (64 - 51)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 4) | (tmp << (64 - 4)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 33) | (tmp << (64 - 33)); b0 -= b7;
-	tmp = b1 ^ b8; b1 = (tmp >> 52) | (tmp << (64 - 52)); b8 -= b1;
-	tmp = b5 ^ b14; b5 = (tmp >> 23) | (tmp << (64 - 23)); b14 -= b5;
-	tmp = b3 ^ b12; b3 = (tmp >> 18) | (tmp << (64 - 18)); b12 -= b3;
-	tmp = b7 ^ b10; b7 = (tmp >> 49) | (tmp << (64 - 49)); b10 -= b7;
-	tmp = b15 ^ b4; b15 = (tmp >> 55) | (tmp << (64 - 55)); b4 -= b15;
-	tmp = b11 ^ b6; b11 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b11;
-	tmp = b13 ^ b2; b13 = (tmp >> 19) | (tmp << (64 - 19)); b2 -= b13;
-	tmp = b9 ^ b0; b9 = (tmp >> 38) | (tmp << (64 - 38)); b0 -= b9;
-	tmp = b15 ^ b14; b15 = (tmp >> 37) | (tmp << (64 - 37)); b14 -= b15 + k14 + t1; b15 -= k15;
-	tmp = b13 ^ b12; b13 = (tmp >> 22) | (tmp << (64 - 22)); b12 -= b13 + k12; b13 -= k13 + t0;
-	tmp = b11 ^ b10; b11 = (tmp >> 17) | (tmp << (64 - 17)); b10 -= b11 + k10; b11 -= k11;
-	tmp = b9 ^ b8; b9 = (tmp >> 8) | (tmp << (64 - 8)); b8 -= b9 + k8; b9 -= k9;
-	tmp = b7 ^ b6; b7 = (tmp >> 47) | (tmp << (64 - 47)); b6 -= b7 + k6; b7 -= k7;
-	tmp = b5 ^ b4; b5 = (tmp >> 8) | (tmp << (64 - 8)); b4 -= b5 + k4; b5 -= k5;
-	tmp = b3 ^ b2; b3 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b3 + k2; b3 -= k3;
-	tmp = b1 ^ b0; b1 = (tmp >> 24) | (tmp << (64 - 24)); b0 -= b1 + k0; b1 -= k1;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k11;
+	b9 -= k12;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k9;
+	b7 -= k10;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k7;
+	b5 -= k8;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k16 + t0;
+	b15 -= k0 + 2;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k14;
+	b13 -= k15 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k12;
+	b11 -= k13;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k10;
+	b9 -= k11;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k8;
+	b7 -= k9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k6;
+	b5 -= k7;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k15 + t2;
+	b15 -= k16 + 1;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k13;
+	b13 -= k14 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k11;
+	b11 -= k12;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k9;
+	b9 -= k10;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k7;
+	b7 -= k8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k5;
+	b5 -= k6;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k14 + t1;
+	b15 -= k15;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k12;
+	b13 -= k13 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k10;
+	b11 -= k11;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k8;
+	b9 -= k9;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k6;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k4;
+	b5 -= k5;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k0;
+	b1 -= k1;
 
 	output[15] = b15;
 	output[14] = b14;
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index da3b8357e47f..2ae746a641ae 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -12,158 +12,481 @@ void threefishEncrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
 	  t2 = keyCtx->tweak[2];
 
-	b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k3; b2 += b3 + k2 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k4 + 1; b2 += b3 + k3 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k0 + 2; b2 += b3 + k4 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k4 + t0; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k1 + 3; b2 += b3 + k0 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k0 + t1; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k2 + 4; b2 += b3 + k1 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k1 + t2; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k3 + 5; b2 += b3 + k2 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k2 + t0; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k4 + 6; b2 += b3 + k3 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k3 + t1; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k0 + 7; b2 += b3 + k4 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k4 + t2; b0 += b1 + k3; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k1 + 8; b2 += b3 + k0 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k0 + t0; b0 += b1 + k4; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k2 + 9; b2 += b3 + k1 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k1 + t1; b0 += b1 + k0; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k3 + 10; b2 += b3 + k2 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k2 + t2; b0 += b1 + k1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k4 + 11; b2 += b3 + k3 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k3 + t0; b0 += b1 + k2; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k0 + 12; b2 += b3 + k4 + t1; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k4 + t1; b0 += b1 + k3; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k1 + 13; b2 += b3 + k0 + t2; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k0 + t2; b0 += b1 + k4; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k2 + 14; b2 += b3 + k1 + t0; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k1 + t0; b0 += b1 + k0; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k3 + 15; b2 += b3 + k2 + t1; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	b1 += k2 + t1; b0 += b1 + k1; b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-	b3 += k4 + 16; b2 += b3 + k3 + t2; b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-	b0 += b3; b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-	b2 += b1; b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-	b0 += b1; b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-	b2 += b3; b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-	b0 += b3; b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-	b2 += b1; b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-	b1 += k3 + t2; b0 += b1 + k2; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-	b3 += k0 + 17; b2 += b3 + k4 + t0; b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-	b0 += b3; b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-	b2 += b1; b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-	b0 += b1; b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-	b2 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-	b0 += b3; b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-	b2 += b1; b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+	b1 += k1 + t0;
+	b0 += b1 + k0;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k2 + t1;
+	b0 += b1 + k1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k4 + 1;
+	b2 += b3 + k3 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k3 + t2;
+	b0 += b1 + k2;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k0 + 2;
+	b2 += b3 + k4 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k4 + t0;
+	b0 += b1 + k3;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k1 + 3;
+	b2 += b3 + k0 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k0 + t1;
+	b0 += b1 + k4;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k2 + 4;
+	b2 += b3 + k1 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k1 + t2;
+	b0 += b1 + k0;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k3 + 5;
+	b2 += b3 + k2 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k2 + t0;
+	b0 += b1 + k1;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k4 + 6;
+	b2 += b3 + k3 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k3 + t1;
+	b0 += b1 + k2;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k0 + 7;
+	b2 += b3 + k4 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k4 + t2;
+	b0 += b1 + k3;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k1 + 8;
+	b2 += b3 + k0 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k0 + t0;
+	b0 += b1 + k4;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k2 + 9;
+	b2 += b3 + k1 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k1 + t1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k3 + 10;
+	b2 += b3 + k2 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k2 + t2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k4 + 11;
+	b2 += b3 + k3 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k3 + t0;
+	b0 += b1 + k2;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k0 + 12;
+	b2 += b3 + k4 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k4 + t1;
+	b0 += b1 + k3;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k1 + 13;
+	b2 += b3 + k0 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k0 + t2;
+	b0 += b1 + k4;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k2 + 14;
+	b2 += b3 + k1 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k1 + t0;
+	b0 += b1 + k0;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k3 + 15;
+	b2 += b3 + k2 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k2 + t1;
+	b0 += b1 + k1;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k4 + 16;
+	b2 += b3 + k3 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k3 + t2;
+	b0 += b1 + k2;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k0 + 17;
+	b2 += b3 + k4 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
 
 	output[0] = b0 + k3;
 	output[1] = b1 + k4 + t0;
@@ -187,158 +510,625 @@ void threefishDecrypt256(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	b1 -= k4 + t0;
 	b2 -= k0 + t1;
 	b3 -= k1 + 18;
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t0; b3 -= k0 + 17;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t2; b3 -= k4 + 16;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t1; b3 -= k3 + 15;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t0; b3 -= k2 + 14;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t2; b3 -= k1 + 13;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t1; b3 -= k0 + 12;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t0; b3 -= k4 + 11;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t2; b3 -= k3 + 10;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k4; b1 -= k0 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k1 + t1; b3 -= k2 + 9;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k3; b1 -= k4 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k0 + t0; b3 -= k1 + 8;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k2; b1 -= k3 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k4 + t2; b3 -= k0 + 7;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k1; b1 -= k2 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k3 + t1; b3 -= k4 + 6;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k0; b1 -= k1 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k2 + t0; b3 -= k3 + 5;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k4; b1 -= k0 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k1 + t2; b3 -= k2 + 4;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k3; b1 -= k4 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k0 + t1; b3 -= k1 + 3;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k2; b1 -= k3 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k4 + t0; b3 -= k0 + 2;
-
-	tmp = b3 ^ b0; b3 = (tmp >> 32) | (tmp << (64 - 32)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 32) | (tmp << (64 - 32)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 58) | (tmp << (64 - 58)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 22) | (tmp << (64 - 22)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 12) | (tmp << (64 - 12)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 25) | (tmp << (64 - 25)); b0 -= b1 + k1; b1 -= k2 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b3 + k3 + t2; b3 -= k4 + 1;
-	tmp = b3 ^ b0; b3 = (tmp >> 5) | (tmp << (64 - 5)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 37) | (tmp << (64 - 37)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 23) | (tmp << (64 - 23)); b0 -= b1;
-	tmp = b3 ^ b2; b3 = (tmp >> 40) | (tmp << (64 - 40)); b2 -= b3;
-	tmp = b3 ^ b0; b3 = (tmp >> 52) | (tmp << (64 - 52)); b0 -= b3;
-	tmp = b1 ^ b2; b1 = (tmp >> 57) | (tmp << (64 - 57)); b2 -= b1;
-	tmp = b1 ^ b0; b1 = (tmp >> 14) | (tmp << (64 - 14)); b0 -= b1 + k0; b1 -= k1 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 16) | (tmp << (64 - 16)); b2 -= b3 + k2 + t1; b3 -= k3;
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k2;
+	b1 -= k3 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k4 + t0;
+	b3 -= k0 + 17;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k1;
+	b1 -= k2 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k3 + t2;
+	b3 -= k4 + 16;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k0;
+	b1 -= k1 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k2 + t1;
+	b3 -= k3 + 15;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k4;
+	b1 -= k0 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k1 + t0;
+	b3 -= k2 + 14;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k3;
+	b1 -= k4 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k0 + t2;
+	b3 -= k1 + 13;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k2;
+	b1 -= k3 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k4 + t1;
+	b3 -= k0 + 12;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k1;
+	b1 -= k2 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k3 + t0;
+	b3 -= k4 + 11;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k0;
+	b1 -= k1 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k2 + t2;
+	b3 -= k3 + 10;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k4;
+	b1 -= k0 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k1 + t1;
+	b3 -= k2 + 9;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k3;
+	b1 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k0 + t0;
+	b3 -= k1 + 8;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k2;
+	b1 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k4 + t2;
+	b3 -= k0 + 7;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k1;
+	b1 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k3 + t1;
+	b3 -= k4 + 6;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k0;
+	b1 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k2 + t0;
+	b3 -= k3 + 5;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k4;
+	b1 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k1 + t2;
+	b3 -= k2 + 4;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k3;
+	b1 -= k4 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k0 + t1;
+	b3 -= k1 + 3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k2;
+	b1 -= k3 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k4 + t0;
+	b3 -= k0 + 2;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k1;
+	b1 -= k2 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k3 + t2;
+	b3 -= k4 + 1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k0;
+	b1 -= k1 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k2 + t1;
+	b3 -= k3;
 
 	output[0] = b0;
 	output[1] = b1;
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index dc96ba279720..f428fd6e1719 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -16,294 +16,941 @@ void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
 	  t2 = keyCtx->tweak[2];
 
-	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k7; b6 += b7 + k6 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k8 + 1; b6 += b7 + k7 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k0 + 2; b6 += b7 + k8 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k1 + 3; b6 += b7 + k0 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k2 + 4; b6 += b7 + k1 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k3 + 5; b6 += b7 + k2 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k0; b2 += b3 + k8; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k4 + 6; b6 += b7 + k3 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k5 + 7; b6 += b7 + k4 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k0; b0 += b1 + k8; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k6 + 8; b6 += b7 + k5 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k1; b0 += b1 + k0; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k3; b2 += b3 + k2; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k5 + t0; b4 += b5 + k4; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k7 + 9; b6 += b7 + k6 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k2; b0 += b1 + k1; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k4; b2 += b3 + k3; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k6 + t1; b4 += b5 + k5; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k8 + 10; b6 += b7 + k7 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k3; b0 += b1 + k2; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k5; b2 += b3 + k4; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k7 + t2; b4 += b5 + k6; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k0 + 11; b6 += b7 + k8 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k4; b0 += b1 + k3; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k6; b2 += b3 + k5; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k8 + t0; b4 += b5 + k7; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k1 + 12; b6 += b7 + k0 + t1; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k5; b0 += b1 + k4; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k7; b2 += b3 + k6; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k0 + t1; b4 += b5 + k8; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k2 + 13; b6 += b7 + k1 + t2; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k6; b0 += b1 + k5; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k8; b2 += b3 + k7; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k1 + t2; b4 += b5 + k0; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k3 + 14; b6 += b7 + k2 + t0; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k7; b0 += b1 + k6; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k0; b2 += b3 + k8; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k2 + t0; b4 += b5 + k1; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k4 + 15; b6 += b7 + k3 + t1; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-	b1 += k8; b0 += b1 + k7; b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-	b3 += k1; b2 += b3 + k0; b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-	b5 += k3 + t1; b4 += b5 + k2; b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-	b7 += k5 + 16; b6 += b7 + k4 + t2; b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-	b2 += b1; b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-	b4 += b7; b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-	b6 += b5; b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-	b0 += b3; b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-	b4 += b1; b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-	b6 += b3; b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-	b0 += b5; b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-	b2 += b7; b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-	b6 += b1; b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-	b0 += b7; b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-	b2 += b5; b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-	b4 += b3; b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-	b1 += k0; b0 += b1 + k8; b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-	b3 += k2; b2 += b3 + k1; b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-	b5 += k4 + t2; b4 += b5 + k3; b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-	b7 += k6 + 17; b6 += b7 + k5 + t0; b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-	b2 += b1; b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-	b4 += b7; b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-	b6 += b5; b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-	b0 += b3; b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-	b4 += b1; b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-	b6 += b3; b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-	b0 += b5; b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-	b2 += b7; b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-	b6 += b1; b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-	b0 += b7; b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-	b2 += b5; b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-	b4 += b3; b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k5 + t0;
+	b4 += b5 + k4;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k6 + t1;
+	b4 += b5 + k5;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k8 + 1;
+	b6 += b7 + k7 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k7 + t2;
+	b4 += b5 + k6;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k0 + 2;
+	b6 += b7 + k8 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k8 + t0;
+	b4 += b5 + k7;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k1 + 3;
+	b6 += b7 + k0 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k0 + t1;
+	b4 += b5 + k8;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k2 + 4;
+	b6 += b7 + k1 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k1 + t2;
+	b4 += b5 + k0;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k3 + 5;
+	b6 += b7 + k2 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k8;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k2 + t0;
+	b4 += b5 + k1;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k4 + 6;
+	b6 += b7 + k3 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k3 + t1;
+	b4 += b5 + k2;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k5 + 7;
+	b6 += b7 + k4 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k0;
+	b0 += b1 + k8;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k4 + t2;
+	b4 += b5 + k3;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k6 + 8;
+	b6 += b7 + k5 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k5 + t0;
+	b4 += b5 + k4;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k7 + 9;
+	b6 += b7 + k6 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k6 + t1;
+	b4 += b5 + k5;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k8 + 10;
+	b6 += b7 + k7 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k7 + t2;
+	b4 += b5 + k6;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k0 + 11;
+	b6 += b7 + k8 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k8 + t0;
+	b4 += b5 + k7;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k1 + 12;
+	b6 += b7 + k0 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k0 + t1;
+	b4 += b5 + k8;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k2 + 13;
+	b6 += b7 + k1 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k1 + t2;
+	b4 += b5 + k0;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k3 + 14;
+	b6 += b7 + k2 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k8;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k2 + t0;
+	b4 += b5 + k1;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k4 + 15;
+	b6 += b7 + k3 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k3 + t1;
+	b4 += b5 + k2;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k5 + 16;
+	b6 += b7 + k4 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k0;
+	b0 += b1 + k8;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k4 + t2;
+	b4 += b5 + k3;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k6 + 17;
+	b6 += b7 + k5 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
 
 	output[0] = b0 + k0;
 	output[1] = b1 + k1;
@@ -315,318 +962,1254 @@ void threefishEncrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
 	output[7] = b7 + k7 + 18;
 }
 
-void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	  b2 = input[2], b3 = input[3],
-	  b4 = input[4], b5 = input[5],
-	  b6 = input[6], b7 = input[7];
-	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
-	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
-	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
-	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
-	  k8 = keyCtx->key[8];
-	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
-	  t2 = keyCtx->tweak[2];
+void threefishDecrypt512(struct threefish_key *keyCtx, u64 *input, u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	  b2 = input[2], b3 = input[3],
+	  b4 = input[4], b5 = input[5],
+	  b6 = input[6], b7 = input[7];
+	u64 k0 = keyCtx->key[0], k1 = keyCtx->key[1],
+	  k2 = keyCtx->key[2], k3 = keyCtx->key[3],
+	  k4 = keyCtx->key[4], k5 = keyCtx->key[5],
+	  k6 = keyCtx->key[6], k7 = keyCtx->key[7],
+	  k8 = keyCtx->key[8];
+	u64 t0 = keyCtx->tweak[0], t1 = keyCtx->tweak[1],
+	  t2 = keyCtx->tweak[2];
+
+	u64 tmp;
+
+	b0 -= k0;
+	b1 -= k1;
+	b2 -= k2;
+	b3 -= k3;
+	b4 -= k4;
+	b5 -= k5 + t0;
+	b6 -= k6 + t1;
+	b7 -= k7 + 18;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k5 + t0;
+	b7 -= k6 + 17;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k3;
+	b5 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k8;
+	b1 -= k0;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k4 + t2;
+	b7 -= k5 + 16;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k2;
+	b5 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k3 + t1;
+	b7 -= k4 + 15;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k1;
+	b5 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k8;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
 
-	u64 tmp;
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
 
-	b0 -= k0;
-	b1 -= k1;
-	b2 -= k2;
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k2 + t0;
+	b7 -= k3 + 14;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k0;
+	b5 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k1 + t2;
+	b7 -= k2 + 13;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k8;
+	b5 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k0 + t1;
+	b7 -= k1 + 12;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k7;
+	b5 -= k8 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k8 + t0;
+	b7 -= k0 + 11;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k6;
+	b5 -= k7 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k7 + t2;
+	b7 -= k8 + 10;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k5;
+	b5 -= k6 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k6 + t1;
+	b7 -= k7 + 9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k4;
+	b5 -= k5 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k2;
 	b3 -= k3;
-	b4 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k0;
+	b1 -= k1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k5 + t0;
+	b7 -= k6 + 8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k3;
+	b5 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k8;
+	b1 -= k0;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k4 + t2;
+	b7 -= k5 + 7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k2;
+	b5 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k3 + t1;
+	b7 -= k4 + 6;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k1;
+	b5 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k8;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k2 + t0;
+	b7 -= k3 + 5;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k0;
+	b5 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k1 + t2;
+	b7 -= k2 + 4;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k8;
+	b5 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k0 + t1;
+	b7 -= k1 + 3;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k7;
+	b5 -= k8 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k8 + t0;
+	b7 -= k0 + 2;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k6;
+	b5 -= k7 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k7 + t2;
+	b7 -= k8 + 1;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k5;
+	b5 -= k6 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k6 + t1;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k4;
 	b5 -= k5 + t0;
-	b6 -= k6 + t1;
-	b7 -= k7 + 18;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k5 + t0; b7 -= k6 + 17;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k3; b5 -= k4 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k1; b3 -= k2;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k8; b1 -= k0;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k4 + t2; b7 -= k5 + 16;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k2; b5 -= k3 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k0; b3 -= k1;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k7; b1 -= k8;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k3 + t1; b7 -= k4 + 15;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k1; b5 -= k2 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k8; b3 -= k0;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k6; b1 -= k7;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k2 + t0; b7 -= k3 + 14;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k0; b5 -= k1 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k7; b3 -= k8;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k5; b1 -= k6;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k1 + t2; b7 -= k2 + 13;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k8; b5 -= k0 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k6; b3 -= k7;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k4; b1 -= k5;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k0 + t1; b7 -= k1 + 12;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k7; b5 -= k8 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k5; b3 -= k6;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k3; b1 -= k4;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k8 + t0; b7 -= k0 + 11;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k6; b5 -= k7 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k4; b3 -= k5;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k2; b1 -= k3;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k7 + t2; b7 -= k8 + 10;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k5; b5 -= k6 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k3; b3 -= k4;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k1; b1 -= k2;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k6 + t1; b7 -= k7 + 9;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k4; b5 -= k5 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k2; b3 -= k3;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k0; b1 -= k1;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k5 + t0; b7 -= k6 + 8;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k3; b5 -= k4 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k1; b3 -= k2;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k8; b1 -= k0;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k4 + t2; b7 -= k5 + 7;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k2; b5 -= k3 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k0; b3 -= k1;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k7; b1 -= k8;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k3 + t1; b7 -= k4 + 6;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k1; b5 -= k2 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k8; b3 -= k0;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k6; b1 -= k7;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k2 + t0; b7 -= k3 + 5;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k0; b5 -= k1 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k7; b3 -= k8;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k5; b1 -= k6;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k1 + t2; b7 -= k2 + 4;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k8; b5 -= k0 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k6; b3 -= k7;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k4; b1 -= k5;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k0 + t1; b7 -= k1 + 3;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k7; b5 -= k8 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k5; b3 -= k6;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k3; b1 -= k4;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k8 + t0; b7 -= k0 + 2;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k6; b5 -= k7 + t2;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k4; b3 -= k5;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k2; b1 -= k3;
-	tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 56) | (tmp << (64 - 56)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 35) | (tmp << (64 - 35)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 8) | (tmp << (64 - 8)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 43) | (tmp << (64 - 43)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 29) | (tmp << (64 - 29)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 25) | (tmp << (64 - 25)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 17) | (tmp << (64 - 17)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 10) | (tmp << (64 - 10)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 50) | (tmp << (64 - 50)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 13) | (tmp << (64 - 13)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 24) | (tmp << (64 - 24)); b6 -= b7 + k7 + t2; b7 -= k8 + 1;
-	tmp = b5 ^ b4; b5 = (tmp >> 34) | (tmp << (64 - 34)); b4 -= b5 + k5; b5 -= k6 + t1;
-	tmp = b3 ^ b2; b3 = (tmp >> 30) | (tmp << (64 - 30)); b2 -= b3 + k3; b3 -= k4;
-	tmp = b1 ^ b0; b1 = (tmp >> 39) | (tmp << (64 - 39)); b0 -= b1 + k1; b1 -= k2;
-	tmp = b3 ^ b4; b3 = (tmp >> 56) | (tmp << (64 - 56)); b4 -= b3;
-	tmp = b5 ^ b2; b5 = (tmp >> 54) | (tmp << (64 - 54)); b2 -= b5;
-	tmp = b7 ^ b0; b7 = (tmp >> 9) | (tmp << (64 - 9)); b0 -= b7;
-	tmp = b1 ^ b6; b1 = (tmp >> 44) | (tmp << (64 - 44)); b6 -= b1;
-	tmp = b7 ^ b2; b7 = (tmp >> 39) | (tmp << (64 - 39)); b2 -= b7;
-	tmp = b5 ^ b0; b5 = (tmp >> 36) | (tmp << (64 - 36)); b0 -= b5;
-	tmp = b3 ^ b6; b3 = (tmp >> 49) | (tmp << (64 - 49)); b6 -= b3;
-	tmp = b1 ^ b4; b1 = (tmp >> 17) | (tmp << (64 - 17)); b4 -= b1;
-	tmp = b3 ^ b0; b3 = (tmp >> 42) | (tmp << (64 - 42)); b0 -= b3;
-	tmp = b5 ^ b6; b5 = (tmp >> 14) | (tmp << (64 - 14)); b6 -= b5;
-	tmp = b7 ^ b4; b7 = (tmp >> 27) | (tmp << (64 - 27)); b4 -= b7;
-	tmp = b1 ^ b2; b1 = (tmp >> 33) | (tmp << (64 - 33)); b2 -= b1;
-	tmp = b7 ^ b6; b7 = (tmp >> 37) | (tmp << (64 - 37)); b6 -= b7 + k6 + t1; b7 -= k7;
-	tmp = b5 ^ b4; b5 = (tmp >> 19) | (tmp << (64 - 19)); b4 -= b5 + k4; b5 -= k5 + t0;
-	tmp = b3 ^ b2; b3 = (tmp >> 36) | (tmp << (64 - 36)); b2 -= b3 + k2; b3 -= k3;
-	tmp = b1 ^ b0; b1 = (tmp >> 46) | (tmp << (64 - 46)); b0 -= b1 + k0; b1 -= k1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k0;
+	b1 -= k1;
 
 	output[0] = b0;
 	output[1] = b1;
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index e8ce06a9122f..1e70f66b7032 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -3,8 +3,9 @@
 #include <linux/string.h>
 #include <threefishApi.h>
 
-void threefishSetKey(struct threefish_key *keyCtx, enum threefish_size stateSize,
-		     u64 *keyData, u64 *tweak)
+void threefishSetKey(struct threefish_key *keyCtx,
+			enum threefish_size stateSize,
+			u64 *keyData, u64 *tweak)
 {
 	int keyWords = stateSize / 64;
 	int i;
@@ -28,9 +29,9 @@ void threefishEncryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
 	u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
 	u64 cipher[SKEIN_MAX_STATE_WORDS];
 
-	Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);   /* bytes to words */
+	Skein_Get64_LSB_First(plain, in, keyCtx->stateSize / 64);
 	threefishEncryptBlockWords(keyCtx, plain, cipher);
-	Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);  /* words to bytes */
+	Skein_Put64_LSB_First(out, cipher, keyCtx->stateSize / 8);
 }
 
 void threefishEncryptBlockWords(struct threefish_key *keyCtx, u64 *in,
@@ -55,9 +56,9 @@ void threefishDecryptBlockBytes(struct threefish_key *keyCtx, u8 *in,
 	u64 plain[SKEIN_MAX_STATE_WORDS];        /* max number of words*/
 	u64 cipher[SKEIN_MAX_STATE_WORDS];
 
-	Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);  /* bytes to words */
+	Skein_Get64_LSB_First(cipher, in, keyCtx->stateSize / 64);
 	threefishDecryptBlockWords(keyCtx, cipher, plain);
-	Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);   /* words to bytes */
+	Skein_Put64_LSB_First(out, plain, keyCtx->stateSize / 8);
 }
 
 void threefishDecryptBlockWords(struct threefish_key *keyCtx, u64 *in,
-- 
1.9.1

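The unrolled decryption rounds reformatted in the patch above repeat one three-statement pattern: XOR two words, rotate the result right, then subtract. As a reading aid, here is a minimal user-space sketch of that step as a helper. The names `rotr64`, `rotl64`, `mix` and `unmix` are hypothetical and do not appear in the patch. The forward step is the standard Threefish MIX and is assumed to match the encrypt path, which is not shown in this hunk.

```c
#include <stdint.h>

/* Rotate a 64-bit word right by n bits; valid for 0 < n < 64. */
static inline uint64_t rotr64(uint64_t x, unsigned int n)
{
	return (x >> n) | (x << (64 - n));
}

/* Rotate a 64-bit word left by n bits; valid for 0 < n < 64. */
static inline uint64_t rotl64(uint64_t x, unsigned int n)
{
	return (x << n) | (x >> (64 - n));
}

/* Forward MIX (assumed encrypt step): a += b; b = rotl(b, r) ^ a. */
static inline void mix(uint64_t *a, uint64_t *b, unsigned int r)
{
	*a += *b;
	*b = rotl64(*b, r) ^ *a;
}

/*
 * Inverse MIX, the pattern repeated in the diff:
 *   tmp = b ^ a; b = rotr(tmp, r); a -= b;
 * XOR with a recovers rotl(b_old, r), the right rotation recovers
 * b_old, and the subtraction recovers a_old.
 */
static inline void unmix(uint64_t *a, uint64_t *b, unsigned int r)
{
	uint64_t tmp = *b ^ *a;

	*b = rotr64(tmp, r);
	*a -= *b;
}
```

With such a helper, a line like `tmp = b3 ^ b4; b3 = (tmp >> 22) | (tmp << (64 - 22)); b4 -= b3;` would read `unmix(&b4, &b3, 22);`. Whether a later cleanup should go that way is a separate question; this patch only reflows the existing statements.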

* [PATCH V2 15/21] staging: crypto: skein: fix do/while brace formatting
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (13 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 14/21] staging: crypto: skein: cleanup >80 character lines Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 16/21] staging: crypto: skein: fix brace placement errors Jason Cooper
                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/skein_block.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 780b4936f783..6e0f4a21aae3 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -215,8 +215,7 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-	}
-	while (--blkCnt);
+	} while (--blkCnt);
 	ctx->h.T[0] = ts[0];
 	ctx->h.T[1] = ts[1];
 }
@@ -434,8 +433,7 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 		Skein_Show_Round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
-	}
-	while (--blkCnt);
+	} while (--blkCnt);
 	ctx->h.T[0] = ts[0];
 	ctx->h.T[1] = ts[1];
 }
@@ -717,8 +715,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
 		blkPtr += SKEIN1024_BLOCK_BYTES;
-	}
-	while (--blkCnt);
+	} while (--blkCnt);
 	ctx->h.T[0] = ts[0];
 	ctx->h.T[1] = ts[1];
 }
-- 
1.9.1

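For reference, the form the patch above converts to keeps `while` on the same line as the closing brace, which is what checkpatch expects. A minimal user-space sketch of a block loop in that style follows; `sum_blocks` and `BLOCK_BYTES` are hypothetical names chosen for illustration, not taken from skein_block.c.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 8	/* hypothetical block size for this sketch */

/*
 * Sum the bytes of blk_cnt consecutive blocks.
 * blk_cnt must be at least 1: the body runs before the condition is
 * tested, and --blk_cnt on zero would wrap around.
 */
static uint32_t sum_blocks(const uint8_t *blk, size_t blk_cnt)
{
	uint32_t sum = 0;
	size_t i;

	do {
		for (i = 0; i < BLOCK_BYTES; i++)
			sum += blk[i];
		blk += BLOCK_BYTES;
	} while (--blk_cnt);

	return sum;
}
```

The same precondition applies to any `do { ... } while (--cnt);` loop of this shape: a caller passing zero would run the body once and then keep counting down from the wrapped value.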

* [PATCH V2 16/21] staging: crypto: skein: fix brace placement errors
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (14 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 15/21] staging: crypto: skein: fix do/while brace formatting Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 17/21] staging: crypto: skein: wrap multi-line macros in do-while loops Jason Cooper
                     ` (4 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/include/skein.h    |  30 ++++-----
 drivers/staging/skein/include/skein_iv.h |  65 ++++++++----------
 drivers/staging/skein/skein.c            | 111 ++++++++++---------------------
 3 files changed, 74 insertions(+), 132 deletions(-)

diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index f92dc40711d1..0a2abcecd2f7 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -37,12 +37,11 @@
 #define Skein_Get64_LSB_First(dst64, src08, wCnt) memcpy(dst64, src08, 8*(wCnt))
 #define Skein_Swap64(w64)  (w64)
 
-enum
-	{
+enum {
 	SKEIN_SUCCESS         =      0, /* return codes from Skein calls */
 	SKEIN_FAIL            =      1,
 	SKEIN_BAD_HASHLEN     =      2
-	};
+};
 
 #define  SKEIN_MODIFIER_WORDS   (2) /* number of modifier (tweak) words */
 
@@ -63,33 +62,29 @@ enum
 #define  SKEIN_512_BLOCK_BYTES  (8*SKEIN_512_STATE_WORDS)
 #define  SKEIN1024_BLOCK_BYTES  (8*SKEIN1024_STATE_WORDS)
 
-struct skein_ctx_hdr
-	{
+struct skein_ctx_hdr {
 	size_t  hashBitLen;		/* size of hash result, in bits */
 	size_t  bCnt;			/* current byte count in buffer b[] */
 	u64  T[SKEIN_MODIFIER_WORDS];	/* tweak: T[0]=byte cnt, T[1]=flags */
-	};
+};
 
-struct skein_256_ctx /* 256-bit Skein hash context structure */
-	{
+struct skein_256_ctx { /* 256-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
 	u64  X[SKEIN_256_STATE_WORDS];	/* chaining variables */
 	u8  b[SKEIN_256_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
-	};
+};
 
-struct skein_512_ctx /* 512-bit Skein hash context structure */
-	{
+struct skein_512_ctx { /* 512-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
 	u64  X[SKEIN_512_STATE_WORDS];	/* chaining variables */
 	u8  b[SKEIN_512_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
-	};
+};
 
-struct skein1024_ctx /* 1024-bit Skein hash context structure */
-	{
+struct skein1024_ctx { /* 1024-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
 	u64  X[SKEIN1024_STATE_WORDS];	/* chaining variables */
 	u8  b[SKEIN1024_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
-	};
+};
 
 /*   Skein APIs for (incremental) "straight hashing" */
 int  Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen);
@@ -296,8 +291,7 @@ int  Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal);
 /*****************************************************************
 ** Skein block function constants (shared across Ref and Opt code)
 ******************************************************************/
-enum
-	{
+enum {
 	    /* Skein_256 round rotation constants */
 	R_256_0_0 = 14, R_256_0_1 = 16,
 	R_256_1_0 = 52, R_256_1_1 = 57,
@@ -335,7 +329,7 @@ enum
 	R1024_6_4 = 19, R1024_6_5 = 42, R1024_6_6 = 44, R1024_6_7 = 25,
 	R1024_7_0 =  9, R1024_7_1 = 48, R1024_7_2 = 35, R1024_7_3 = 52,
 	R1024_7_4 = 23, R1024_7_5 = 31, R1024_7_6 = 37, R1024_7_7 = 20
-	};
+};
 
 #ifndef SKEIN_ROUNDS
 #define SKEIN_256_ROUNDS_TOTAL (72)	/* # rounds for diff block sizes */
diff --git a/drivers/staging/skein/include/skein_iv.h b/drivers/staging/skein/include/skein_iv.h
index bbbba77c44d3..8dd5e4d88a1d 100644
--- a/drivers/staging/skein/include/skein_iv.h
+++ b/drivers/staging/skein/include/skein_iv.h
@@ -20,44 +20,39 @@
 #define MK_64 SKEIN_MK_64
 
 /* blkSize =  256 bits. hashSize =  128 bits */
-const u64 SKEIN_256_IV_128[] =
-	{
+const u64 SKEIN_256_IV_128[] = {
 	MK_64(0xE1111906, 0x964D7260),
 	MK_64(0x883DAAA7, 0x7C8D811C),
 	MK_64(0x10080DF4, 0x91960F7A),
 	MK_64(0xCCF7DDE5, 0xB45BC1C2)
-	};
+};
 
 /* blkSize =  256 bits. hashSize =  160 bits */
-const u64 SKEIN_256_IV_160[] =
-	{
+const u64 SKEIN_256_IV_160[] = {
 	MK_64(0x14202314, 0x72825E98),
 	MK_64(0x2AC4E9A2, 0x5A77E590),
 	MK_64(0xD47A5856, 0x8838D63E),
 	MK_64(0x2DD2E496, 0x8586AB7D)
-	};
+};
 
 /* blkSize =  256 bits. hashSize =  224 bits */
-const u64 SKEIN_256_IV_224[] =
-	{
+const u64 SKEIN_256_IV_224[] = {
 	MK_64(0xC6098A8C, 0x9AE5EA0B),
 	MK_64(0x876D5686, 0x08C5191C),
 	MK_64(0x99CB88D7, 0xD7F53884),
 	MK_64(0x384BDDB1, 0xAEDDB5DE)
-	};
+};
 
 /* blkSize =  256 bits. hashSize =  256 bits */
-const u64 SKEIN_256_IV_256[] =
-	{
+const u64 SKEIN_256_IV_256[] = {
 	MK_64(0xFC9DA860, 0xD048B449),
 	MK_64(0x2FCA6647, 0x9FA7D833),
 	MK_64(0xB33BC389, 0x6656840F),
 	MK_64(0x6A54E920, 0xFDE8DA69)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  128 bits */
-const u64 SKEIN_512_IV_128[] =
-	{
+const u64 SKEIN_512_IV_128[] = {
 	MK_64(0xA8BC7BF3, 0x6FBF9F52),
 	MK_64(0x1E9872CE, 0xBD1AF0AA),
 	MK_64(0x309B1790, 0xB32190D3),
@@ -66,11 +61,10 @@ const u64 SKEIN_512_IV_128[] =
 	MK_64(0x1A18EBEA, 0xD46A32E3),
 	MK_64(0xA2CC5B18, 0xCE84AA82),
 	MK_64(0x6982AB28, 0x9D46982D)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  160 bits */
-const u64 SKEIN_512_IV_160[] =
-	{
+const u64 SKEIN_512_IV_160[] = {
 	MK_64(0x28B81A2A, 0xE013BD91),
 	MK_64(0xC2F11668, 0xB5BDF78F),
 	MK_64(0x1760D8F3, 0xF6A56F12),
@@ -79,11 +73,10 @@ const u64 SKEIN_512_IV_160[] =
 	MK_64(0xD908922E, 0x63ED70B8),
 	MK_64(0xB8EC76FF, 0xECCB52FA),
 	MK_64(0x01A47BB8, 0xA3F27A6E)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  224 bits */
-const u64 SKEIN_512_IV_224[] =
-	{
+const u64 SKEIN_512_IV_224[] = {
 	MK_64(0xCCD06162, 0x48677224),
 	MK_64(0xCBA65CF3, 0xA92339EF),
 	MK_64(0x8CCD69D6, 0x52FF4B64),
@@ -92,11 +85,10 @@ const u64 SKEIN_512_IV_224[] =
 	MK_64(0x6776FE65, 0x75D4EB3D),
 	MK_64(0x99FBC70E, 0x997413E9),
 	MK_64(0x9E2CFCCF, 0xE1C41EF7)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  256 bits */
-const u64 SKEIN_512_IV_256[] =
-	{
+const u64 SKEIN_512_IV_256[] = {
 	MK_64(0xCCD044A1, 0x2FDB3E13),
 	MK_64(0xE8359030, 0x1A79A9EB),
 	MK_64(0x55AEA061, 0x4F816E6F),
@@ -105,11 +97,10 @@ const u64 SKEIN_512_IV_256[] =
 	MK_64(0xE7A436CD, 0xC4746251),
 	MK_64(0xC36FBAF9, 0x393AD185),
 	MK_64(0x3EEDBA18, 0x33EDFC13)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  384 bits */
-const u64 SKEIN_512_IV_384[] =
-	{
+const u64 SKEIN_512_IV_384[] = {
 	MK_64(0xA3F6C6BF, 0x3A75EF5F),
 	MK_64(0xB0FEF9CC, 0xFD84FAA4),
 	MK_64(0x9D77DD66, 0x3D770CFE),
@@ -118,11 +109,10 @@ const u64 SKEIN_512_IV_384[] =
 	MK_64(0x7ED7D434, 0xE5807407),
 	MK_64(0x548FC1AC, 0xD4EC44D6),
 	MK_64(0x266E1754, 0x6AA18FF8)
-	};
+};
 
 /* blkSize =  512 bits. hashSize =  512 bits */
-const u64 SKEIN_512_IV_512[] =
-	{
+const u64 SKEIN_512_IV_512[] = {
 	MK_64(0x4903ADFF, 0x749C51CE),
 	MK_64(0x0D95DE39, 0x9746DF03),
 	MK_64(0x8FD19341, 0x27C79BCE),
@@ -131,11 +121,10 @@ const u64 SKEIN_512_IV_512[] =
 	MK_64(0xEABE394C, 0xA9D5C3F4),
 	MK_64(0x991112C7, 0x1A75B523),
 	MK_64(0xAE18A40B, 0x660FCC33)
-	};
+};
 
 /* blkSize = 1024 bits. hashSize =  384 bits */
-const u64 SKEIN1024_IV_384[] =
-	{
+const u64 SKEIN1024_IV_384[] = {
 	MK_64(0x5102B6B8, 0xC1894A35),
 	MK_64(0xFEEBC9E3, 0xFE8AF11A),
 	MK_64(0x0C807F06, 0xE32BED71),
@@ -152,11 +141,10 @@ const u64 SKEIN1024_IV_384[] =
 	MK_64(0x3B5A6530, 0x0DBC6516),
 	MK_64(0x484B9CD2, 0x167BBCE1),
 	MK_64(0x2D136947, 0xD4CBAFEA)
-	};
+};
 
 /* blkSize = 1024 bits. hashSize =  512 bits */
-const u64 SKEIN1024_IV_512[] =
-	{
+const u64 SKEIN1024_IV_512[] = {
 	MK_64(0xCAEC0E5D, 0x7C1B1B18),
 	MK_64(0xA01B0E04, 0x5F03E802),
 	MK_64(0x33840451, 0xED912885),
@@ -173,11 +161,10 @@ const u64 SKEIN1024_IV_512[] =
 	MK_64(0x67070872, 0x5B749816),
 	MK_64(0xB9CD28FB, 0xF0581BD1),
 	MK_64(0x0E2940B8, 0x15804974)
-	};
+};
 
 /* blkSize = 1024 bits. hashSize = 1024 bits */
-const u64 SKEIN1024_IV_1024[] =
-	{
+const u64 SKEIN1024_IV_1024[] = {
 	MK_64(0xD593DA07, 0x41E72355),
 	MK_64(0x15B5E511, 0xAC73E00C),
 	MK_64(0x5180E5AE, 0xBAF2C4F0),
@@ -194,6 +181,6 @@ const u64 SKEIN1024_IV_1024[] =
 	MK_64(0x6572DD22, 0xF2B4969A),
 	MK_64(0x61FD3062, 0xD00A579A),
 	MK_64(0x1DE0536E, 0x8682E539)
-	};
+};
 
 #endif /* _SKEIN_IV_H_ */
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index ed603ee7b170..0d8c70c02c6f 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -31,8 +31,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 /* init the context for a straight hashing operation  */
 int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN_256_STATE_BYTES];
 		u64  w[SKEIN_256_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -40,8 +39,7 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
 	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
-	switch (hashBitLen)
-	{             /* use pre-computed values, where available */
+	switch (hashBitLen) { /* use pre-computed values, where available */
 	case  256:
 		memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
 		break;
@@ -91,8 +89,7 @@ int Skein_256_Init(struct skein_256_ctx *ctx, size_t hashBitLen)
 int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen,
 			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN_256_STATE_BYTES];
 		u64  w[SKEIN_256_STATE_WORDS];
 	} cfg; /* config block */
@@ -101,13 +98,10 @@ int Skein_256_InitExt(struct skein_256_ctx *ctx, size_t hashBitLen,
 	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
 	/* compute the initial chaining values ctx->X[], based on key */
-	if (keyBytes == 0) /* is there a key? */
-	{
+	if (keyBytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
 		memset(ctx->X, 0, sizeof(ctx->X));
-	}
-	else /* here to pre-process a key */
-	{
+	} else { /* here to pre-process a key */
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
@@ -162,15 +156,12 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
 	Skein_Assert(ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
-	if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES)
-	{
+	if (msgByteCnt + ctx->h.bCnt > SKEIN_256_BLOCK_BYTES) {
 		/* finish up any buffered message data */
-		if (ctx->h.bCnt)
-		{
+		if (ctx->h.bCnt) {
 			/* # bytes free in buffer b[] */
 			n = SKEIN_256_BLOCK_BYTES - ctx->h.bCnt;
-			if (n)
-			{
+			if (n) {
 				/* check on our logic here */
 				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
@@ -187,8 +178,7 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
 		 * now process any remaining full blocks, directly from input
 		 * message data
 		 */
-		if (msgByteCnt > SKEIN_256_BLOCK_BYTES)
-		{
+		if (msgByteCnt > SKEIN_256_BLOCK_BYTES) {
 			/* number of full blocks to process */
 			n = (msgByteCnt-1) / SKEIN_256_BLOCK_BYTES;
 			Skein_256_Process_Block(ctx, msg, n,
@@ -200,8 +190,7 @@ int Skein_256_Update(struct skein_256_ctx *ctx, const u8 *msg,
 	}
 
 	/* copy any remaining source message data bytes into b[] */
-	if (msgByteCnt)
-	{
+	if (msgByteCnt) {
 		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_256_BLOCK_BYTES);
 		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
 		ctx->h.bCnt += msgByteCnt;
@@ -238,8 +227,7 @@ int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -268,8 +256,7 @@ int Skein_256_Final(struct skein_256_ctx *ctx, u8 *hashVal)
 /* init the context for a straight hashing operation  */
 int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN_512_STATE_BYTES];
 		u64  w[SKEIN_512_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -277,8 +264,7 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
 	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
-	switch (hashBitLen)
-	{             /* use pre-computed values, where available */
+	switch (hashBitLen) { /* use pre-computed values, where available */
 	case  512:
 		memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
 		break;
@@ -332,8 +318,7 @@ int Skein_512_Init(struct skein_512_ctx *ctx, size_t hashBitLen)
 int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen,
 			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN_512_STATE_BYTES];
 		u64  w[SKEIN_512_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -342,13 +327,10 @@ int Skein_512_InitExt(struct skein_512_ctx *ctx, size_t hashBitLen,
 	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
 	/* compute the initial chaining values ctx->X[], based on key */
-	if (keyBytes == 0)                          /* is there a key? */
-	{
+	if (keyBytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
 		memset(ctx->X, 0, sizeof(ctx->X));
-	}
-	else /* here to pre-process a key */
-	{
+	} else { /* here to pre-process a key */
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
@@ -402,15 +384,12 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
 	Skein_Assert(ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
-	if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES)
-	{
+	if (msgByteCnt + ctx->h.bCnt > SKEIN_512_BLOCK_BYTES) {
 		/* finish up any buffered message data */
-		if (ctx->h.bCnt)
-		{
+		if (ctx->h.bCnt) {
 			/* # bytes free in buffer b[] */
 			n = SKEIN_512_BLOCK_BYTES - ctx->h.bCnt;
-			if (n)
-			{
+			if (n) {
 				/* check on our logic here */
 				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
@@ -427,8 +406,7 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
 		 * now process any remaining full blocks, directly from input
 		 * message data
 		 */
-		if (msgByteCnt > SKEIN_512_BLOCK_BYTES)
-		{
+		if (msgByteCnt > SKEIN_512_BLOCK_BYTES) {
 			/* number of full blocks to process */
 			n = (msgByteCnt-1) / SKEIN_512_BLOCK_BYTES;
 			Skein_512_Process_Block(ctx, msg, n,
@@ -440,8 +418,7 @@ int Skein_512_Update(struct skein_512_ctx *ctx, const u8 *msg,
 	}
 
 	/* copy any remaining source message data bytes into b[] */
-	if (msgByteCnt)
-	{
+	if (msgByteCnt) {
 		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN_512_BLOCK_BYTES);
 		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
 		ctx->h.bCnt += msgByteCnt;
@@ -478,8 +455,7 @@ int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -508,8 +484,7 @@ int Skein_512_Final(struct skein_512_ctx *ctx, u8 *hashVal)
 /* init the context for a straight hashing operation  */
 int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN1024_STATE_BYTES];
 		u64  w[SKEIN1024_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -517,8 +492,7 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 	Skein_Assert(hashBitLen > 0, SKEIN_BAD_HASHLEN);
 	ctx->h.hashBitLen = hashBitLen;         /* output hash bit count */
 
-	switch (hashBitLen)
-	{              /* use pre-computed values, where available */
+	switch (hashBitLen) { /* use pre-computed values, where available */
 	case  512:
 		memcpy(ctx->X, SKEIN1024_IV_512, sizeof(ctx->X));
 		break;
@@ -566,8 +540,7 @@ int Skein1024_Init(struct skein1024_ctx *ctx, size_t hashBitLen)
 int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen,
 			u64 treeInfo, const u8 *key, size_t keyBytes)
 {
-	union
-	{
+	union {
 		u8  b[SKEIN1024_STATE_BYTES];
 		u64  w[SKEIN1024_STATE_WORDS];
 	} cfg;                              /* config block */
@@ -576,13 +549,10 @@ int Skein1024_InitExt(struct skein1024_ctx *ctx, size_t hashBitLen,
 	Skein_Assert(keyBytes == 0 || key != NULL, SKEIN_FAIL);
 
 	/* compute the initial chaining values ctx->X[], based on key */
-	if (keyBytes == 0)                          /* is there a key? */
-	{
+	if (keyBytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
 		memset(ctx->X, 0, sizeof(ctx->X));
-	}
-	else /* here to pre-process a key */
-	{
+	} else { /* here to pre-process a key */
 		Skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
@@ -637,15 +607,12 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
 	Skein_Assert(ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* process full blocks, if any */
-	if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES)
-	{
+	if (msgByteCnt + ctx->h.bCnt > SKEIN1024_BLOCK_BYTES) {
 		/* finish up any buffered message data */
-		if (ctx->h.bCnt)
-		{
+		if (ctx->h.bCnt) {
 			/* # bytes free in buffer b[] */
 			n = SKEIN1024_BLOCK_BYTES - ctx->h.bCnt;
-			if (n)
-			{
+			if (n) {
 				/* check on our logic here */
 				Skein_assert(n < msgByteCnt);
 				memcpy(&ctx->b[ctx->h.bCnt], msg, n);
@@ -662,8 +629,7 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
 		 * now process any remaining full blocks, directly from input
 		 * message data
 		 */
-		if (msgByteCnt > SKEIN1024_BLOCK_BYTES)
-		{
+		if (msgByteCnt > SKEIN1024_BLOCK_BYTES) {
 			/* number of full blocks to process */
 			n = (msgByteCnt-1) / SKEIN1024_BLOCK_BYTES;
 			Skein1024_Process_Block(ctx, msg, n,
@@ -675,8 +641,7 @@ int Skein1024_Update(struct skein1024_ctx *ctx, const u8 *msg,
 	}
 
 	/* copy any remaining source message data bytes into b[] */
-	if (msgByteCnt)
-	{
+	if (msgByteCnt) {
 		Skein_assert(msgByteCnt + ctx->h.bCnt <= SKEIN1024_BLOCK_BYTES);
 		memcpy(&ctx->b[ctx->h.bCnt], msg, msgByteCnt);
 		ctx->h.bCnt += msgByteCnt;
@@ -713,8 +678,7 @@ int Skein1024_Final(struct skein1024_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -823,8 +787,7 @@ int Skein_256_Output(struct skein_256_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -863,8 +826,7 @@ int Skein_512_Output(struct skein_512_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
@@ -903,8 +865,7 @@ int Skein1024_Output(struct skein1024_ctx *ctx, u8 *hashVal)
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
 	memcpy(X, ctx->X, sizeof(X));
-	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++)
-	{
+	for (i = 0; i*SKEIN1024_BLOCK_BYTES < byteCnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = Skein_Swap64((u64) i);
 		Skein_Start_New_Type(ctx, OUT_FINAL);
-- 
1.9.1


* [PATCH V2 17/21] staging: crypto: skein: wrap multi-line macros in do-while loops
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (15 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 16/21] staging: crypto: skein: fix brace placement errors Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 18/21] staging: crypto: skein: remove externs from .c files Jason Cooper
                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
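Background, as a minimal sketch rather than code from skein_block.c: a
macro that expands to several statements misbehaves under an unbraced
if/else, because only the first statement is guarded and the else loses
its if. Wrapping the body in do { } while (0) makes the expansion a single
statement that still takes a trailing semicolon at the call site. The
names SWAP_ADD and mix below are hypothetical and chosen only for
illustration.

```c
/*
 * Hypothetical two-statement macro, wrapped so that it expands to
 * exactly one statement.  The caller supplies the final semicolon.
 */
#define SWAP_ADD(a, b) \
do { \
	(a) += (b); \
	(b) ^= (a); \
} while (0)

/* Illustrative caller: the macro sits in an unbraced if/else. */
static unsigned int mix(unsigned int x, unsigned int y, int enable)
{
	if (enable)
		SWAP_ADD(x, y);	/* both statements are guarded by the if */
	else
		y = 0;		/* the else still pairs with the if */

	return x + y;
}
```

Without the wrapper, the same if/else would fail to compile, since the
second statement of the expansion would separate the else from its if.
One consequence of the wrapper is that every use of such a macro,
including uses nested inside other macros, needs its own terminating
semicolon.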
 drivers/staging/skein/skein_block.c | 66 ++++++++++++++++++++++++++++---------
 1 file changed, 51 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 6e0f4a21aae3..707a21ae53c6 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -107,27 +107,36 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 		/* run the rounds */
 
 #define Round256(p0, p1, p2, p3, ROT, rNum)                              \
+do { \
 	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
+} while (0)
 
 #if SKEIN_UNROLL_256 == 0
 #define R256(p0, p1, p2, p3, ROT, rNum) /* fully unrolled */ \
+do { \
 	Round256(p0, p1, p2, p3, ROT, rNum) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr); \
+} while (0)
 
 #define I256(R) \
+do { \
 	/* inject the key schedule value */ \
 	X0   += ks[((R)+1) % 5]; \
 	X1   += ks[((R)+2) % 5] + ts[((R)+1) % 3]; \
 	X2   += ks[((R)+3) % 5] + ts[((R)+2) % 3]; \
 	X3   += ks[((R)+4) % 5] +     (R)+1;       \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 #else /* looping version */
 #define R256(p0, p1, p2, p3, ROT, rNum) \
+do { \
 	Round256(p0, p1, p2, p3, ROT, rNum) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr); \
+} while (0)
 
 #define I256(R) \
+do { \
 	/* inject the key schedule value */ \
 	X0   += ks[r+(R)+0]; \
 	X1   += ks[r+(R)+1] + ts[r+(R)+0]; \
@@ -136,12 +145,14 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 	/* rotate key schedule */ \
 	ks[r + (R) + 4]   = ks[r + (R) - 1]; \
 	ts[r + (R) + 2]   = ts[r + (R) - 1]; \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 
 	for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_256)
 #endif
 		{
 #define R256_8_rounds(R)                  \
+do { \
 		R256(0, 1, 2, 3, R_256_0, 8 * (R) + 1);  \
 		R256(0, 3, 2, 1, R_256_1, 8 * (R) + 2);  \
 		R256(0, 1, 2, 3, R_256_2, 8 * (R) + 3);  \
@@ -151,7 +162,8 @@ void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
 		R256(0, 3, 2, 1, R_256_5, 8 * (R) + 6);  \
 		R256(0, 1, 2, 3, R_256_6, 8 * (R) + 7);  \
 		R256(0, 3, 2, 1, R_256_7, 8 * (R) + 8);  \
-		I256(2 * (R) + 1);
+		I256(2 * (R) + 1); \
+} while (0)
 
 		R256_8_rounds(0);
 
@@ -311,17 +323,22 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 				 Xptr);
 		/* run the rounds */
 #define Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+do { \
 	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0; \
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2; \
 	X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4; \
 	X##p6 += X##p7; X##p7 = RotL_64(X##p7, ROT##_3); X##p7 ^= X##p6; \
+} while (0)
 
 #if SKEIN_UNROLL_512 == 0
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) /* unrolled */ \
+do { \
 	Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rNum, Xptr); \
+} while (0)
 
 #define I512(R) \
+do { \
 	/* inject the key schedule value */ \
 	X0   += ks[((R) + 1) % 9]; \
 	X1   += ks[((R) + 2) % 9]; \
@@ -331,13 +348,17 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 	X5   += ks[((R) + 6) % 9] + ts[((R) + 1) % 3]; \
 	X6   += ks[((R) + 7) % 9] + ts[((R) + 2) % 3]; \
 	X7   += ks[((R) + 8) % 9] +     (R) + 1;       \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 #else /* looping version */
 #define R512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
+do { \
 	Round512(p0, p1, p2, p3, p4, p5, p6, p7, ROT, rNum) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rNum, Xptr); \
+} while (0)
 
 #define I512(R) \
+do { \
 	/* inject the key schedule value */ \
 	X0   += ks[r + (R) + 0]; \
 	X1   += ks[r + (R) + 1]; \
@@ -350,12 +371,14 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 	/* rotate key schedule */ \
 	ks[r +         (R) + 8] = ks[r + (R) - 1]; \
 	ts[r +         (R) + 2] = ts[r + (R) - 1]; \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 
 		for (r = 1; r < 2 * RCNT; r += 2 * SKEIN_UNROLL_512)
 #endif /* end of looped code definitions */
 		{
 #define R512_8_rounds(R)  /* do 8 full rounds */  \
+do { \
 		R512(0, 1, 2, 3, 4, 5, 6, 7, R_512_0, 8 * (R) + 1);   \
 		R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_1, 8 * (R) + 2);   \
 		R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_2, 8 * (R) + 3);   \
@@ -365,7 +388,8 @@ void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
 		R512(2, 1, 4, 7, 6, 5, 0, 3, R_512_5, 8 * (R) + 6);   \
 		R512(4, 1, 6, 3, 0, 5, 2, 7, R_512_6, 8 * (R) + 7);   \
 		R512(6, 1, 0, 7, 2, 5, 4, 3, R_512_7, 8 * (R) + 8);   \
-		I512(2 * (R) + 1);        /* and key injection */
+		I512(2 * (R) + 1);        /* and key injection */ \
+} while (0)
 
 			R512_8_rounds(0);
 
@@ -551,6 +575,7 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 
 #define Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
 			pF, ROT, rNum) \
+do { \
 	X##p0 += X##p1; X##p1 = RotL_64(X##p1, ROT##_0); X##p1 ^= X##p0;   \
 	X##p2 += X##p3; X##p3 = RotL_64(X##p3, ROT##_1); X##p3 ^= X##p2;   \
 	X##p4 += X##p5; X##p5 = RotL_64(X##p5, ROT##_2); X##p5 ^= X##p4;   \
@@ -559,15 +584,19 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 	X##pA += X##pB; X##pB = RotL_64(X##pB, ROT##_5); X##pB ^= X##pA;   \
 	X##pC += X##pD; X##pD = RotL_64(X##pD, ROT##_6); X##pD ^= X##pC;   \
 	X##pE += X##pF; X##pF = RotL_64(X##pF, ROT##_7); X##pF ^= X##pE;   \
+} while (0)
 
 #if SKEIN_UNROLL_1024 == 0
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, \
 		ROT, rn) \
+do { \
 	Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
 			pF, ROT, rn) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, rn, Xptr); \
+} while (0)
 
 #define I1024(R) \
+do { \
 	/* inject the key schedule value */ \
 	X00   += ks[((R) +  1) % 17]; \
 	X01   += ks[((R) +  2) % 17]; \
@@ -585,15 +614,19 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 	X13   += ks[((R) + 14) % 17] + ts[((R) + 1) % 3]; \
 	X14   += ks[((R) + 15) % 17] + ts[((R) + 2) % 3]; \
 	X15   += ks[((R) + 16) % 17] +     (R) + 1;       \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 #else /* looping version */
 #define R1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, pF, \
 		ROT, rn) \
+do { \
 	Round1024(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, pA, pB, pC, pD, pE, \
 			pF, ROT, rn) \
-	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, 4 * (r - 1) + rn, Xptr); \
+} while (0)
 
 #define I1024(R) \
+do { \
 	/* inject the key schedule value */ \
 	X00   += ks[r + (R) +  0]; \
 	X01   += ks[r + (R) +  1]; \
@@ -614,12 +647,14 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 	/* rotate key schedule */ \
 	ks[r  +         (R) + 16] = ks[r + (R) - 1]; \
 	ts[r  +         (R) +  2] = ts[r + (R) - 1]; \
-	Skein_Show_R_Ptr(BLK_BITSi, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr);
+	Skein_Show_R_Ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INJECT, Xptr); \
+} while (0)
 
 		for (r = 1; r <= 2 * RCNT; r += 2 * SKEIN_UNROLL_1024)
 #endif
 		{
 #define R1024_8_rounds(R) \
+do { \
 	R1024(00, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, \
 		R1024_0, 8*(R) + 1); \
 	R1024(00, 09, 02, 13, 06, 11, 04, 15, 10, 07, 12, 03, 14, 05, 08, 01, \
@@ -637,7 +672,8 @@ void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
 		R1024_6, 8*(R) + 7); \
 	R1024(00, 15, 02, 11, 06, 13, 04, 09, 14, 01, 08, 05, 10, 03, 12, 07, \
 		R1024_7, 8*(R) + 8); \
-	I1024(2*(R)+1);
+	I1024(2*(R)+1); \
+} while (0)
 
 			R1024_8_rounds(0);
 
-- 
1.9.1


* [PATCH V2 18/21] staging: crypto: skein: remove externs from .c files
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (16 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 17/21] staging: crypto: skein: wrap multi-line macros in do-while loops Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 19/21] staging: crypto: skein: remove braces from single-statement block Jason Cooper
                     ` (2 subsequent siblings)
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
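Background, as a minimal single-file sketch: declaring a function once in
a shared header, instead of repeating the prototype in each .c file, lets
the compiler check the definition and every caller against the same
declaration. The names demo_block.h, demo_process_block and demo_hash are
hypothetical stand-ins and are not part of the skein sources.

```c
#include <stddef.h>

/* ---- would live in a shared header, e.g. a hypothetical demo_block.h ---- */
#ifndef DEMO_BLOCK_H
#define DEMO_BLOCK_H

size_t demo_process_block(const unsigned char *blk, size_t blk_cnt);

#endif /* DEMO_BLOCK_H */

/* ---- definition: checked against the prototype above ---- */
size_t demo_process_block(const unsigned char *blk, size_t blk_cnt)
{
	size_t i, sum = 0;

	/* stand-in for real block processing: add up the bytes */
	for (i = 0; i < blk_cnt; i++)
		sum += blk[i];

	return sum;
}

/* ---- caller: sees the same prototype through the header ---- */
size_t demo_hash(const unsigned char *msg, size_t len)
{
	return demo_process_block(msg, len);
}
```

If the definition and the prototype ever disagree, the mismatch is
reported at compile time rather than surfacing as a silent ABI mismatch
between translation units.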
 drivers/staging/skein/include/skein_block.h | 22 ++++++++++++++++++++++
 drivers/staging/skein/skein.c               | 10 +---------
 2 files changed, 23 insertions(+), 9 deletions(-)
 create mode 100644 drivers/staging/skein/include/skein_block.h

diff --git a/drivers/staging/skein/include/skein_block.h b/drivers/staging/skein/include/skein_block.h
new file mode 100644
index 000000000000..b15c079b5bd4
--- /dev/null
+++ b/drivers/staging/skein/include/skein_block.h
@@ -0,0 +1,22 @@
+/***********************************************************************
+**
+** Implementation of the Skein hash function.
+**
+** Source code author: Doug Whiting, 2008.
+**
+** This algorithm and source code is released to the public domain.
+**
+************************************************************************/
+#ifndef _SKEIN_BLOCK_H_
+#define _SKEIN_BLOCK_H_
+
+#include <skein.h> /* get the Skein API definitions   */
+
+void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
+				size_t blkCnt, size_t byteCntAdd);
+
+#endif
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index 0d8c70c02c6f..096b86bf9430 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -13,15 +13,7 @@
 #include <linux/string.h>       /* get the memcpy/memset functions */
 #include <skein.h> /* get the Skein API definitions   */
 #include <skein_iv.h>    /* get precomputed IVs */
-
-/*****************************************************************/
-/* External function to process blkCnt (nonzero) full block(s) of data. */
-void Skein_256_Process_Block(struct skein_256_ctx *ctx, const u8 *blkPtr,
-				size_t blkCnt, size_t byteCntAdd);
-void Skein_512_Process_Block(struct skein_512_ctx *ctx, const u8 *blkPtr,
-				size_t blkCnt, size_t byteCntAdd);
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
-				size_t blkCnt, size_t byteCntAdd);
+#include <skein_block.h>
 
 /*****************************************************************/
 /*     256-bit Skein                                             */
-- 
1.9.1


* [PATCH V2 19/21] staging: crypto: skein: remove braces from single-statement block
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (17 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 18/21] staging: crypto: skein: remove externs from .c files Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 20/21] staging: crypto: skein: remove unnecessary line continuation Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 21/21] staging: crypto: skein: add TODO file Jason Cooper
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
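Background, as a minimal sketch of the branch touched here: when the bit
count is a multiple of 8 the byte count is simply msgBitCnt >> 3,
otherwise one extra byte carries the trailing bits. The helper name
bits_to_bytes is hypothetical; skeinApi.c does this inline.

```c
#include <stddef.h>

/* Number of bytes needed to hold bit_cnt bits (hypothetical helper). */
static size_t bits_to_bytes(size_t bit_cnt)
{
	/* a single-statement branch needs no braces in kernel style */
	if ((bit_cnt & 0x7) == 0)
		return bit_cnt >> 3;

	return (bit_cnt >> 3) + 1;
}
```

For example, 16 bits fit in 2 bytes, while 17 bits need 3.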
 drivers/staging/skein/skeinApi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index f0015d5b10f5..dd109bf6f7b9 100644
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -188,9 +188,9 @@ int skeinUpdateBits(struct skein_ctx *ctx, const u8 *msg,
 			msgBitCnt == 0, SKEIN_FAIL);
 
 	/* if number of bits is a multiple of bytes - that's easy */
-	if ((msgBitCnt & 0x7) == 0) {
+	if ((msgBitCnt & 0x7) == 0)
 		return skeinUpdate(ctx, msg, msgBitCnt >> 3);
-	}
+
 	skeinUpdate(ctx, msg, (msgBitCnt >> 3) + 1);
 
 	/*
-- 
1.9.1
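
A note for readers of the hunk above: the test `(msgBitCnt & 0x7) == 0` checks whether the bit count is a whole number of bytes, and `msgBitCnt >> 3` converts bits to bytes. The sketch below restates only that arithmetic in a standalone form. The helper name `bits_to_update_bytes` is hypothetical and does not appear in the patch or in the skein sources; it only illustrates the byte count handed to skeinUpdate() in each branch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): how many bytes
 * skeinUpdateBits() passes to skeinUpdate() for a given bit count.
 */
static size_t bits_to_update_bytes(size_t msg_bit_cnt)
{
	/* low three bits clear: the count is a multiple of 8 bits */
	if ((msg_bit_cnt & 0x7) == 0)
		return msg_bit_cnt >> 3;

	/* otherwise the partial final byte is included as well */
	return (msg_bit_cnt >> 3) + 1;
}
```

With the braces gone the two paths stay the same: an aligned count returns early, and an unaligned one rounds up by one byte.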


* [PATCH V2 20/21] staging: crypto: skein: remove unnecessary line continuation
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (18 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 19/21] staging: crypto: skein: remove braces from single-statement block Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  2014-03-24  1:49   ` [PATCH V2 21/21] staging: crypto: skein: add TODO file Jason Cooper
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/skein_block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 707a21ae53c6..fd96ca0ad0ed 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -477,7 +477,7 @@ unsigned int Skein_512_Unroll_Cnt(void)
 
 /*****************************  Skein1024 ******************************/
 #if !(SKEIN_USE_ASM & 1024)
-void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr, \
+void Skein1024_Process_Block(struct skein1024_ctx *ctx, const u8 *blkPtr,
 				size_t blkCnt, size_t byteCntAdd)
 { /* do it in C, always looping (unrolled is bigger AND slower!) */
 	enum {
-- 
1.9.1


* [PATCH V2 21/21] staging: crypto: skein: add TODO file
  2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
                     ` (19 preceding siblings ...)
  2014-03-24  1:49   ` [PATCH V2 20/21] staging: crypto: skein: remove unnecessary line continuation Jason Cooper
@ 2014-03-24  1:49   ` Jason Cooper
  20 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  1:49 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
 drivers/staging/skein/TODO | 11 +++++++++++
 1 file changed, 11 insertions(+)
 create mode 100644 drivers/staging/skein/TODO

diff --git a/drivers/staging/skein/TODO b/drivers/staging/skein/TODO
new file mode 100644
index 000000000000..f5c167a305ae
--- /dev/null
+++ b/drivers/staging/skein/TODO
@@ -0,0 +1,11 @@
+skein/threefish TODO
+
+ - rename camelcase vars
+ - rename camelcase functions
+ - rename files
+ - move macros into appropriate header files
+ - add / pass test vectors
+ - module support
+
+Please send patches to Jason Cooper <jason@lakedaemon.net> in addition to the
+staging tree mailinglist.
-- 
1.9.1


* Re: [RFC PATCH 03/22] staging: crypto: skein: allow building statically
  2014-03-18 14:28       ` Greg KH
@ 2014-03-24  2:22         ` Jason Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  2:22 UTC (permalink / raw)
  To: Greg KH; +Cc: devel, David S. Miller, linux-crypto, Herbert Xu

On Tue, Mar 18, 2014 at 02:28:20PM +0000, Greg KH wrote:
> On Tue, Mar 18, 2014 at 08:58:49AM -0400, Jason Cooper wrote:
> > On Mon, Mar 17, 2014 at 02:52:52PM -0700, Greg KH wrote:
> > > On Tue, Mar 11, 2014 at 09:32:35PM +0000, Jason Cooper wrote:
> > > > These are the minimum changes required to get the code to build
> > > > statically in the kernel.  It's necessary to do this first so that we
> > > > can empirically determine that future cleanup patches aren't changing
> > > > the generated object code.
> > > > 
> > > > Signed-off-by: Jason Cooper <jason@lakedaemon.net>
> > > 
> > > This doesn't apply to my latest tree :(
> > 
> > Ah, ok.  I'll rebase this series on the staging tree.

Done, submitted.

> > > > --- a/drivers/staging/Makefile
> > > > +++ b/drivers/staging/Makefile
> > > > @@ -65,3 +65,4 @@ obj-$(CONFIG_XILLYBUS)		+= xillybus/
> > > >  obj-$(CONFIG_DGNC)			+= dgnc/
> > > >  obj-$(CONFIG_DGAP)			+= dgap/
> > > >  obj-$(CONFIG_MTD_SPINAND_MT29F)	+= mt29f_spinand/
> > > > +obj-$(CONFIG_CRYPTO_SKEIN) += skein/
> > > 
> > > Care to align these up with the way this file is formatted?
> > 
> > Of course, not sure what happened there (well, other than the obvious
> > :-P)

Dagnabbit.  I thought I had remembered everything.  I'll send a V3 of this patch.

> > > And I have no objection to taking the drivers/staging/ patches, the
> > > script looks useful, but I can't take it through the staging tree,
> > > sorry.
> > 
> > Ok, I'll pull that out as a separate branch.

s/branch/patch/.  Done.

thx,

Jason.


* [PATCH V3 02/21] staging: crypto: skein: allow building statically
  2014-03-24  1:48   ` [PATCH V2 02/21] staging: crypto: skein: allow building statically Jason Cooper
@ 2014-03-24  2:32     ` Jason Cooper
  0 siblings, 0 replies; 51+ messages in thread
From: Jason Cooper @ 2014-03-24  2:32 UTC (permalink / raw)
  To: Greg KH, Herbert Xu, David S. Miller; +Cc: devel, linux-crypto, Jason Cooper

These are the minimum changes required to get the code to build
statically in the kernel.  It's necessary to do this first so that we
can empirically determine that future cleanup patches aren't changing
the generated object code.

Signed-off-by: Jason Cooper <jason@lakedaemon.net>
---
Changes since V2:

 - fixed whitespace in staging/Makefile (gregkh)

Changes since RFC:

 - rebasing onto staging-next caused conflicts in Kconfig and Makefile; fixed.


 drivers/staging/Kconfig                      |  2 +
 drivers/staging/Makefile                     |  1 +
 drivers/staging/skein/CMakeLists.txt         | 27 -------------
 drivers/staging/skein/Kconfig                | 32 ++++++++++++++++
 drivers/staging/skein/Makefile               | 13 +++++++
 drivers/staging/skein/include/brg_types.h    | 57 ----------------------------
 drivers/staging/skein/include/skein.h        | 10 -----
 drivers/staging/skein/include/skeinApi.h     |  2 +-
 drivers/staging/skein/include/skein_port.h   | 16 +-------
 drivers/staging/skein/include/threefishApi.h |  2 +-
 drivers/staging/skein/skein.c                |  2 +-
 drivers/staging/skein/skeinApi.c             |  4 +-
 drivers/staging/skein/skeinBlockNo3F.c       |  2 +-
 drivers/staging/skein/skein_block.c          |  2 +-
 drivers/staging/skein/threefish1024Block.c   |  3 +-
 drivers/staging/skein/threefish256Block.c    |  3 +-
 drivers/staging/skein/threefish512Block.c    |  3 +-
 drivers/staging/skein/threefishApi.c         |  3 +-
 18 files changed, 59 insertions(+), 125 deletions(-)
 delete mode 100755 drivers/staging/skein/CMakeLists.txt
 create mode 100644 drivers/staging/skein/Kconfig
 create mode 100644 drivers/staging/skein/Makefile

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 47cf17543008..b78f669b7ed8 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -144,6 +144,8 @@ source "drivers/staging/gs_fpgaboot/Kconfig"
 
 source "drivers/staging/nokia_h4p/Kconfig"
 
+source "drivers/staging/skein/Kconfig"
+
 source "drivers/staging/unisys/Kconfig"
 
 endif # STAGING
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index d12f6189db46..fc05783bb7da 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -64,4 +64,5 @@ obj-$(CONFIG_DGAP)			+= dgap/
 obj-$(CONFIG_MTD_SPINAND_MT29F)	+= mt29f_spinand/
 obj-$(CONFIG_GS_FPGABOOT)	+= gs_fpgaboot/
 obj-$(CONFIG_BT_NOKIA_H4P)	+= nokia_h4p/
+obj-$(CONFIG_CRYPTO_SKEIN)	+= skein/
 obj-$(CONFIG_UNISYSSPAR)	+= unisys/
diff --git a/drivers/staging/skein/CMakeLists.txt b/drivers/staging/skein/CMakeLists.txt
deleted file mode 100755
index 604aaa394cb1..000000000000
--- a/drivers/staging/skein/CMakeLists.txt
+++ /dev/null
@@ -1,27 +0,0 @@
-cmake_minimum_required (VERSION 2.6)
-
-include_directories (${CMAKE_CURRENT_SOURCE_DIR}/include)
-
-# set(skeinBlock_src skein_block.c)
-set(skeinBlock_src skeinBlockNo3F.c)
-
-set(skein_src 
-    ${skeinBlock_src}
-    skein.c
-    skeinApi.c
-    )
-
-set(threefish_src
-    threefishApi.c
-    threefish256Block.c
-    threefish512Block.c
-    threefish1024Block.c
-    )
-set(s3f_src ${skein_src} ${threefish_src})
-
-add_library(skein3fish SHARED ${s3f_src})
-set_target_properties(skein3fish PROPERTIES VERSION ${VERSION} SOVERSION ${SOVERSION})
-target_link_libraries(skein3fish ${LIBS})
-
-install(TARGETS skein3fish DESTINATION ${LIBDIRNAME})
-
diff --git a/drivers/staging/skein/Kconfig b/drivers/staging/skein/Kconfig
new file mode 100644
index 000000000000..8f5a72a90ced
--- /dev/null
+++ b/drivers/staging/skein/Kconfig
@@ -0,0 +1,32 @@
+config CRYPTO_SKEIN
+	bool "Skein digest algorithm"
+	depends on (X86 || UML_X86) && 64BIT
+	select CRYPTO_THREEFISH
+	select CRYPTO_HASH
+	help
+	  Skein secure hash algorithm is one of 5 finalists from the NIST SHA3
+	  competition.
+
+	  Skein is optimized for modern, 64bit processors and is highly
+	  customizable.  See:
+
+	  http://www.skein-hash.info/sites/default/files/skein1.3.pdf
+
+	  for more information.  This module depends on the threefish block
+	  cipher module.
+
+config CRYPTO_THREEFISH
+	bool "Threefish tweakable block cipher"
+	depends on (X86 || UML_X86) && 64BIT
+	select CRYPTO_ALGAPI
+	help
+	  Threefish cipher algorithm is the tweakable block cipher underneath
+	  the Skein family of secure hash algorithms.  Skein is one of 5
+	  finalists from the NIST SHA3 competition.
+
+	  Skein is optimized for modern, 64bit processors and is highly
+	  customizable.  See:
+
+	  http://www.skein-hash.info/sites/default/files/skein1.3.pdf
+
+	  for more information.
diff --git a/drivers/staging/skein/Makefile b/drivers/staging/skein/Makefile
new file mode 100644
index 000000000000..2bb386e1e58c
--- /dev/null
+++ b/drivers/staging/skein/Makefile
@@ -0,0 +1,13 @@
+#
+# Makefile for the skein secure hash algorithm
+#
+subdir-ccflags-y := -I$(src)/include/
+
+obj-$(CONFIG_CRYPTO_SKEIN) +=   skein.o \
+				skeinApi.o \
+				skein_block.o
+
+obj-$(CONFIG_CRYPTO_THREEFISH) += threefish1024Block.o \
+				  threefish256Block.o \
+				  threefish512Block.o \
+				  threefishApi.o
diff --git a/drivers/staging/skein/include/brg_types.h b/drivers/staging/skein/include/brg_types.h
index 6db737d71b9e..3d9fe0df5238 100644
--- a/drivers/staging/skein/include/brg_types.h
+++ b/drivers/staging/skein/include/brg_types.h
@@ -46,83 +46,26 @@
 extern "C" {
 #endif
 
-#include <limits.h>
-
 #ifndef BRG_UI8
 #  define BRG_UI8
-#  if UCHAR_MAX == 255u
      typedef unsigned char uint_8t;
-#  else
-#    error Please define uint_8t as an 8-bit unsigned integer type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI16
 #  define BRG_UI16
-#  if USHRT_MAX == 65535u
      typedef unsigned short uint_16t;
-#  else
-#    error Please define uint_16t as a 16-bit unsigned short type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI32
 #  define BRG_UI32
-#  if UINT_MAX == 4294967295u
 #    define li_32(h) 0x##h##u
      typedef unsigned int uint_32t;
-#  elif ULONG_MAX == 4294967295u
-#    define li_32(h) 0x##h##ul
-     typedef unsigned long uint_32t;
-#  elif defined( _CRAY )
-#    error This code needs 32-bit data types, which Cray machines do not provide
-#  else
-#    error Please define uint_32t as a 32-bit unsigned integer type in brg_types.h
-#  endif
 #endif
 
 #ifndef BRG_UI64
-#  if defined( __BORLANDC__ ) && !defined( __MSDOS__ )
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ui64
-     typedef unsigned __int64 uint_64t;
-#  elif defined( _MSC_VER ) && ( _MSC_VER < 1300 )    /* 1300 == VC++ 7.0 */
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ui64
-     typedef unsigned __int64 uint_64t;
-#  elif defined( __sun ) && defined(ULONG_MAX) && ULONG_MAX == 0xfffffffful
-#    define BRG_UI64
-#    define li_64(h) 0x##h##ull
-     typedef unsigned long long uint_64t;
-#  elif defined( UINT_MAX ) && UINT_MAX > 4294967295u
-#    if UINT_MAX == 18446744073709551615u
-#      define BRG_UI64
-#      define li_64(h) 0x##h##u
-       typedef unsigned int uint_64t;
-#    endif
-#  elif defined( ULONG_MAX ) && ULONG_MAX > 4294967295u
-#    if ULONG_MAX == 18446744073709551615ul
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ul
-       typedef unsigned long uint_64t;
-#    endif
-#  elif defined( ULLONG_MAX ) && ULLONG_MAX > 4294967295u
-#    if ULLONG_MAX == 18446744073709551615ull
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ull
-       typedef unsigned long long uint_64t;
-#    endif
-#  elif defined( ULONG_LONG_MAX ) && ULONG_LONG_MAX > 4294967295u
-#    if ULONG_LONG_MAX == 18446744073709551615ull
 #      define BRG_UI64
 #      define li_64(h) 0x##h##ull
        typedef unsigned long long uint_64t;
-#    endif
-#  elif defined(__GNUC__)  /* DLW: avoid mingw problem with -ansi */
-#      define BRG_UI64
-#      define li_64(h) 0x##h##ull
-       typedef unsigned long long uint_64t;
-#  endif
 #endif
 
 #if defined( NEED_UINT_64T ) && !defined( BRG_UI64 )
diff --git a/drivers/staging/skein/include/skein.h b/drivers/staging/skein/include/skein.h
index cb613fa09d9e..315cdcd14413 100644
--- a/drivers/staging/skein/include/skein.h
+++ b/drivers/staging/skein/include/skein.h
@@ -261,18 +261,8 @@ int  Skein1024_Output   (Skein1024_Ctxt_t *ctx, u08b_t * hashVal);
 #define Skein_Show_Key(bits,ctx,key,keyBytes)
 #endif
 
-#ifndef SKEIN_ERR_CHECK        /* run-time checks (e.g., bad params, uninitialized context)? */
 #define Skein_Assert(x,retCode)/* default: ignore all Asserts, for performance */
 #define Skein_assert(x)
-#elif   defined(SKEIN_ASSERT)
-#include <assert.h>     
-#define Skein_Assert(x,retCode) assert(x) 
-#define Skein_assert(x)         assert(x) 
-#else
-#include <assert.h>     
-#define Skein_Assert(x,retCode) { if (!(x)) return retCode; } /*  caller  error */
-#define Skein_assert(x)         assert(x)                     /* internal error */
-#endif
 
 /*****************************************************************
 ** Skein block function constants (shared across Ref and Opt code)
diff --git a/drivers/staging/skein/include/skeinApi.h b/drivers/staging/skein/include/skeinApi.h
index 19c3225460fc..734d27b79f01 100755
--- a/drivers/staging/skein/include/skeinApi.h
+++ b/drivers/staging/skein/include/skeinApi.h
@@ -78,8 +78,8 @@ OTHER DEALINGS IN THE SOFTWARE.
  * 
  */
 
+#include <linux/types.h>
 #include <skein.h>
-#include <stdint.h>
 
 #ifdef __cplusplus
 extern "C"
diff --git a/drivers/staging/skein/include/skein_port.h b/drivers/staging/skein/include/skein_port.h
index 18d892553c8d..1c68070358ce 100644
--- a/drivers/staging/skein/include/skein_port.h
+++ b/drivers/staging/skein/include/skein_port.h
@@ -44,24 +44,10 @@ typedef uint_64t        u64b_t;             /* 64-bit unsigned integer */
  * platform-specific code instead (e.g., for big-endian CPUs).
  *
  */
-#ifndef SKEIN_NEED_SWAP /* compile-time "override" for endianness? */
-
-#include <brg_endian.h>              /* get endianness selection */
-#if   PLATFORM_BYTE_ORDER == IS_BIG_ENDIAN
-    /* here for big-endian CPUs */
-#define SKEIN_NEED_SWAP   (1)
-#elif PLATFORM_BYTE_ORDER == IS_LITTLE_ENDIAN
-    /* here for x86 and x86-64 CPUs (and other detected little-endian CPUs) */
 #define SKEIN_NEED_SWAP   (0)
-#if   PLATFORM_MUST_ALIGN == 0              /* ok to use "fast" versions? */
+/* below two prototype assume we are handed aligned data */
 #define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
 #define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
-#endif
-#else
-#error "Skein needs endianness setting!"
-#endif
-
-#endif /* ifndef SKEIN_NEED_SWAP */
 
 /*
  ******************************************************************
diff --git a/drivers/staging/skein/include/threefishApi.h b/drivers/staging/skein/include/threefishApi.h
index 85afd72fe987..dae270cf71d3 100644
--- a/drivers/staging/skein/include/threefishApi.h
+++ b/drivers/staging/skein/include/threefishApi.h
@@ -28,8 +28,8 @@
 @endcode
  */
 
+#include <linux/types.h>
 #include <skein.h>
-#include <stdint.h>
 
 #define KeyScheduleConst 0x1BD11BDAA9FC1A22L
 
diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index f0b176ac1dc7..3fae6fdf7c75 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -10,7 +10,7 @@
 
 #define  SKEIN_PORT_CODE /* instantiate any code in skein_port.h */
 
-#include <string.h>       /* get the memcpy/memset functions */
+#include <linux/string.h>       /* get the memcpy/memset functions */
 #include <skein.h> /* get the Skein API definitions   */
 #include <skein_iv.h>    /* get precomputed IVs */
 
diff --git a/drivers/staging/skein/skeinApi.c b/drivers/staging/skein/skeinApi.c
index 7b963758d32c..579b92efbf65 100755
--- a/drivers/staging/skein/skeinApi.c
+++ b/drivers/staging/skein/skeinApi.c
@@ -24,10 +24,8 @@ OTHER DEALINGS IN THE SOFTWARE.
 
 */
 
-#define SKEIN_ERR_CHECK 1
+#include <linux/string.h>
 #include <skeinApi.h>
-#include <string.h>
-#include <stdio.h>
 
 int skeinCtxPrepare(SkeinCtx_t* ctx, SkeinSize_t size)
 {
diff --git a/drivers/staging/skein/skeinBlockNo3F.c b/drivers/staging/skein/skeinBlockNo3F.c
index 4ad6c50360e7..6a19ceb17d0f 100644
--- a/drivers/staging/skein/skeinBlockNo3F.c
+++ b/drivers/staging/skein/skeinBlockNo3F.c
@@ -1,5 +1,5 @@
 
-#include <string.h>
+#include <linux/string.h>
 #include <skein.h>
 #include <threefishApi.h>
 
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 86724a2443b5..b5be41af6d17 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -14,7 +14,7 @@
 **
 ************************************************************************/
 
-#include <string.h>
+#include <linux/string.h>
 #include <skein.h>
 
 #ifndef SKEIN_USE_ASM
diff --git a/drivers/staging/skein/threefish1024Block.c b/drivers/staging/skein/threefish1024Block.c
index 8b43586f46bc..58a8c26a1f6f 100644
--- a/drivers/staging/skein/threefish1024Block.c
+++ b/drivers/staging/skein/threefish1024Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt1024(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefish256Block.c b/drivers/staging/skein/threefish256Block.c
index db2b81978c91..a7e06f905186 100644
--- a/drivers/staging/skein/threefish256Block.c
+++ b/drivers/staging/skein/threefish256Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt256(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefish512Block.c b/drivers/staging/skein/threefish512Block.c
index 4fe708fea066..3cbfcd9af5c9 100644
--- a/drivers/staging/skein/threefish512Block.c
+++ b/drivers/staging/skein/threefish512Block.c
@@ -1,6 +1,5 @@
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdint.h>
-#include <string.h>
 
 
 void threefishEncrypt512(ThreefishKey_t* keyCtx, uint64_t* input, uint64_t* output)
diff --git a/drivers/staging/skein/threefishApi.c b/drivers/staging/skein/threefishApi.c
index 5afa0338aef4..968d3d21fe61 100644
--- a/drivers/staging/skein/threefishApi.c
+++ b/drivers/staging/skein/threefishApi.c
@@ -1,8 +1,7 @@
 
 
+#include <linux/string.h>
 #include <threefishApi.h>
-#include <stdlib.h>
-#include <string.h>
 
 void threefishSetKey(ThreefishKey_t* keyCtx, ThreefishSize_t stateSize,
                      uint64_t* keyData, uint64_t* tweak)
-- 
1.9.1
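
A note on the skein_port.h hunk above: it hard-codes SKEIN_NEED_SWAP to 0 and keeps only the memcpy() forms of Skein_Put64_LSB_First and Skein_Get64_LSB_First. That matches the Kconfig dependency on X86 && 64BIT, since memcpy() yields LSB-first byte order only on a little-endian CPU. The sketch below shows what an endian-neutral equivalent could look like. The function names are hypothetical and are not taken from this patch; this is an illustration under that assumption, not the series' implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical endian-neutral store: write byte_cnt bytes from the
 * 64-bit words in src to dst, least significant byte first.
 */
static void put64_lsb_first(uint8_t *dst, const uint64_t *src,
			    size_t byte_cnt)
{
	size_t n;

	for (n = 0; n < byte_cnt; n++)
		dst[n] = (uint8_t)(src[n >> 3] >> (8 * (n & 7)));
}

/*
 * Hypothetical endian-neutral load: assemble word_cnt 64-bit words
 * from src, treating the first byte of each group as least significant.
 */
static void get64_lsb_first(uint64_t *dst, const uint8_t *src,
			    size_t word_cnt)
{
	size_t w, b;

	for (w = 0; w < word_cnt; w++) {
		uint64_t v = 0;

		for (b = 0; b < 8; b++)
			v |= (uint64_t)src[8 * w + b] << (8 * b);
		dst[w] = v;
	}
}
```

On a little-endian machine both loops produce the same bytes as the memcpy() macros in the hunk; on a big-endian machine only the loops would give LSB-first order.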


Thread overview: 51+ messages
2014-03-11 21:32 [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 01/22] scripts: objdiff: detect object code changes between two commits Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 02/22] staging: crypto: skein: import code from Skein3Fish.git Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 03/22] staging: crypto: skein: allow building statically Jason Cooper
2014-03-17 21:52   ` Greg KH
2014-03-18 12:58     ` Jason Cooper
2014-03-18 14:28       ` Greg KH
2014-03-24  2:22         ` Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 04/22] staging: crypto: skein: remove brg_*.h includes Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 05/22] staging: crypto: skein: remove skein_port.h Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 06/22] staging: crypto: skein: remove __cplusplus and an unneeded stddef.h Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 07/22] staging: crypto: skein: remove unneeded typedefs Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 08/22] staging: crypto: skein: remove all typedef {struct,enum} Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 09/22] staging: crypto: skein: use u8, u64 vice uint*_t Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 10/22] staging: crypto: skein: fixup pointer whitespace Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 11/22] staging: crypto: skein: cleanup whitespace around operators/punc Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 12/22] staging: crypto: skein: dos2unix, remove executable perms Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 13/22] staging: crypto: skein: fix leading whitespace Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 14/22] staging: crypto: skein: remove trailing whitespace Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 15/22] staging: crypto: skein: cleanup >80 character lines Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 16/22] staging: crypto: skein: fix do/while brace formatting Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 17/22] staging: crypto: skein: fix brace placement errors Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 18/22] staging: crypto: skein: wrap multi-line macros in do-while loops Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 19/22] staging: crypto: skein: remove externs from .c files Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 20/22] staging: crypto: skein: remove braces from single-statement block Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 21/22] staging: crypto: skein: remove unnecessary line continuation Jason Cooper
2014-03-11 21:32 ` [RFC PATCH 22/22] staging: crypto: skein: add TODO file Jason Cooper
2014-03-12 16:55 ` [RFC PATCH 00/22] staging: add skein/threefish crypto algos Jason Cooper
2014-03-24  1:48 ` [PATCH V2 00/21] " Jason Cooper
2014-03-24  1:48   ` [PATCH V2 01/21] staging: crypto: skein: import code from Skein3Fish.git Jason Cooper
2014-03-24  1:48   ` [PATCH V2 02/21] staging: crypto: skein: allow building statically Jason Cooper
2014-03-24  2:32     ` [PATCH V3 " Jason Cooper
2014-03-24  1:49   ` [PATCH V2 03/21] staging: crypto: skein: remove brg_*.h includes Jason Cooper
2014-03-24  1:49   ` [PATCH V2 04/21] staging: crypto: skein: remove skein_port.h Jason Cooper
2014-03-24  1:49   ` [PATCH V2 05/21] staging: crypto: skein: remove __cplusplus and an unneeded stddef.h Jason Cooper
2014-03-24  1:49   ` [PATCH V2 06/21] staging: crypto: skein: remove unneeded typedefs Jason Cooper
2014-03-24  1:49   ` [PATCH V2 07/21] staging: crypto: skein: remove all typedef {struct,enum} Jason Cooper
2014-03-24  1:49   ` [PATCH V2 08/21] staging: crypto: skein: use u8, u64 vice uint*_t Jason Cooper
2014-03-24  1:49   ` [PATCH V2 09/21] staging: crypto: skein: fixup pointer whitespace Jason Cooper
2014-03-24  1:49   ` [PATCH V2 10/21] staging: crypto: skein: cleanup whitespace around operators/punc Jason Cooper
2014-03-24  1:49   ` [PATCH V2 11/21] staging: crypto: skein: dos2unix, remove executable perms Jason Cooper
2014-03-24  1:49   ` [PATCH V2 12/21] staging: crypto: skein: fix leading whitespace Jason Cooper
2014-03-24  1:49   ` [PATCH V2 13/21] staging: crypto: skein: remove trailing whitespace Jason Cooper
2014-03-24  1:49   ` [PATCH V2 14/21] staging: crypto: skein: cleanup >80 character lines Jason Cooper
2014-03-24  1:49   ` [PATCH V2 15/21] staging: crypto: skein: fix do/while brace formatting Jason Cooper
2014-03-24  1:49   ` [PATCH V2 16/21] staging: crypto: skein: fix brace placement errors Jason Cooper
2014-03-24  1:49   ` [PATCH V2 17/21] staging: crypto: skein: wrap multi-line macros in do-while loops Jason Cooper
2014-03-24  1:49   ` [PATCH V2 18/21] staging: crypto: skein: remove externs from .c files Jason Cooper
2014-03-24  1:49   ` [PATCH V2 19/21] staging: crypto: skein: remove braces from single-statement block Jason Cooper
2014-03-24  1:49   ` [PATCH V2 20/21] staging: crypto: skein: remove unnecessary line continuation Jason Cooper
2014-03-24  1:49   ` [PATCH V2 21/21] staging: crypto: skein: add TODO file Jason Cooper
