linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] staging/skein: more cleanup
@ 2014-05-20 13:56 Jake Edge
  2014-05-20 13:58 ` [PATCH 1/3] staging/skein: move all threefish block functions to one file Jake Edge
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Jake Edge @ 2014-05-20 13:56 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jason Cooper, devel, linux-kernel, Joe Perches, Dan Carpenter,
	Anton Saraev


Clean up a few more things in skein to get it closer to mainline
inclusion.  The first patch may be questionable (so I probably should
have put it last -- oh well, I can always respin), but putting all of
the threefish block functions in one file, as the skein block
functions already are, seemed to make sense.

Jake Edge (3):
  move all threefish block functions to one file, remove unneeded
    include
  fix some comment typos
  Rename a few more variables and structure member names to lower case.

 drivers/staging/skein/Makefile               |    4 +-
 drivers/staging/skein/skein.c                |  148 +-
 drivers/staging/skein/skein.h                |   34 +-
 drivers/staging/skein/skein_api.c            |   32 +-
 drivers/staging/skein/skein_api.h            |    2 +-
 drivers/staging/skein/skein_block.c          |  155 +-
 drivers/staging/skein/threefish_1024_block.c | 4902 ---------------
 drivers/staging/skein/threefish_256_block.c  | 1139 ----
 drivers/staging/skein/threefish_512_block.c  | 2225 -------
 drivers/staging/skein/threefish_api.h        |   18 +-
 drivers/staging/skein/threefish_block.c      | 8258 ++++++++++++++++++++++++++
 11 files changed, 8454 insertions(+), 8463 deletions(-)
 delete mode 100644 drivers/staging/skein/threefish_1024_block.c
 delete mode 100644 drivers/staging/skein/threefish_256_block.c
 delete mode 100644 drivers/staging/skein/threefish_512_block.c
 create mode 100644 drivers/staging/skein/threefish_block.c

-- 
1.9.0



* [PATCH 1/3] staging/skein: move all threefish block functions to one file
  2014-05-20 13:56 [PATCH 0/3] staging/skein: more cleanup Jake Edge
@ 2014-05-20 13:58 ` Jake Edge
  2014-05-20 14:00 ` [PATCH 2/3] staging/skein: comment typos Jake Edge
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Jake Edge @ 2014-05-20 13:58 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jason Cooper, devel, linux-kernel, Joe Perches, Dan Carpenter,
	Anton Saraev

Move all threefish block functions to one file and remove an unneeded
include.

Signed-off-by: Jake Edge <jake@lwn.net>
---

against staging-next branch of staging tree
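
For readers skimming the diff: the code being moved is fully unrolled,
and nearly every pair of statements is one "mix" step, i.e. an
addition followed by a 64-bit left rotation and an XOR, for example
"b0 += b9; b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;".  The sketch
below only illustrates that pattern in standalone C.  The names
rotl64() and threefish_mix() are hypothetical and do not appear in
this patch; in kernel code, rol64() from <linux/bitops.h> would
presumably serve as the rotation helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rotate a 64-bit word left by r bits.  r must be in 1..63 so that
 * neither shift is by the full word width (undefined behaviour in C).
 */
static inline uint64_t rotl64(uint64_t x, unsigned int r)
{
	assert(r > 0 && r < 64);
	return (x << r) | (x >> (64 - r));
}

/*
 * One mix step as open-coded in the unrolled functions:
 *	x0 += x1;
 *	x1 = rotl(x1, r) ^ x0;
 * Hypothetical helper for illustration only; the patch keeps the
 * open-coded form.
 */
static inline void threefish_mix(uint64_t *x0, uint64_t *x1, unsigned int r)
{
	*x0 += *x1;
	*x1 = rotl64(*x1, r) ^ *x0;
}
```

With such a helper, the pair of statements quoted above would read
threefish_mix(&b0, &b9, 38).  This patch is a pure move, so it does
not make that change.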

 drivers/staging/skein/Makefile               |    4 +-
 drivers/staging/skein/threefish_1024_block.c | 4902 ---------------
 drivers/staging/skein/threefish_256_block.c  | 1139 ----
 drivers/staging/skein/threefish_512_block.c  | 2225 -------
 drivers/staging/skein/threefish_block.c      | 8258 ++++++++++++++++++++++++++
 5 files changed, 8259 insertions(+), 8269 deletions(-)
 delete mode 100644 drivers/staging/skein/threefish_1024_block.c
 delete mode 100644 drivers/staging/skein/threefish_256_block.c
 delete mode 100644 drivers/staging/skein/threefish_512_block.c
 create mode 100644 drivers/staging/skein/threefish_block.c

diff --git a/drivers/staging/skein/Makefile b/drivers/staging/skein/Makefile
index 395454c..a14aadd 100644
--- a/drivers/staging/skein/Makefile
+++ b/drivers/staging/skein/Makefile
@@ -5,7 +5,5 @@ obj-$(CONFIG_CRYPTO_SKEIN) +=   skein.o \
 				skein_api.o \
 				skein_block.o
 
-obj-$(CONFIG_CRYPTO_THREEFISH) += threefish_1024_block.o \
-				  threefish_256_block.o \
-				  threefish_512_block.o \
+obj-$(CONFIG_CRYPTO_THREEFISH) += threefish_block.o \
 				  threefish_api.o
diff --git a/drivers/staging/skein/threefish_1024_block.c b/drivers/staging/skein/threefish_1024_block.c
deleted file mode 100644
index dac74e1..0000000
--- a/drivers/staging/skein/threefish_1024_block.c
+++ /dev/null
@@ -1,4902 +0,0 @@
-#include <linux/string.h>
-#include "threefish_api.h"
-
-
-void threefish_encrypt_1024(struct threefish_key *key_ctx, u64 *input,
-			    u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	    b2 = input[2], b3 = input[3],
-	    b4 = input[4], b5 = input[5],
-	    b6 = input[6], b7 = input[7],
-	    b8 = input[8], b9 = input[9],
-	    b10 = input[10], b11 = input[11],
-	    b12 = input[12], b13 = input[13],
-	    b14 = input[14], b15 = input[15];
-	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
-	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
-	    k4 = key_ctx->key[4], k5 = key_ctx->key[5],
-	    k6 = key_ctx->key[6], k7 = key_ctx->key[7],
-	    k8 = key_ctx->key[8], k9 = key_ctx->key[9],
-	    k10 = key_ctx->key[10], k11 = key_ctx->key[11],
-	    k12 = key_ctx->key[12], k13 = key_ctx->key[13],
-	    k14 = key_ctx->key[14], k15 = key_ctx->key[15],
-	    k16 = key_ctx->key[16];
-	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
-	    t2 = key_ctx->tweak[2];
-
-	b1 += k1;
-	b0 += b1 + k0;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k3;
-	b2 += b3 + k2;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k5;
-	b4 += b5 + k4;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k7;
-	b6 += b7 + k6;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k9;
-	b8 += b9 + k8;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k11;
-	b10 += b11 + k10;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k13 + t0;
-	b12 += b13 + k12;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k15;
-	b14 += b15 + k14 + t1;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k2;
-	b0 += b1 + k1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k4;
-	b2 += b3 + k3;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k6;
-	b4 += b5 + k5;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k8;
-	b6 += b7 + k7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k10;
-	b8 += b9 + k9;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k12;
-	b10 += b11 + k11;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k14 + t1;
-	b12 += b13 + k13;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k16 + 1;
-	b14 += b15 + k15 + t2;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k3;
-	b0 += b1 + k2;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k5;
-	b2 += b3 + k4;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k7;
-	b4 += b5 + k6;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k9;
-	b6 += b7 + k8;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k11;
-	b8 += b9 + k10;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k13;
-	b10 += b11 + k12;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k15 + t2;
-	b12 += b13 + k14;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k0 + 2;
-	b14 += b15 + k16 + t0;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k4;
-	b0 += b1 + k3;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k6;
-	b2 += b3 + k5;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k8;
-	b4 += b5 + k7;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k10;
-	b6 += b7 + k9;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k12;
-	b8 += b9 + k11;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k14;
-	b10 += b11 + k13;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k16 + t0;
-	b12 += b13 + k15;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k1 + 3;
-	b14 += b15 + k0 + t1;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k5;
-	b0 += b1 + k4;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k7;
-	b2 += b3 + k6;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k9;
-	b4 += b5 + k8;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k11;
-	b6 += b7 + k10;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k13;
-	b8 += b9 + k12;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k15;
-	b10 += b11 + k14;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k0 + t1;
-	b12 += b13 + k16;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k2 + 4;
-	b14 += b15 + k1 + t2;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k6;
-	b0 += b1 + k5;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k8;
-	b2 += b3 + k7;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k10;
-	b4 += b5 + k9;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k12;
-	b6 += b7 + k11;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k14;
-	b8 += b9 + k13;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k16;
-	b10 += b11 + k15;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k1 + t2;
-	b12 += b13 + k0;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k3 + 5;
-	b14 += b15 + k2 + t0;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k7;
-	b0 += b1 + k6;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k9;
-	b2 += b3 + k8;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k11;
-	b4 += b5 + k10;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k13;
-	b6 += b7 + k12;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k15;
-	b8 += b9 + k14;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k0;
-	b10 += b11 + k16;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k2 + t0;
-	b12 += b13 + k1;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k4 + 6;
-	b14 += b15 + k3 + t1;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k8;
-	b0 += b1 + k7;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k10;
-	b2 += b3 + k9;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k12;
-	b4 += b5 + k11;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k14;
-	b6 += b7 + k13;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k16;
-	b8 += b9 + k15;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k1;
-	b10 += b11 + k0;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k3 + t1;
-	b12 += b13 + k2;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k5 + 7;
-	b14 += b15 + k4 + t2;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k9;
-	b0 += b1 + k8;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k11;
-	b2 += b3 + k10;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k13;
-	b4 += b5 + k12;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k15;
-	b6 += b7 + k14;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k0;
-	b8 += b9 + k16;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k2;
-	b10 += b11 + k1;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k4 + t2;
-	b12 += b13 + k3;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k6 + 8;
-	b14 += b15 + k5 + t0;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k10;
-	b0 += b1 + k9;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k12;
-	b2 += b3 + k11;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k14;
-	b4 += b5 + k13;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k16;
-	b6 += b7 + k15;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k1;
-	b8 += b9 + k0;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k3;
-	b10 += b11 + k2;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k5 + t0;
-	b12 += b13 + k4;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k7 + 9;
-	b14 += b15 + k6 + t1;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k11;
-	b0 += b1 + k10;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k13;
-	b2 += b3 + k12;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k15;
-	b4 += b5 + k14;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k0;
-	b6 += b7 + k16;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k2;
-	b8 += b9 + k1;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k4;
-	b10 += b11 + k3;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k6 + t1;
-	b12 += b13 + k5;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k8 + 10;
-	b14 += b15 + k7 + t2;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k12;
-	b0 += b1 + k11;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k14;
-	b2 += b3 + k13;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k16;
-	b4 += b5 + k15;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k1;
-	b6 += b7 + k0;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k3;
-	b8 += b9 + k2;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k5;
-	b10 += b11 + k4;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k7 + t2;
-	b12 += b13 + k6;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k9 + 11;
-	b14 += b15 + k8 + t0;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k13;
-	b0 += b1 + k12;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k15;
-	b2 += b3 + k14;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k0;
-	b4 += b5 + k16;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k2;
-	b6 += b7 + k1;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k4;
-	b8 += b9 + k3;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k6;
-	b10 += b11 + k5;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k8 + t0;
-	b12 += b13 + k7;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k10 + 12;
-	b14 += b15 + k9 + t1;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k14;
-	b0 += b1 + k13;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k16;
-	b2 += b3 + k15;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k1;
-	b4 += b5 + k0;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k3;
-	b6 += b7 + k2;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k5;
-	b8 += b9 + k4;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k7;
-	b10 += b11 + k6;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k9 + t1;
-	b12 += b13 + k8;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k11 + 13;
-	b14 += b15 + k10 + t2;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k15;
-	b0 += b1 + k14;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k0;
-	b2 += b3 + k16;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k2;
-	b4 += b5 + k1;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k4;
-	b6 += b7 + k3;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k6;
-	b8 += b9 + k5;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k8;
-	b10 += b11 + k7;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k10 + t2;
-	b12 += b13 + k9;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k12 + 14;
-	b14 += b15 + k11 + t0;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k16;
-	b0 += b1 + k15;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k1;
-	b2 += b3 + k0;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k3;
-	b4 += b5 + k2;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k5;
-	b6 += b7 + k4;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k7;
-	b8 += b9 + k6;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k9;
-	b10 += b11 + k8;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k11 + t0;
-	b12 += b13 + k10;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k13 + 15;
-	b14 += b15 + k12 + t1;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k0;
-	b0 += b1 + k16;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k2;
-	b2 += b3 + k1;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k4;
-	b4 += b5 + k3;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k6;
-	b6 += b7 + k5;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k8;
-	b8 += b9 + k7;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k10;
-	b10 += b11 + k9;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k12 + t1;
-	b12 += b13 + k11;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k14 + 16;
-	b14 += b15 + k13 + t2;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k1;
-	b0 += b1 + k0;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k3;
-	b2 += b3 + k2;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k5;
-	b4 += b5 + k4;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k7;
-	b6 += b7 + k6;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k9;
-	b8 += b9 + k8;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k11;
-	b10 += b11 + k10;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k13 + t2;
-	b12 += b13 + k12;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k15 + 17;
-	b14 += b15 + k14 + t0;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	b1 += k2;
-	b0 += b1 + k1;
-	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
-
-	b3 += k4;
-	b2 += b3 + k3;
-	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
-
-	b5 += k6;
-	b4 += b5 + k5;
-	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
-
-	b7 += k8;
-	b6 += b7 + k7;
-	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
-
-	b9 += k10;
-	b8 += b9 + k9;
-	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
-
-	b11 += k12;
-	b10 += b11 + k11;
-	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
-
-	b13 += k14 + t0;
-	b12 += b13 + k13;
-	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
-
-	b15 += k16 + 18;
-	b14 += b15 + k15 + t1;
-	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
-
-	b1 += k3;
-	b0 += b1 + k2;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
-
-	b3 += k5;
-	b2 += b3 + k4;
-	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
-
-	b5 += k7;
-	b4 += b5 + k6;
-	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
-
-	b7 += k9;
-	b6 += b7 + k8;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
-
-	b9 += k11;
-	b8 += b9 + k10;
-	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
-
-	b11 += k13;
-	b10 += b11 + k12;
-	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
-
-	b13 += k15 + t1;
-	b12 += b13 + k14;
-	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
-
-	b15 += k0 + 19;
-	b14 += b15 + k16 + t2;
-	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
-
-	b0 += b9;
-	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
-
-	b2 += b13;
-	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
-
-	b6 += b11;
-	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
-
-	b4 += b15;
-	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
-
-	b10 += b7;
-	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
-
-	b12 += b3;
-	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
-
-	b14 += b5;
-	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
-
-	b8 += b1;
-	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
-
-	b0 += b7;
-	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
-
-	b6 += b1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
-
-	b12 += b15;
-	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
-
-	b14 += b13;
-	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
-
-	b8 += b11;
-	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
-
-	b10 += b9;
-	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
-
-	b0 += b15;
-	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
-
-	b2 += b11;
-	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
-
-	b6 += b13;
-	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
-
-	b4 += b9;
-	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
-
-	b14 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
-
-	b8 += b5;
-	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
-
-	b10 += b3;
-	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
-
-	b12 += b7;
-	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
-
-	output[0] = b0 + k3;
-	output[1] = b1 + k4;
-	output[2] = b2 + k5;
-	output[3] = b3 + k6;
-	output[4] = b4 + k7;
-	output[5] = b5 + k8;
-	output[6] = b6 + k9;
-	output[7] = b7 + k10;
-	output[8] = b8 + k11;
-	output[9] = b9 + k12;
-	output[10] = b10 + k13;
-	output[11] = b11 + k14;
-	output[12] = b12 + k15;
-	output[13] = b13 + k16 + t2;
-	output[14] = b14 + k0 + t0;
-	output[15] = b15 + k1 + 20;
-}
-
-void threefish_decrypt_1024(struct threefish_key *key_ctx, u64 *input,
-			    u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	    b2 = input[2], b3 = input[3],
-	    b4 = input[4], b5 = input[5],
-	    b6 = input[6], b7 = input[7],
-	    b8 = input[8], b9 = input[9],
-	    b10 = input[10], b11 = input[11],
-	    b12 = input[12], b13 = input[13],
-	    b14 = input[14], b15 = input[15];
-	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
-	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
-	    k4 = key_ctx->key[4], k5 = key_ctx->key[5],
-	    k6 = key_ctx->key[6], k7 = key_ctx->key[7],
-	    k8 = key_ctx->key[8], k9 = key_ctx->key[9],
-	    k10 = key_ctx->key[10], k11 = key_ctx->key[11],
-	    k12 = key_ctx->key[12], k13 = key_ctx->key[13],
-	    k14 = key_ctx->key[14], k15 = key_ctx->key[15],
-	    k16 = key_ctx->key[16];
-	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
-	    t2 = key_ctx->tweak[2];
-	u64 tmp;
-
-	b0 -= k3;
-	b1 -= k4;
-	b2 -= k5;
-	b3 -= k6;
-	b4 -= k7;
-	b5 -= k8;
-	b6 -= k9;
-	b7 -= k10;
-	b8 -= k11;
-	b9 -= k12;
-	b10 -= k13;
-	b11 -= k14;
-	b12 -= k15;
-	b13 -= k16 + t2;
-	b14 -= k0 + t0;
-	b15 -= k1 + 20;
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k16 + t2;
-	b15 -= k0 + 19;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k14;
-	b13 -= k15 + t1;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k12;
-	b11 -= k13;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k10;
-	b9 -= k11;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k8;
-	b7 -= k9;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k6;
-	b5 -= k7;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k4;
-	b3 -= k5;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k2;
-	b1 -= k3;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k15 + t1;
-	b15 -= k16 + 18;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k13;
-	b13 -= k14 + t0;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k11;
-	b11 -= k12;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k9;
-	b9 -= k10;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k7;
-	b7 -= k8;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k5;
-	b5 -= k6;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k3;
-	b3 -= k4;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k1;
-	b1 -= k2;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k14 + t0;
-	b15 -= k15 + 17;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k12;
-	b13 -= k13 + t2;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k10;
-	b11 -= k11;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k8;
-	b9 -= k9;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k6;
-	b7 -= k7;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k4;
-	b5 -= k5;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k2;
-	b3 -= k3;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k0;
-	b1 -= k1;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k13 + t2;
-	b15 -= k14 + 16;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k11;
-	b13 -= k12 + t1;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k9;
-	b11 -= k10;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k7;
-	b9 -= k8;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k5;
-	b7 -= k6;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k3;
-	b5 -= k4;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k1;
-	b3 -= k2;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k16;
-	b1 -= k0;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k12 + t1;
-	b15 -= k13 + 15;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k10;
-	b13 -= k11 + t0;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k8;
-	b11 -= k9;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k6;
-	b9 -= k7;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k4;
-	b7 -= k5;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k2;
-	b5 -= k3;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k0;
-	b3 -= k1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k15;
-	b1 -= k16;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k11 + t0;
-	b15 -= k12 + 14;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k9;
-	b13 -= k10 + t2;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k7;
-	b11 -= k8;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k5;
-	b9 -= k6;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k3;
-	b7 -= k4;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k1;
-	b5 -= k2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k16;
-	b3 -= k0;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k14;
-	b1 -= k15;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k10 + t2;
-	b15 -= k11 + 13;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k8;
-	b13 -= k9 + t1;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k6;
-	b11 -= k7;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k4;
-	b9 -= k5;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k2;
-	b7 -= k3;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k0;
-	b5 -= k1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k15;
-	b3 -= k16;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k13;
-	b1 -= k14;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k9 + t1;
-	b15 -= k10 + 12;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k7;
-	b13 -= k8 + t0;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k5;
-	b11 -= k6;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k3;
-	b9 -= k4;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k1;
-	b7 -= k2;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k16;
-	b5 -= k0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k14;
-	b3 -= k15;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k12;
-	b1 -= k13;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k8 + t0;
-	b15 -= k9 + 11;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k6;
-	b13 -= k7 + t2;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k4;
-	b11 -= k5;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k2;
-	b9 -= k3;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k0;
-	b7 -= k1;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k15;
-	b5 -= k16;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k13;
-	b3 -= k14;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k11;
-	b1 -= k12;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k7 + t2;
-	b15 -= k8 + 10;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k5;
-	b13 -= k6 + t1;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k3;
-	b11 -= k4;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k1;
-	b9 -= k2;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k16;
-	b7 -= k0;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k14;
-	b5 -= k15;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k12;
-	b3 -= k13;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k10;
-	b1 -= k11;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k6 + t1;
-	b15 -= k7 + 9;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k4;
-	b13 -= k5 + t0;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k2;
-	b11 -= k3;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k0;
-	b9 -= k1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k15;
-	b7 -= k16;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k13;
-	b5 -= k14;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k11;
-	b3 -= k12;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k9;
-	b1 -= k10;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k5 + t0;
-	b15 -= k6 + 8;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k3;
-	b13 -= k4 + t2;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k1;
-	b11 -= k2;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k16;
-	b9 -= k0;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k14;
-	b7 -= k15;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k12;
-	b5 -= k13;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k10;
-	b3 -= k11;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k8;
-	b1 -= k9;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k4 + t2;
-	b15 -= k5 + 7;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k2;
-	b13 -= k3 + t1;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k0;
-	b11 -= k1;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k15;
-	b9 -= k16;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k13;
-	b7 -= k14;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k11;
-	b5 -= k12;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k9;
-	b3 -= k10;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k7;
-	b1 -= k8;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k3 + t1;
-	b15 -= k4 + 6;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k1;
-	b13 -= k2 + t0;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k16;
-	b11 -= k0;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k14;
-	b9 -= k15;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k12;
-	b7 -= k13;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k10;
-	b5 -= k11;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k8;
-	b3 -= k9;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k6;
-	b1 -= k7;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k2 + t0;
-	b15 -= k3 + 5;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k0;
-	b13 -= k1 + t2;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k15;
-	b11 -= k16;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k13;
-	b9 -= k14;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k11;
-	b7 -= k12;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k9;
-	b5 -= k10;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k7;
-	b3 -= k8;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k5;
-	b1 -= k6;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k1 + t2;
-	b15 -= k2 + 4;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k16;
-	b13 -= k0 + t1;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k14;
-	b11 -= k15;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k12;
-	b9 -= k13;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k10;
-	b7 -= k11;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k8;
-	b5 -= k9;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k6;
-	b3 -= k7;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k4;
-	b1 -= k5;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k0 + t1;
-	b15 -= k1 + 3;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k15;
-	b13 -= k16 + t0;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k13;
-	b11 -= k14;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k11;
-	b9 -= k12;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k9;
-	b7 -= k10;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k7;
-	b5 -= k8;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k5;
-	b3 -= k6;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k3;
-	b1 -= k4;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k16 + t0;
-	b15 -= k0 + 2;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k14;
-	b13 -= k15 + t2;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k12;
-	b11 -= k13;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k10;
-	b9 -= k11;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k8;
-	b7 -= k9;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k6;
-	b5 -= k7;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k4;
-	b3 -= k5;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k2;
-	b1 -= k3;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 20) | (tmp << (64 - 20));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 37) | (tmp << (64 - 37));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 31) | (tmp << (64 - 31));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 52) | (tmp << (64 - 52));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 35) | (tmp << (64 - 35));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 48) | (tmp << (64 - 48));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 25) | (tmp << (64 - 25));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 44) | (tmp << (64 - 44));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 19) | (tmp << (64 - 19));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 47) | (tmp << (64 - 47));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 44) | (tmp << (64 - 44));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 42) | (tmp << (64 - 42));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 53) | (tmp << (64 - 53));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 4) | (tmp << (64 - 4));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 56) | (tmp << (64 - 56));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 34) | (tmp << (64 - 34));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 16) | (tmp << (64 - 16));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 30) | (tmp << (64 - 30));
-	b14 -= b15 + k15 + t2;
-	b15 -= k16 + 1;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 44) | (tmp << (64 - 44));
-	b12 -= b13 + k13;
-	b13 -= k14 + t1;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 47) | (tmp << (64 - 47));
-	b10 -= b11 + k11;
-	b11 -= k12;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 12) | (tmp << (64 - 12));
-	b8 -= b9 + k9;
-	b9 -= k10;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 31) | (tmp << (64 - 31));
-	b6 -= b7 + k7;
-	b7 -= k8;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 37) | (tmp << (64 - 37));
-	b4 -= b5 + k5;
-	b5 -= k6;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 9) | (tmp << (64 - 9));
-	b2 -= b3 + k3;
-	b3 -= k4;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 41) | (tmp << (64 - 41));
-	b0 -= b1 + k1;
-	b1 -= k2;
-
-	tmp = b7 ^ b12;
-	b7 = (tmp >> 25) | (tmp << (64 - 25));
-	b12 -= b7;
-
-	tmp = b3 ^ b10;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b10 -= b3;
-
-	tmp = b5 ^ b8;
-	b5 = (tmp >> 28) | (tmp << (64 - 28));
-	b8 -= b5;
-
-	tmp = b1 ^ b14;
-	b1 = (tmp >> 47) | (tmp << (64 - 47));
-	b14 -= b1;
-
-	tmp = b9 ^ b4;
-	b9 = (tmp >> 41) | (tmp << (64 - 41));
-	b4 -= b9;
-
-	tmp = b13 ^ b6;
-	b13 = (tmp >> 48) | (tmp << (64 - 48));
-	b6 -= b13;
-
-	tmp = b11 ^ b2;
-	b11 = (tmp >> 20) | (tmp << (64 - 20));
-	b2 -= b11;
-
-	tmp = b15 ^ b0;
-	b15 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b15;
-
-	tmp = b9 ^ b10;
-	b9 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b9;
-
-	tmp = b11 ^ b8;
-	b11 = (tmp >> 59) | (tmp << (64 - 59));
-	b8 -= b11;
-
-	tmp = b13 ^ b14;
-	b13 = (tmp >> 41) | (tmp << (64 - 41));
-	b14 -= b13;
-
-	tmp = b15 ^ b12;
-	b15 = (tmp >> 34) | (tmp << (64 - 34));
-	b12 -= b15;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b6 -= b1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 51) | (tmp << (64 - 51));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 4) | (tmp << (64 - 4));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 33) | (tmp << (64 - 33));
-	b0 -= b7;
-
-	tmp = b1 ^ b8;
-	b1 = (tmp >> 52) | (tmp << (64 - 52));
-	b8 -= b1;
-
-	tmp = b5 ^ b14;
-	b5 = (tmp >> 23) | (tmp << (64 - 23));
-	b14 -= b5;
-
-	tmp = b3 ^ b12;
-	b3 = (tmp >> 18) | (tmp << (64 - 18));
-	b12 -= b3;
-
-	tmp = b7 ^ b10;
-	b7 = (tmp >> 49) | (tmp << (64 - 49));
-	b10 -= b7;
-
-	tmp = b15 ^ b4;
-	b15 = (tmp >> 55) | (tmp << (64 - 55));
-	b4 -= b15;
-
-	tmp = b11 ^ b6;
-	b11 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b11;
-
-	tmp = b13 ^ b2;
-	b13 = (tmp >> 19) | (tmp << (64 - 19));
-	b2 -= b13;
-
-	tmp = b9 ^ b0;
-	b9 = (tmp >> 38) | (tmp << (64 - 38));
-	b0 -= b9;
-
-	tmp = b15 ^ b14;
-	b15 = (tmp >> 37) | (tmp << (64 - 37));
-	b14 -= b15 + k14 + t1;
-	b15 -= k15;
-
-	tmp = b13 ^ b12;
-	b13 = (tmp >> 22) | (tmp << (64 - 22));
-	b12 -= b13 + k12;
-	b13 -= k13 + t0;
-
-	tmp = b11 ^ b10;
-	b11 = (tmp >> 17) | (tmp << (64 - 17));
-	b10 -= b11 + k10;
-	b11 -= k11;
-
-	tmp = b9 ^ b8;
-	b9 = (tmp >> 8) | (tmp << (64 - 8));
-	b8 -= b9 + k8;
-	b9 -= k9;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 47) | (tmp << (64 - 47));
-	b6 -= b7 + k6;
-	b7 -= k7;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 8) | (tmp << (64 - 8));
-	b4 -= b5 + k4;
-	b5 -= k5;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b3 + k2;
-	b3 -= k3;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 24) | (tmp << (64 - 24));
-	b0 -= b1 + k0;
-	b1 -= k1;
-
-	output[15] = b15;
-	output[14] = b14;
-	output[13] = b13;
-	output[12] = b12;
-	output[11] = b11;
-	output[10] = b10;
-	output[9] = b9;
-	output[8] = b8;
-	output[7] = b7;
-	output[6] = b6;
-	output[5] = b5;
-	output[4] = b4;
-	output[3] = b3;
-	output[2] = b2;
-	output[1] = b1;
-	output[0] = b0;
-}
diff --git a/drivers/staging/skein/threefish_256_block.c b/drivers/staging/skein/threefish_256_block.c
deleted file mode 100644
index 0b33b3f..0000000
--- a/drivers/staging/skein/threefish_256_block.c
+++ /dev/null
@@ -1,1139 +0,0 @@
-#include <linux/string.h>
-#include "threefish_api.h"
-
-
-void threefish_encrypt_256(struct threefish_key *key_ctx, u64 *input,
-			   u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	    b2 = input[2], b3 = input[3];
-	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
-	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
-	    k4 = key_ctx->key[4];
-	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
-	    t2 = key_ctx->tweak[2];
-
-	b1 += k1 + t0;
-	b0 += b1 + k0;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k3;
-	b2 += b3 + k2 + t1;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k2 + t1;
-	b0 += b1 + k1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k4 + 1;
-	b2 += b3 + k3 + t2;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-
-	b1 += k3 + t2;
-	b0 += b1 + k2;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k0 + 2;
-	b2 += b3 + k4 + t0;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k4 + t0;
-	b0 += b1 + k3;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k1 + 3;
-	b2 += b3 + k0 + t1;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-
-	b1 += k0 + t1;
-	b0 += b1 + k4;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k2 + 4;
-	b2 += b3 + k1 + t2;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k1 + t2;
-	b0 += b1 + k0;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k3 + 5;
-	b2 += b3 + k2 + t0;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-
-	b1 += k2 + t0;
-	b0 += b1 + k1;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k4 + 6;
-	b2 += b3 + k3 + t1;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k3 + t1;
-	b0 += b1 + k2;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k0 + 7;
-	b2 += b3 + k4 + t2;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-
-	b1 += k4 + t2;
-	b0 += b1 + k3;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k1 + 8;
-	b2 += b3 + k0 + t0;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k0 + t0;
-	b0 += b1 + k4;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k2 + 9;
-	b2 += b3 + k1 + t1;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-
-	b1 += k1 + t1;
-	b0 += b1 + k0;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k3 + 10;
-	b2 += b3 + k2 + t2;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k2 + t2;
-	b0 += b1 + k1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k4 + 11;
-	b2 += b3 + k3 + t0;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-
-	b1 += k3 + t0;
-	b0 += b1 + k2;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k0 + 12;
-	b2 += b3 + k4 + t1;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k4 + t1;
-	b0 += b1 + k3;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k1 + 13;
-	b2 += b3 + k0 + t2;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-
-	b1 += k0 + t2;
-	b0 += b1 + k4;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k2 + 14;
-	b2 += b3 + k1 + t0;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k1 + t0;
-	b0 += b1 + k0;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k3 + 15;
-	b2 += b3 + k2 + t1;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-
-	b1 += k2 + t1;
-	b0 += b1 + k1;
-	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
-
-	b3 += k4 + 16;
-	b2 += b3 + k3 + t2;
-	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
-
-	b1 += k3 + t2;
-	b0 += b1 + k2;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
-
-	b3 += k0 + 17;
-	b2 += b3 + k4 + t0;
-	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
-
-	b0 += b1;
-	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
-
-	b2 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
-
-	b0 += b3;
-	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
-
-	b2 += b1;
-	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
-
-	output[0] = b0 + k3;
-	output[1] = b1 + k4 + t0;
-	output[2] = b2 + k0 + t1;
-	output[3] = b3 + k1 + 18;
-}
-
-void threefish_decrypt_256(struct threefish_key *key_ctx, u64 *input,
-			   u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	    b2 = input[2], b3 = input[3];
-	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
-	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
-	    k4 = key_ctx->key[4];
-	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
-	    t2 = key_ctx->tweak[2];
-
-	u64 tmp;
-
-	b0 -= k3;
-	b1 -= k4 + t0;
-	b2 -= k0 + t1;
-	b3 -= k1 + 18;
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k2;
-	b1 -= k3 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k4 + t0;
-	b3 -= k0 + 17;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k1;
-	b1 -= k2 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k3 + t2;
-	b3 -= k4 + 16;
-
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k0;
-	b1 -= k1 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k2 + t1;
-	b3 -= k3 + 15;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k4;
-	b1 -= k0 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k1 + t0;
-	b3 -= k2 + 14;
-
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k3;
-	b1 -= k4 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k0 + t2;
-	b3 -= k1 + 13;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k2;
-	b1 -= k3 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k4 + t1;
-	b3 -= k0 + 12;
-
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k1;
-	b1 -= k2 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k3 + t0;
-	b3 -= k4 + 11;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k0;
-	b1 -= k1 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k2 + t2;
-	b3 -= k3 + 10;
-
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k4;
-	b1 -= k0 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k1 + t1;
-	b3 -= k2 + 9;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k3;
-	b1 -= k4 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k0 + t0;
-	b3 -= k1 + 8;
-
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k2;
-	b1 -= k3 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k4 + t2;
-	b3 -= k0 + 7;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k1;
-	b1 -= k2 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k3 + t1;
-	b3 -= k4 + 6;
-
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k0;
-	b1 -= k1 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k2 + t0;
-	b3 -= k3 + 5;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k4;
-	b1 -= k0 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k1 + t2;
-	b3 -= k2 + 4;
-
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k3;
-	b1 -= k4 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k0 + t1;
-	b3 -= k1 + 3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k2;
-	b1 -= k3 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k4 + t0;
-	b3 -= k0 + 2;
-
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 32) | (tmp << (64 - 32));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 32) | (tmp << (64 - 32));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 58) | (tmp << (64 - 58));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 12) | (tmp << (64 - 12));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b0 -= b1 + k1;
-	b1 -= k2 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b3 + k3 + t2;
-	b3 -= k4 + 1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 5) | (tmp << (64 - 5));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 37) | (tmp << (64 - 37));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 23) | (tmp << (64 - 23));
-	b0 -= b1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 40) | (tmp << (64 - 40));
-	b2 -= b3;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 52) | (tmp << (64 - 52));
-	b0 -= b3;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 57) | (tmp << (64 - 57));
-	b2 -= b1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 14) | (tmp << (64 - 14));
-	b0 -= b1 + k0;
-	b1 -= k1 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 16) | (tmp << (64 - 16));
-	b2 -= b3 + k2 + t1;
-	b3 -= k3;
-
-	output[0] = b0;
-	output[1] = b1;
-	output[2] = b2;
-	output[3] = b3;
-}
diff --git a/drivers/staging/skein/threefish_512_block.c b/drivers/staging/skein/threefish_512_block.c
deleted file mode 100644
index 1c62bf6..0000000
--- a/drivers/staging/skein/threefish_512_block.c
+++ /dev/null
@@ -1,2225 +0,0 @@
-#include <linux/string.h>
-#include "threefish_api.h"
-
-
-void threefish_encrypt_512(struct threefish_key *key_ctx, u64 *input,
-			   u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	    b2 = input[2], b3 = input[3],
-	    b4 = input[4], b5 = input[5],
-	    b6 = input[6], b7 = input[7];
-	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
-	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
-	    k4 = key_ctx->key[4], k5 = key_ctx->key[5],
-	    k6 = key_ctx->key[6], k7 = key_ctx->key[7],
-	    k8 = key_ctx->key[8];
-	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
-	    t2 = key_ctx->tweak[2];
-
-	b1 += k1;
-	b0 += b1 + k0;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k3;
-	b2 += b3 + k2;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k5 + t0;
-	b4 += b5 + k4;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k7;
-	b6 += b7 + k6 + t1;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k2;
-	b0 += b1 + k1;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k4;
-	b2 += b3 + k3;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k6 + t1;
-	b4 += b5 + k5;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k8 + 1;
-	b6 += b7 + k7 + t2;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	b1 += k3;
-	b0 += b1 + k2;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k5;
-	b2 += b3 + k4;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k7 + t2;
-	b4 += b5 + k6;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k0 + 2;
-	b6 += b7 + k8 + t0;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k4;
-	b0 += b1 + k3;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k6;
-	b2 += b3 + k5;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k8 + t0;
-	b4 += b5 + k7;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k1 + 3;
-	b6 += b7 + k0 + t1;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	b1 += k5;
-	b0 += b1 + k4;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k7;
-	b2 += b3 + k6;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k0 + t1;
-	b4 += b5 + k8;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k2 + 4;
-	b6 += b7 + k1 + t2;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k6;
-	b0 += b1 + k5;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k8;
-	b2 += b3 + k7;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k1 + t2;
-	b4 += b5 + k0;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k3 + 5;
-	b6 += b7 + k2 + t0;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	b1 += k7;
-	b0 += b1 + k6;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k0;
-	b2 += b3 + k8;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k2 + t0;
-	b4 += b5 + k1;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k4 + 6;
-	b6 += b7 + k3 + t1;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k8;
-	b0 += b1 + k7;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k1;
-	b2 += b3 + k0;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k3 + t1;
-	b4 += b5 + k2;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k5 + 7;
-	b6 += b7 + k4 + t2;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	b1 += k0;
-	b0 += b1 + k8;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k2;
-	b2 += b3 + k1;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k4 + t2;
-	b4 += b5 + k3;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k6 + 8;
-	b6 += b7 + k5 + t0;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k1;
-	b0 += b1 + k0;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k3;
-	b2 += b3 + k2;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k5 + t0;
-	b4 += b5 + k4;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k7 + 9;
-	b6 += b7 + k6 + t1;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	b1 += k2;
-	b0 += b1 + k1;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k4;
-	b2 += b3 + k3;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k6 + t1;
-	b4 += b5 + k5;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k8 + 10;
-	b6 += b7 + k7 + t2;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k3;
-	b0 += b1 + k2;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k5;
-	b2 += b3 + k4;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k7 + t2;
-	b4 += b5 + k6;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k0 + 11;
-	b6 += b7 + k8 + t0;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	b1 += k4;
-	b0 += b1 + k3;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k6;
-	b2 += b3 + k5;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k8 + t0;
-	b4 += b5 + k7;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k1 + 12;
-	b6 += b7 + k0 + t1;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k5;
-	b0 += b1 + k4;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k7;
-	b2 += b3 + k6;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k0 + t1;
-	b4 += b5 + k8;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k2 + 13;
-	b6 += b7 + k1 + t2;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	b1 += k6;
-	b0 += b1 + k5;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k8;
-	b2 += b3 + k7;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k1 + t2;
-	b4 += b5 + k0;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k3 + 14;
-	b6 += b7 + k2 + t0;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k7;
-	b0 += b1 + k6;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k0;
-	b2 += b3 + k8;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k2 + t0;
-	b4 += b5 + k1;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k4 + 15;
-	b6 += b7 + k3 + t1;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	b1 += k8;
-	b0 += b1 + k7;
-	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
-
-	b3 += k1;
-	b2 += b3 + k0;
-	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
-
-	b5 += k3 + t1;
-	b4 += b5 + k2;
-	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
-
-	b7 += k5 + 16;
-	b6 += b7 + k4 + t2;
-	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
-
-	b1 += k0;
-	b0 += b1 + k8;
-	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
-
-	b3 += k2;
-	b2 += b3 + k1;
-	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
-
-	b5 += k4 + t2;
-	b4 += b5 + k3;
-	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
-
-	b7 += k6 + 17;
-	b6 += b7 + k5 + t0;
-	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
-
-	b2 += b1;
-	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
-
-	b4 += b7;
-	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
-
-	b6 += b5;
-	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
-
-	b0 += b3;
-	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
-
-	b4 += b1;
-	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
-
-	b6 += b3;
-	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
-
-	b0 += b5;
-	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
-
-	b2 += b7;
-	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
-
-	b6 += b1;
-	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
-
-	b0 += b7;
-	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
-
-	b2 += b5;
-	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
-
-	b4 += b3;
-	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
-
-	output[0] = b0 + k0;
-	output[1] = b1 + k1;
-	output[2] = b2 + k2;
-	output[3] = b3 + k3;
-	output[4] = b4 + k4;
-	output[5] = b5 + k5 + t0;
-	output[6] = b6 + k6 + t1;
-	output[7] = b7 + k7 + 18;
-}
-
-void threefish_decrypt_512(struct threefish_key *key_ctx, u64 *input,
-			   u64 *output)
-{
-	u64 b0 = input[0], b1 = input[1],
-	    b2 = input[2], b3 = input[3],
-	    b4 = input[4], b5 = input[5],
-	    b6 = input[6], b7 = input[7];
-	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
-	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
-	    k4 = key_ctx->key[4], k5 = key_ctx->key[5],
-	    k6 = key_ctx->key[6], k7 = key_ctx->key[7],
-	    k8 = key_ctx->key[8];
-	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
-	    t2 = key_ctx->tweak[2];
-
-	u64 tmp;
-
-	b0 -= k0;
-	b1 -= k1;
-	b2 -= k2;
-	b3 -= k3;
-	b4 -= k4;
-	b5 -= k5 + t0;
-	b6 -= k6 + t1;
-	b7 -= k7 + 18;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k5 + t0;
-	b7 -= k6 + 17;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k3;
-	b5 -= k4 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k1;
-	b3 -= k2;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k8;
-	b1 -= k0;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k4 + t2;
-	b7 -= k5 + 16;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k2;
-	b5 -= k3 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k0;
-	b3 -= k1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k7;
-	b1 -= k8;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k3 + t1;
-	b7 -= k4 + 15;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k1;
-	b5 -= k2 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k8;
-	b3 -= k0;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k6;
-	b1 -= k7;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k2 + t0;
-	b7 -= k3 + 14;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k0;
-	b5 -= k1 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k7;
-	b3 -= k8;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k5;
-	b1 -= k6;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k1 + t2;
-	b7 -= k2 + 13;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k8;
-	b5 -= k0 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k6;
-	b3 -= k7;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k4;
-	b1 -= k5;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k0 + t1;
-	b7 -= k1 + 12;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k7;
-	b5 -= k8 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k5;
-	b3 -= k6;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k3;
-	b1 -= k4;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k8 + t0;
-	b7 -= k0 + 11;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k6;
-	b5 -= k7 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k4;
-	b3 -= k5;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k2;
-	b1 -= k3;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k7 + t2;
-	b7 -= k8 + 10;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k5;
-	b5 -= k6 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k3;
-	b3 -= k4;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k1;
-	b1 -= k2;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k6 + t1;
-	b7 -= k7 + 9;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k4;
-	b5 -= k5 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k2;
-	b3 -= k3;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k0;
-	b1 -= k1;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k5 + t0;
-	b7 -= k6 + 8;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k3;
-	b5 -= k4 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k1;
-	b3 -= k2;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k8;
-	b1 -= k0;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k4 + t2;
-	b7 -= k5 + 7;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k2;
-	b5 -= k3 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k0;
-	b3 -= k1;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k7;
-	b1 -= k8;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k3 + t1;
-	b7 -= k4 + 6;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k1;
-	b5 -= k2 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k8;
-	b3 -= k0;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k6;
-	b1 -= k7;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k2 + t0;
-	b7 -= k3 + 5;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k0;
-	b5 -= k1 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k7;
-	b3 -= k8;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k5;
-	b1 -= k6;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k1 + t2;
-	b7 -= k2 + 4;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k8;
-	b5 -= k0 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k6;
-	b3 -= k7;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k4;
-	b1 -= k5;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k0 + t1;
-	b7 -= k1 + 3;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k7;
-	b5 -= k8 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k5;
-	b3 -= k6;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k3;
-	b1 -= k4;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k8 + t0;
-	b7 -= k0 + 2;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k6;
-	b5 -= k7 + t2;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k4;
-	b3 -= k5;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k2;
-	b1 -= k3;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 22) | (tmp << (64 - 22));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 56) | (tmp << (64 - 56));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 35) | (tmp << (64 - 35));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 8) | (tmp << (64 - 8));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 43) | (tmp << (64 - 43));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 29) | (tmp << (64 - 29));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 25) | (tmp << (64 - 25));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 17) | (tmp << (64 - 17));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 10) | (tmp << (64 - 10));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 50) | (tmp << (64 - 50));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 13) | (tmp << (64 - 13));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 24) | (tmp << (64 - 24));
-	b6 -= b7 + k7 + t2;
-	b7 -= k8 + 1;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 34) | (tmp << (64 - 34));
-	b4 -= b5 + k5;
-	b5 -= k6 + t1;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 30) | (tmp << (64 - 30));
-	b2 -= b3 + k3;
-	b3 -= k4;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 39) | (tmp << (64 - 39));
-	b0 -= b1 + k1;
-	b1 -= k2;
-
-	tmp = b3 ^ b4;
-	b3 = (tmp >> 56) | (tmp << (64 - 56));
-	b4 -= b3;
-
-	tmp = b5 ^ b2;
-	b5 = (tmp >> 54) | (tmp << (64 - 54));
-	b2 -= b5;
-
-	tmp = b7 ^ b0;
-	b7 = (tmp >> 9) | (tmp << (64 - 9));
-	b0 -= b7;
-
-	tmp = b1 ^ b6;
-	b1 = (tmp >> 44) | (tmp << (64 - 44));
-	b6 -= b1;
-
-	tmp = b7 ^ b2;
-	b7 = (tmp >> 39) | (tmp << (64 - 39));
-	b2 -= b7;
-
-	tmp = b5 ^ b0;
-	b5 = (tmp >> 36) | (tmp << (64 - 36));
-	b0 -= b5;
-
-	tmp = b3 ^ b6;
-	b3 = (tmp >> 49) | (tmp << (64 - 49));
-	b6 -= b3;
-
-	tmp = b1 ^ b4;
-	b1 = (tmp >> 17) | (tmp << (64 - 17));
-	b4 -= b1;
-
-	tmp = b3 ^ b0;
-	b3 = (tmp >> 42) | (tmp << (64 - 42));
-	b0 -= b3;
-
-	tmp = b5 ^ b6;
-	b5 = (tmp >> 14) | (tmp << (64 - 14));
-	b6 -= b5;
-
-	tmp = b7 ^ b4;
-	b7 = (tmp >> 27) | (tmp << (64 - 27));
-	b4 -= b7;
-
-	tmp = b1 ^ b2;
-	b1 = (tmp >> 33) | (tmp << (64 - 33));
-	b2 -= b1;
-
-	tmp = b7 ^ b6;
-	b7 = (tmp >> 37) | (tmp << (64 - 37));
-	b6 -= b7 + k6 + t1;
-	b7 -= k7;
-
-	tmp = b5 ^ b4;
-	b5 = (tmp >> 19) | (tmp << (64 - 19));
-	b4 -= b5 + k4;
-	b5 -= k5 + t0;
-
-	tmp = b3 ^ b2;
-	b3 = (tmp >> 36) | (tmp << (64 - 36));
-	b2 -= b3 + k2;
-	b3 -= k3;
-
-	tmp = b1 ^ b0;
-	b1 = (tmp >> 46) | (tmp << (64 - 46));
-	b0 -= b1 + k0;
-	b1 -= k1;
-
-	output[0] = b0;
-	output[1] = b1;
-	output[2] = b2;
-	output[3] = b3;
-
-	output[7] = b7;
-	output[6] = b6;
-	output[5] = b5;
-	output[4] = b4;
-}
diff --git a/drivers/staging/skein/threefish_block.c b/drivers/staging/skein/threefish_block.c
new file mode 100644
index 0000000..bd1e15c
--- /dev/null
+++ b/drivers/staging/skein/threefish_block.c
@@ -0,0 +1,8258 @@
+#include "threefish_api.h"
+
+void threefish_encrypt_256(struct threefish_key *key_ctx, u64 *input,
+			   u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	    b2 = input[2], b3 = input[3];
+	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
+	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
+	    k4 = key_ctx->key[4];
+	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
+	    t2 = key_ctx->tweak[2];
+
+	b1 += k1 + t0;
+	b0 += b1 + k0;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k2 + t1;
+	b0 += b1 + k1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k4 + 1;
+	b2 += b3 + k3 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k3 + t2;
+	b0 += b1 + k2;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k0 + 2;
+	b2 += b3 + k4 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k4 + t0;
+	b0 += b1 + k3;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k1 + 3;
+	b2 += b3 + k0 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k0 + t1;
+	b0 += b1 + k4;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k2 + 4;
+	b2 += b3 + k1 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k1 + t2;
+	b0 += b1 + k0;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k3 + 5;
+	b2 += b3 + k2 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k2 + t0;
+	b0 += b1 + k1;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k4 + 6;
+	b2 += b3 + k3 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k3 + t1;
+	b0 += b1 + k2;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k0 + 7;
+	b2 += b3 + k4 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k4 + t2;
+	b0 += b1 + k3;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k1 + 8;
+	b2 += b3 + k0 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k0 + t0;
+	b0 += b1 + k4;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k2 + 9;
+	b2 += b3 + k1 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k1 + t1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k3 + 10;
+	b2 += b3 + k2 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k2 + t2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k4 + 11;
+	b2 += b3 + k3 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k3 + t0;
+	b0 += b1 + k2;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k0 + 12;
+	b2 += b3 + k4 + t1;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k4 + t1;
+	b0 += b1 + k3;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k1 + 13;
+	b2 += b3 + k0 + t2;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k0 + t2;
+	b0 += b1 + k4;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k2 + 14;
+	b2 += b3 + k1 + t0;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k1 + t0;
+	b0 += b1 + k0;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k3 + 15;
+	b2 += b3 + k2 + t1;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+
+	b1 += k2 + t1;
+	b0 += b1 + k1;
+	b1 = ((b1 << 14) | (b1 >> (64 - 14))) ^ b0;
+
+	b3 += k4 + 16;
+	b2 += b3 + k3 + t2;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 52) | (b3 >> (64 - 52))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 57) | (b1 >> (64 - 57))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 40) | (b3 >> (64 - 40))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 5) | (b3 >> (64 - 5))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 37) | (b1 >> (64 - 37))) ^ b2;
+
+	b1 += k3 + t2;
+	b0 += b1 + k2;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b0;
+
+	b3 += k0 + 17;
+	b2 += b3 + k4 + t0;
+	b3 = ((b3 << 33) | (b3 >> (64 - 33))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 46) | (b3 >> (64 - 46))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 12) | (b1 >> (64 - 12))) ^ b2;
+
+	b0 += b1;
+	b1 = ((b1 << 58) | (b1 >> (64 - 58))) ^ b0;
+
+	b2 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b2;
+
+	b0 += b3;
+	b3 = ((b3 << 32) | (b3 >> (64 - 32))) ^ b0;
+
+	b2 += b1;
+	b1 = ((b1 << 32) | (b1 >> (64 - 32))) ^ b2;
+
+	output[0] = b0 + k3;
+	output[1] = b1 + k4 + t0;
+	output[2] = b2 + k0 + t1;
+	output[3] = b3 + k1 + 18;
+}
+
+void threefish_decrypt_256(struct threefish_key *key_ctx, u64 *input,
+			   u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	    b2 = input[2], b3 = input[3];
+	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
+	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
+	    k4 = key_ctx->key[4];
+	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
+	    t2 = key_ctx->tweak[2];
+
+	u64 tmp;
+
+	b0 -= k3;
+	b1 -= k4 + t0;
+	b2 -= k0 + t1;
+	b3 -= k1 + 18;
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k2;
+	b1 -= k3 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k4 + t0;
+	b3 -= k0 + 17;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k1;
+	b1 -= k2 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k3 + t2;
+	b3 -= k4 + 16;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k0;
+	b1 -= k1 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k2 + t1;
+	b3 -= k3 + 15;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k4;
+	b1 -= k0 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k1 + t0;
+	b3 -= k2 + 14;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k3;
+	b1 -= k4 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k0 + t2;
+	b3 -= k1 + 13;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k2;
+	b1 -= k3 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k4 + t1;
+	b3 -= k0 + 12;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k1;
+	b1 -= k2 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k3 + t0;
+	b3 -= k4 + 11;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k0;
+	b1 -= k1 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k2 + t2;
+	b3 -= k3 + 10;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k4;
+	b1 -= k0 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k1 + t1;
+	b3 -= k2 + 9;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k3;
+	b1 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k0 + t0;
+	b3 -= k1 + 8;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k2;
+	b1 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k4 + t2;
+	b3 -= k0 + 7;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k1;
+	b1 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k3 + t1;
+	b3 -= k4 + 6;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k0;
+	b1 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k2 + t0;
+	b3 -= k3 + 5;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k4;
+	b1 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k1 + t2;
+	b3 -= k2 + 4;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k3;
+	b1 -= k4 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k0 + t1;
+	b3 -= k1 + 3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k2;
+	b1 -= k3 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k4 + t0;
+	b3 -= k0 + 2;
+
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 32) | (tmp << (64 - 32));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 32) | (tmp << (64 - 32));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 58) | (tmp << (64 - 58));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 12) | (tmp << (64 - 12));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b0 -= b1 + k1;
+	b1 -= k2 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b3 + k3 + t2;
+	b3 -= k4 + 1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 37) | (tmp << (64 - 37));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b0 -= b1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 40) | (tmp << (64 - 40));
+	b2 -= b3;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 52) | (tmp << (64 - 52));
+	b0 -= b3;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 57) | (tmp << (64 - 57));
+	b2 -= b1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 14) | (tmp << (64 - 14));
+	b0 -= b1 + k0;
+	b1 -= k1 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b2 -= b3 + k2 + t1;
+	b3 -= k3;
+
+	output[0] = b0;
+	output[1] = b1;
+	output[2] = b2;
+	output[3] = b3;
+}
+
+void threefish_encrypt_512(struct threefish_key *key_ctx, u64 *input,
+			   u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	    b2 = input[2], b3 = input[3],
+	    b4 = input[4], b5 = input[5],
+	    b6 = input[6], b7 = input[7];
+	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
+	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
+	    k4 = key_ctx->key[4], k5 = key_ctx->key[5],
+	    k6 = key_ctx->key[6], k7 = key_ctx->key[7],
+	    k8 = key_ctx->key[8];
+	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
+	    t2 = key_ctx->tweak[2];
+
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k5 + t0;
+	b4 += b5 + k4;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k6 + t1;
+	b4 += b5 + k5;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k8 + 1;
+	b6 += b7 + k7 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k7 + t2;
+	b4 += b5 + k6;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k0 + 2;
+	b6 += b7 + k8 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k8 + t0;
+	b4 += b5 + k7;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k1 + 3;
+	b6 += b7 + k0 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k0 + t1;
+	b4 += b5 + k8;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k2 + 4;
+	b6 += b7 + k1 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k1 + t2;
+	b4 += b5 + k0;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k3 + 5;
+	b6 += b7 + k2 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k8;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k2 + t0;
+	b4 += b5 + k1;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k4 + 6;
+	b6 += b7 + k3 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k3 + t1;
+	b4 += b5 + k2;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k5 + 7;
+	b6 += b7 + k4 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k0;
+	b0 += b1 + k8;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k4 + t2;
+	b4 += b5 + k3;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k6 + 8;
+	b6 += b7 + k5 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k5 + t0;
+	b4 += b5 + k4;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k7 + 9;
+	b6 += b7 + k6 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k6 + t1;
+	b4 += b5 + k5;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k8 + 10;
+	b6 += b7 + k7 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k7 + t2;
+	b4 += b5 + k6;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k0 + 11;
+	b6 += b7 + k8 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k8 + t0;
+	b4 += b5 + k7;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k1 + 12;
+	b6 += b7 + k0 + t1;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k0 + t1;
+	b4 += b5 + k8;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k2 + 13;
+	b6 += b7 + k1 + t2;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k1 + t2;
+	b4 += b5 + k0;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k3 + 14;
+	b6 += b7 + k2 + t0;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k8;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k2 + t0;
+	b4 += b5 + k1;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k4 + 15;
+	b6 += b7 + k3 + t1;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 36) | (b3 >> (64 - 36))) ^ b2;
+
+	b5 += k3 + t1;
+	b4 += b5 + k2;
+	b5 = ((b5 << 19) | (b5 >> (64 - 19))) ^ b4;
+
+	b7 += k5 + 16;
+	b6 += b7 + k4 + t2;
+	b7 = ((b7 << 37) | (b7 >> (64 - 37))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 33) | (b1 >> (64 - 33))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 27) | (b7 >> (64 - 27))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 14) | (b5 >> (64 - 14))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 42) | (b3 >> (64 - 42))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 17) | (b1 >> (64 - 17))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 49) | (b3 >> (64 - 49))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 36) | (b5 >> (64 - 36))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 39) | (b7 >> (64 - 39))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 44) | (b1 >> (64 - 44))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 9) | (b7 >> (64 - 9))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 54) | (b5 >> (64 - 54))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 56) | (b3 >> (64 - 56))) ^ b4;
+
+	b1 += k0;
+	b0 += b1 + k8;
+	b1 = ((b1 << 39) | (b1 >> (64 - 39))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 30) | (b3 >> (64 - 30))) ^ b2;
+
+	b5 += k4 + t2;
+	b4 += b5 + k3;
+	b5 = ((b5 << 34) | (b5 >> (64 - 34))) ^ b4;
+
+	b7 += k6 + 17;
+	b6 += b7 + k5 + t0;
+	b7 = ((b7 << 24) | (b7 >> (64 - 24))) ^ b6;
+
+	b2 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b2;
+
+	b4 += b7;
+	b7 = ((b7 << 50) | (b7 >> (64 - 50))) ^ b4;
+
+	b6 += b5;
+	b5 = ((b5 << 10) | (b5 >> (64 - 10))) ^ b6;
+
+	b0 += b3;
+	b3 = ((b3 << 17) | (b3 >> (64 - 17))) ^ b0;
+
+	b4 += b1;
+	b1 = ((b1 << 25) | (b1 >> (64 - 25))) ^ b4;
+
+	b6 += b3;
+	b3 = ((b3 << 29) | (b3 >> (64 - 29))) ^ b6;
+
+	b0 += b5;
+	b5 = ((b5 << 39) | (b5 >> (64 - 39))) ^ b0;
+
+	b2 += b7;
+	b7 = ((b7 << 43) | (b7 >> (64 - 43))) ^ b2;
+
+	b6 += b1;
+	b1 = ((b1 << 8) | (b1 >> (64 - 8))) ^ b6;
+
+	b0 += b7;
+	b7 = ((b7 << 35) | (b7 >> (64 - 35))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 56) | (b5 >> (64 - 56))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 22) | (b3 >> (64 - 22))) ^ b4;
+
+	output[0] = b0 + k0;
+	output[1] = b1 + k1;
+	output[2] = b2 + k2;
+	output[3] = b3 + k3;
+	output[4] = b4 + k4;
+	output[5] = b5 + k5 + t0;
+	output[6] = b6 + k6 + t1;
+	output[7] = b7 + k7 + 18;
+}
+
+void threefish_decrypt_512(struct threefish_key *key_ctx, u64 *input,
+			   u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	    b2 = input[2], b3 = input[3],
+	    b4 = input[4], b5 = input[5],
+	    b6 = input[6], b7 = input[7];
+	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
+	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
+	    k4 = key_ctx->key[4], k5 = key_ctx->key[5],
+	    k6 = key_ctx->key[6], k7 = key_ctx->key[7],
+	    k8 = key_ctx->key[8];
+	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
+	    t2 = key_ctx->tweak[2];
+
+	u64 tmp;
+
+	b0 -= k0;
+	b1 -= k1;
+	b2 -= k2;
+	b3 -= k3;
+	b4 -= k4;
+	b5 -= k5 + t0;
+	b6 -= k6 + t1;
+	b7 -= k7 + 18;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k5 + t0;
+	b7 -= k6 + 17;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k3;
+	b5 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k8;
+	b1 -= k0;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k4 + t2;
+	b7 -= k5 + 16;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k2;
+	b5 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k3 + t1;
+	b7 -= k4 + 15;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k1;
+	b5 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k8;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k2 + t0;
+	b7 -= k3 + 14;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k0;
+	b5 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k1 + t2;
+	b7 -= k2 + 13;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k8;
+	b5 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k0 + t1;
+	b7 -= k1 + 12;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k7;
+	b5 -= k8 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k8 + t0;
+	b7 -= k0 + 11;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k6;
+	b5 -= k7 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k7 + t2;
+	b7 -= k8 + 10;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k5;
+	b5 -= k6 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k6 + t1;
+	b7 -= k7 + 9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k4;
+	b5 -= k5 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k0;
+	b1 -= k1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k5 + t0;
+	b7 -= k6 + 8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k3;
+	b5 -= k4 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k8;
+	b1 -= k0;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k4 + t2;
+	b7 -= k5 + 7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k2;
+	b5 -= k3 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k3 + t1;
+	b7 -= k4 + 6;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k1;
+	b5 -= k2 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k8;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k2 + t0;
+	b7 -= k3 + 5;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k0;
+	b5 -= k1 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k1 + t2;
+	b7 -= k2 + 4;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k8;
+	b5 -= k0 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k0 + t1;
+	b7 -= k1 + 3;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k7;
+	b5 -= k8 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k8 + t0;
+	b7 -= k0 + 2;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k6;
+	b5 -= k7 + t2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 22) | (tmp << (64 - 22));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 56) | (tmp << (64 - 56));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 35) | (tmp << (64 - 35));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 8) | (tmp << (64 - 8));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 43) | (tmp << (64 - 43));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 29) | (tmp << (64 - 29));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 25) | (tmp << (64 - 25));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 17) | (tmp << (64 - 17));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 50) | (tmp << (64 - 50));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 24) | (tmp << (64 - 24));
+	b6 -= b7 + k7 + t2;
+	b7 -= k8 + 1;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 34) | (tmp << (64 - 34));
+	b4 -= b5 + k5;
+	b5 -= k6 + t1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 30) | (tmp << (64 - 30));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 39) | (tmp << (64 - 39));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 56) | (tmp << (64 - 56));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 54) | (tmp << (64 - 54));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b7;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 44) | (tmp << (64 - 44));
+	b6 -= b1;
+
+	tmp = b7 ^ b2;
+	b7 = (tmp >> 39) | (tmp << (64 - 39));
+	b2 -= b7;
+
+	tmp = b5 ^ b0;
+	b5 = (tmp >> 36) | (tmp << (64 - 36));
+	b0 -= b5;
+
+	tmp = b3 ^ b6;
+	b3 = (tmp >> 49) | (tmp << (64 - 49));
+	b6 -= b3;
+
+	tmp = b1 ^ b4;
+	b1 = (tmp >> 17) | (tmp << (64 - 17));
+	b4 -= b1;
+
+	tmp = b3 ^ b0;
+	b3 = (tmp >> 42) | (tmp << (64 - 42));
+	b0 -= b3;
+
+	tmp = b5 ^ b6;
+	b5 = (tmp >> 14) | (tmp << (64 - 14));
+	b6 -= b5;
+
+	tmp = b7 ^ b4;
+	b7 = (tmp >> 27) | (tmp << (64 - 27));
+	b4 -= b7;
+
+	tmp = b1 ^ b2;
+	b1 = (tmp >> 33) | (tmp << (64 - 33));
+	b2 -= b1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 37) | (tmp << (64 - 37));
+	b6 -= b7 + k6 + t1;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 19) | (tmp << (64 - 19));
+	b4 -= b5 + k4;
+	b5 -= k5 + t0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 36) | (tmp << (64 - 36));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b0 -= b1 + k0;
+	b1 -= k1;
+
+	output[0] = b0;
+	output[1] = b1;
+	output[2] = b2;
+	output[3] = b3;
+
+	output[4] = b4;
+	output[5] = b5;
+	output[6] = b6;
+	output[7] = b7;
+}
+
+void threefish_encrypt_1024(struct threefish_key *key_ctx, u64 *input,
+			    u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	    b2 = input[2], b3 = input[3],
+	    b4 = input[4], b5 = input[5],
+	    b6 = input[6], b7 = input[7],
+	    b8 = input[8], b9 = input[9],
+	    b10 = input[10], b11 = input[11],
+	    b12 = input[12], b13 = input[13],
+	    b14 = input[14], b15 = input[15];
+	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
+	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
+	    k4 = key_ctx->key[4], k5 = key_ctx->key[5],
+	    k6 = key_ctx->key[6], k7 = key_ctx->key[7],
+	    k8 = key_ctx->key[8], k9 = key_ctx->key[9],
+	    k10 = key_ctx->key[10], k11 = key_ctx->key[11],
+	    k12 = key_ctx->key[12], k13 = key_ctx->key[13],
+	    k14 = key_ctx->key[14], k15 = key_ctx->key[15],
+	    k16 = key_ctx->key[16];
+	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
+	    t2 = key_ctx->tweak[2];
+
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k5;
+	b4 += b5 + k4;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k9;
+	b8 += b9 + k8;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k11;
+	b10 += b11 + k10;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k13 + t0;
+	b12 += b13 + k12;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k15;
+	b14 += b15 + k14 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k6;
+	b4 += b5 + k5;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k8;
+	b6 += b7 + k7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k10;
+	b8 += b9 + k9;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k12;
+	b10 += b11 + k11;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k14 + t1;
+	b12 += b13 + k13;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k16 + 1;
+	b14 += b15 + k15 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k7;
+	b4 += b5 + k6;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k9;
+	b6 += b7 + k8;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k11;
+	b8 += b9 + k10;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k13;
+	b10 += b11 + k12;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k15 + t2;
+	b12 += b13 + k14;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k0 + 2;
+	b14 += b15 + k16 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k4;
+	b0 += b1 + k3;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k6;
+	b2 += b3 + k5;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k8;
+	b4 += b5 + k7;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k10;
+	b6 += b7 + k9;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k12;
+	b8 += b9 + k11;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k14;
+	b10 += b11 + k13;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k16 + t0;
+	b12 += b13 + k15;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k1 + 3;
+	b14 += b15 + k0 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k5;
+	b0 += b1 + k4;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k7;
+	b2 += b3 + k6;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k9;
+	b4 += b5 + k8;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k11;
+	b6 += b7 + k10;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k13;
+	b8 += b9 + k12;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k15;
+	b10 += b11 + k14;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k0 + t1;
+	b12 += b13 + k16;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k2 + 4;
+	b14 += b15 + k1 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k6;
+	b0 += b1 + k5;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k8;
+	b2 += b3 + k7;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k10;
+	b4 += b5 + k9;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k12;
+	b6 += b7 + k11;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k14;
+	b8 += b9 + k13;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k16;
+	b10 += b11 + k15;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k1 + t2;
+	b12 += b13 + k0;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k3 + 5;
+	b14 += b15 + k2 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k7;
+	b0 += b1 + k6;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k9;
+	b2 += b3 + k8;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k11;
+	b4 += b5 + k10;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k13;
+	b6 += b7 + k12;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k15;
+	b8 += b9 + k14;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k0;
+	b10 += b11 + k16;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k2 + t0;
+	b12 += b13 + k1;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k4 + 6;
+	b14 += b15 + k3 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k8;
+	b0 += b1 + k7;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k10;
+	b2 += b3 + k9;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k12;
+	b4 += b5 + k11;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k14;
+	b6 += b7 + k13;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k16;
+	b8 += b9 + k15;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k1;
+	b10 += b11 + k0;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k3 + t1;
+	b12 += b13 + k2;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k5 + 7;
+	b14 += b15 + k4 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k9;
+	b0 += b1 + k8;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k11;
+	b2 += b3 + k10;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k13;
+	b4 += b5 + k12;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k15;
+	b6 += b7 + k14;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k0;
+	b8 += b9 + k16;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k2;
+	b10 += b11 + k1;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k4 + t2;
+	b12 += b13 + k3;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k6 + 8;
+	b14 += b15 + k5 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k10;
+	b0 += b1 + k9;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k12;
+	b2 += b3 + k11;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k14;
+	b4 += b5 + k13;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k16;
+	b6 += b7 + k15;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k1;
+	b8 += b9 + k0;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k3;
+	b10 += b11 + k2;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k5 + t0;
+	b12 += b13 + k4;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k7 + 9;
+	b14 += b15 + k6 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k11;
+	b0 += b1 + k10;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k13;
+	b2 += b3 + k12;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k15;
+	b4 += b5 + k14;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k0;
+	b6 += b7 + k16;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k2;
+	b8 += b9 + k1;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k4;
+	b10 += b11 + k3;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k6 + t1;
+	b12 += b13 + k5;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k8 + 10;
+	b14 += b15 + k7 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k12;
+	b0 += b1 + k11;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k14;
+	b2 += b3 + k13;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k16;
+	b4 += b5 + k15;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k1;
+	b6 += b7 + k0;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k3;
+	b8 += b9 + k2;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k5;
+	b10 += b11 + k4;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k7 + t2;
+	b12 += b13 + k6;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k9 + 11;
+	b14 += b15 + k8 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k13;
+	b0 += b1 + k12;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k15;
+	b2 += b3 + k14;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k0;
+	b4 += b5 + k16;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k2;
+	b6 += b7 + k1;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k4;
+	b8 += b9 + k3;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k6;
+	b10 += b11 + k5;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k8 + t0;
+	b12 += b13 + k7;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k10 + 12;
+	b14 += b15 + k9 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k14;
+	b0 += b1 + k13;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k16;
+	b2 += b3 + k15;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k1;
+	b4 += b5 + k0;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k3;
+	b6 += b7 + k2;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k5;
+	b8 += b9 + k4;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k7;
+	b10 += b11 + k6;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k9 + t1;
+	b12 += b13 + k8;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k11 + 13;
+	b14 += b15 + k10 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k15;
+	b0 += b1 + k14;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k0;
+	b2 += b3 + k16;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k2;
+	b4 += b5 + k1;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k4;
+	b6 += b7 + k3;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k6;
+	b8 += b9 + k5;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k8;
+	b10 += b11 + k7;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k10 + t2;
+	b12 += b13 + k9;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k12 + 14;
+	b14 += b15 + k11 + t0;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k16;
+	b0 += b1 + k15;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k1;
+	b2 += b3 + k0;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k3;
+	b4 += b5 + k2;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k5;
+	b6 += b7 + k4;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k7;
+	b8 += b9 + k6;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k9;
+	b10 += b11 + k8;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k11 + t0;
+	b12 += b13 + k10;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k13 + 15;
+	b14 += b15 + k12 + t1;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k0;
+	b0 += b1 + k16;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k2;
+	b2 += b3 + k1;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k4;
+	b4 += b5 + k3;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k6;
+	b6 += b7 + k5;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k8;
+	b8 += b9 + k7;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k10;
+	b10 += b11 + k9;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k12 + t1;
+	b12 += b13 + k11;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k14 + 16;
+	b14 += b15 + k13 + t2;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k1;
+	b0 += b1 + k0;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k3;
+	b2 += b3 + k2;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k5;
+	b4 += b5 + k4;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k7;
+	b6 += b7 + k6;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k9;
+	b8 += b9 + k8;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k11;
+	b10 += b11 + k10;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k13 + t2;
+	b12 += b13 + k12;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k15 + 17;
+	b14 += b15 + k14 + t0;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	b1 += k2;
+	b0 += b1 + k1;
+	b1 = ((b1 << 24) | (b1 >> (64 - 24))) ^ b0;
+
+	b3 += k4;
+	b2 += b3 + k3;
+	b3 = ((b3 << 13) | (b3 >> (64 - 13))) ^ b2;
+
+	b5 += k6;
+	b4 += b5 + k5;
+	b5 = ((b5 << 8) | (b5 >> (64 - 8))) ^ b4;
+
+	b7 += k8;
+	b6 += b7 + k7;
+	b7 = ((b7 << 47) | (b7 >> (64 - 47))) ^ b6;
+
+	b9 += k10;
+	b8 += b9 + k9;
+	b9 = ((b9 << 8) | (b9 >> (64 - 8))) ^ b8;
+
+	b11 += k12;
+	b10 += b11 + k11;
+	b11 = ((b11 << 17) | (b11 >> (64 - 17))) ^ b10;
+
+	b13 += k14 + t0;
+	b12 += b13 + k13;
+	b13 = ((b13 << 22) | (b13 >> (64 - 22))) ^ b12;
+
+	b15 += k16 + 18;
+	b14 += b15 + k15 + t1;
+	b15 = ((b15 << 37) | (b15 >> (64 - 37))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 38) | (b9 >> (64 - 38))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 19) | (b13 >> (64 - 19))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 10) | (b11 >> (64 - 10))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 55) | (b15 >> (64 - 55))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 49) | (b7 >> (64 - 49))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 18) | (b3 >> (64 - 18))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 23) | (b5 >> (64 - 23))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 52) | (b1 >> (64 - 52))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 33) | (b7 >> (64 - 33))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 4) | (b5 >> (64 - 4))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 51) | (b3 >> (64 - 51))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 13) | (b1 >> (64 - 13))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 34) | (b15 >> (64 - 34))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 41) | (b13 >> (64 - 41))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 59) | (b11 >> (64 - 59))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 17) | (b9 >> (64 - 17))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 5) | (b15 >> (64 - 5))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 20) | (b11 >> (64 - 20))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 48) | (b13 >> (64 - 48))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 41) | (b9 >> (64 - 41))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 47) | (b1 >> (64 - 47))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 28) | (b5 >> (64 - 28))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 16) | (b3 >> (64 - 16))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 25) | (b7 >> (64 - 25))) ^ b12;
+
+	b1 += k3;
+	b0 += b1 + k2;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b0;
+
+	b3 += k5;
+	b2 += b3 + k4;
+	b3 = ((b3 << 9) | (b3 >> (64 - 9))) ^ b2;
+
+	b5 += k7;
+	b4 += b5 + k6;
+	b5 = ((b5 << 37) | (b5 >> (64 - 37))) ^ b4;
+
+	b7 += k9;
+	b6 += b7 + k8;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b6;
+
+	b9 += k11;
+	b8 += b9 + k10;
+	b9 = ((b9 << 12) | (b9 >> (64 - 12))) ^ b8;
+
+	b11 += k13;
+	b10 += b11 + k12;
+	b11 = ((b11 << 47) | (b11 >> (64 - 47))) ^ b10;
+
+	b13 += k15 + t1;
+	b12 += b13 + k14;
+	b13 = ((b13 << 44) | (b13 >> (64 - 44))) ^ b12;
+
+	b15 += k0 + 19;
+	b14 += b15 + k16 + t2;
+	b15 = ((b15 << 30) | (b15 >> (64 - 30))) ^ b14;
+
+	b0 += b9;
+	b9 = ((b9 << 16) | (b9 >> (64 - 16))) ^ b0;
+
+	b2 += b13;
+	b13 = ((b13 << 34) | (b13 >> (64 - 34))) ^ b2;
+
+	b6 += b11;
+	b11 = ((b11 << 56) | (b11 >> (64 - 56))) ^ b6;
+
+	b4 += b15;
+	b15 = ((b15 << 51) | (b15 >> (64 - 51))) ^ b4;
+
+	b10 += b7;
+	b7 = ((b7 << 4) | (b7 >> (64 - 4))) ^ b10;
+
+	b12 += b3;
+	b3 = ((b3 << 53) | (b3 >> (64 - 53))) ^ b12;
+
+	b14 += b5;
+	b5 = ((b5 << 42) | (b5 >> (64 - 42))) ^ b14;
+
+	b8 += b1;
+	b1 = ((b1 << 41) | (b1 >> (64 - 41))) ^ b8;
+
+	b0 += b7;
+	b7 = ((b7 << 31) | (b7 >> (64 - 31))) ^ b0;
+
+	b2 += b5;
+	b5 = ((b5 << 44) | (b5 >> (64 - 44))) ^ b2;
+
+	b4 += b3;
+	b3 = ((b3 << 47) | (b3 >> (64 - 47))) ^ b4;
+
+	b6 += b1;
+	b1 = ((b1 << 46) | (b1 >> (64 - 46))) ^ b6;
+
+	b12 += b15;
+	b15 = ((b15 << 19) | (b15 >> (64 - 19))) ^ b12;
+
+	b14 += b13;
+	b13 = ((b13 << 42) | (b13 >> (64 - 42))) ^ b14;
+
+	b8 += b11;
+	b11 = ((b11 << 44) | (b11 >> (64 - 44))) ^ b8;
+
+	b10 += b9;
+	b9 = ((b9 << 25) | (b9 >> (64 - 25))) ^ b10;
+
+	b0 += b15;
+	b15 = ((b15 << 9) | (b15 >> (64 - 9))) ^ b0;
+
+	b2 += b11;
+	b11 = ((b11 << 48) | (b11 >> (64 - 48))) ^ b2;
+
+	b6 += b13;
+	b13 = ((b13 << 35) | (b13 >> (64 - 35))) ^ b6;
+
+	b4 += b9;
+	b9 = ((b9 << 52) | (b9 >> (64 - 52))) ^ b4;
+
+	b14 += b1;
+	b1 = ((b1 << 23) | (b1 >> (64 - 23))) ^ b14;
+
+	b8 += b5;
+	b5 = ((b5 << 31) | (b5 >> (64 - 31))) ^ b8;
+
+	b10 += b3;
+	b3 = ((b3 << 37) | (b3 >> (64 - 37))) ^ b10;
+
+	b12 += b7;
+	b7 = ((b7 << 20) | (b7 >> (64 - 20))) ^ b12;
+
+	output[0] = b0 + k3;
+	output[1] = b1 + k4;
+	output[2] = b2 + k5;
+	output[3] = b3 + k6;
+	output[4] = b4 + k7;
+	output[5] = b5 + k8;
+	output[6] = b6 + k9;
+	output[7] = b7 + k10;
+	output[8] = b8 + k11;
+	output[9] = b9 + k12;
+	output[10] = b10 + k13;
+	output[11] = b11 + k14;
+	output[12] = b12 + k15;
+	output[13] = b13 + k16 + t2;
+	output[14] = b14 + k0 + t0;
+	output[15] = b15 + k1 + 20;
+}
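
For readers following the unrolled code above and the decrypt function below, this is a minimal sketch of the pattern the straight-line statements appear to follow. It is not part of the patch. The helper names (rotl64, mix, unmix, subkey_word) are hypothetical and exist only for illustration. The sketch assumes the 1024-bit layout seen in the function bodies: 16 state words, 17 key words with key[16] already computed by the caller, and 3 tweak words with tweak[2] already computed.

```c
#include <stdint.h>

/* Rotate a 64-bit word left by n bits; n must be in 1..63. */
static uint64_t rotl64(uint64_t x, unsigned int n)
{
	return (x << n) | (x >> (64 - n));
}

/*
 * Forward step, matching the unrolled encrypt statements of the form
 *   bE += bO;
 *   bO = ((bO << n) | (bO >> (64 - n))) ^ bE;
 */
static void mix(uint64_t *even, uint64_t *odd, unsigned int n)
{
	*even += *odd;
	*odd = rotl64(*odd, n) ^ *even;
}

/*
 * Inverse step, matching the unrolled decrypt statements of the form
 *   tmp = bO ^ bE;
 *   bO = (tmp >> n) | (tmp << (64 - n));
 *   bE -= bO;
 */
static void unmix(uint64_t *even, uint64_t *odd, unsigned int n)
{
	uint64_t tmp = *odd ^ *even;

	*odd = (tmp >> n) | (tmp << (64 - n));
	*even -= *odd;
}

/*
 * Word i (0..15) of subkey s, as the injection steps seem to use it:
 * the key index rotates by s modulo 17, words 13 and 14 also receive a
 * tweak word, and word 15 receives the subkey number itself.
 * Assumption: key[16] and tweak[2] were filled in by the caller.
 */
static uint64_t subkey_word(const uint64_t key[17], const uint64_t tweak[3],
			    unsigned int s, unsigned int i)
{
	uint64_t w = key[(s + i) % 17];

	if (i == 13)
		w += tweak[s % 3];
	else if (i == 14)
		w += tweak[(s + 1) % 3];
	else if (i == 15)
		w += s;
	return w;
}
```

With s = 20 this reproduces the final additions in threefish_encrypt_1024: word 0 uses k3, word 13 uses k16 + t2, word 14 uses k0 + t0 and word 15 uses k1 + 20. In the unrolled injection steps the odd word's subkey is added first and the even word's subkey is folded into the following addition, which a loop built on these helpers would do as two separate additions.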
+
+void threefish_decrypt_1024(struct threefish_key *key_ctx, u64 *input,
+			    u64 *output)
+{
+	u64 b0 = input[0], b1 = input[1],
+	    b2 = input[2], b3 = input[3],
+	    b4 = input[4], b5 = input[5],
+	    b6 = input[6], b7 = input[7],
+	    b8 = input[8], b9 = input[9],
+	    b10 = input[10], b11 = input[11],
+	    b12 = input[12], b13 = input[13],
+	    b14 = input[14], b15 = input[15];
+	u64 k0 = key_ctx->key[0], k1 = key_ctx->key[1],
+	    k2 = key_ctx->key[2], k3 = key_ctx->key[3],
+	    k4 = key_ctx->key[4], k5 = key_ctx->key[5],
+	    k6 = key_ctx->key[6], k7 = key_ctx->key[7],
+	    k8 = key_ctx->key[8], k9 = key_ctx->key[9],
+	    k10 = key_ctx->key[10], k11 = key_ctx->key[11],
+	    k12 = key_ctx->key[12], k13 = key_ctx->key[13],
+	    k14 = key_ctx->key[14], k15 = key_ctx->key[15],
+	    k16 = key_ctx->key[16];
+	u64 t0 = key_ctx->tweak[0], t1 = key_ctx->tweak[1],
+	    t2 = key_ctx->tweak[2];
+	u64 tmp;
+
+	b0 -= k3;
+	b1 -= k4;
+	b2 -= k5;
+	b3 -= k6;
+	b4 -= k7;
+	b5 -= k8;
+	b6 -= k9;
+	b7 -= k10;
+	b8 -= k11;
+	b9 -= k12;
+	b10 -= k13;
+	b11 -= k14;
+	b12 -= k15;
+	b13 -= k16 + t2;
+	b14 -= k0 + t0;
+	b15 -= k1 + 20;
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k16 + t2;
+	b15 -= k0 + 19;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k14;
+	b13 -= k15 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k12;
+	b11 -= k13;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k10;
+	b9 -= k11;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k8;
+	b7 -= k9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k6;
+	b5 -= k7;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k15 + t1;
+	b15 -= k16 + 18;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k13;
+	b13 -= k14 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k11;
+	b11 -= k12;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k9;
+	b9 -= k10;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k7;
+	b7 -= k8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k5;
+	b5 -= k6;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k14 + t0;
+	b15 -= k15 + 17;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k12;
+	b13 -= k13 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k10;
+	b11 -= k11;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k8;
+	b9 -= k9;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k6;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k4;
+	b5 -= k5;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k0;
+	b1 -= k1;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k13 + t2;
+	b15 -= k14 + 16;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k11;
+	b13 -= k12 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k9;
+	b11 -= k10;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k7;
+	b9 -= k8;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k5;
+	b7 -= k6;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k3;
+	b5 -= k4;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k1;
+	b3 -= k2;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k16;
+	b1 -= k0;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k12 + t1;
+	b15 -= k13 + 15;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k10;
+	b13 -= k11 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k8;
+	b11 -= k9;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k6;
+	b9 -= k7;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k4;
+	b7 -= k5;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k2;
+	b5 -= k3;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k0;
+	b3 -= k1;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k15;
+	b1 -= k16;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k11 + t0;
+	b15 -= k12 + 14;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k9;
+	b13 -= k10 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k7;
+	b11 -= k8;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k5;
+	b9 -= k6;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k3;
+	b7 -= k4;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k1;
+	b5 -= k2;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k16;
+	b3 -= k0;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k14;
+	b1 -= k15;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k10 + t2;
+	b15 -= k11 + 13;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k8;
+	b13 -= k9 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k6;
+	b11 -= k7;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k4;
+	b9 -= k5;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k2;
+	b7 -= k3;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k0;
+	b5 -= k1;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k15;
+	b3 -= k16;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k13;
+	b1 -= k14;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k9 + t1;
+	b15 -= k10 + 12;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k7;
+	b13 -= k8 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k5;
+	b11 -= k6;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k3;
+	b9 -= k4;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k1;
+	b7 -= k2;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k16;
+	b5 -= k0;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k14;
+	b3 -= k15;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k12;
+	b1 -= k13;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k8 + t0;
+	b15 -= k9 + 11;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k6;
+	b13 -= k7 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k4;
+	b11 -= k5;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k2;
+	b9 -= k3;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k0;
+	b7 -= k1;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k15;
+	b5 -= k16;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k13;
+	b3 -= k14;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k11;
+	b1 -= k12;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k7 + t2;
+	b15 -= k8 + 10;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k5;
+	b13 -= k6 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k3;
+	b11 -= k4;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k1;
+	b9 -= k2;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k16;
+	b7 -= k0;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k14;
+	b5 -= k15;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k12;
+	b3 -= k13;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k10;
+	b1 -= k11;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k6 + t1;
+	b15 -= k7 + 9;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k4;
+	b13 -= k5 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k2;
+	b11 -= k3;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k0;
+	b9 -= k1;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k15;
+	b7 -= k16;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k13;
+	b5 -= k14;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k11;
+	b3 -= k12;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k9;
+	b1 -= k10;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k5 + t0;
+	b15 -= k6 + 8;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k3;
+	b13 -= k4 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k1;
+	b11 -= k2;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k16;
+	b9 -= k0;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k14;
+	b7 -= k15;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k12;
+	b5 -= k13;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k10;
+	b3 -= k11;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k8;
+	b1 -= k9;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k4 + t2;
+	b15 -= k5 + 7;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k2;
+	b13 -= k3 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k0;
+	b11 -= k1;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k15;
+	b9 -= k16;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k13;
+	b7 -= k14;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k11;
+	b5 -= k12;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k9;
+	b3 -= k10;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k7;
+	b1 -= k8;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k3 + t1;
+	b15 -= k4 + 6;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k1;
+	b13 -= k2 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k16;
+	b11 -= k0;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k14;
+	b9 -= k15;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k12;
+	b7 -= k13;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k10;
+	b5 -= k11;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k8;
+	b3 -= k9;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k6;
+	b1 -= k7;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k2 + t0;
+	b15 -= k3 + 5;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k0;
+	b13 -= k1 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k15;
+	b11 -= k16;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k13;
+	b9 -= k14;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k11;
+	b7 -= k12;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k9;
+	b5 -= k10;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k7;
+	b3 -= k8;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k5;
+	b1 -= k6;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k1 + t2;
+	b15 -= k2 + 4;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k16;
+	b13 -= k0 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k14;
+	b11 -= k15;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k12;
+	b9 -= k13;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k10;
+	b7 -= k11;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k8;
+	b5 -= k9;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k6;
+	b3 -= k7;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k4;
+	b1 -= k5;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k0 + t1;
+	b15 -= k1 + 3;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k15;
+	b13 -= k16 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k13;
+	b11 -= k14;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k11;
+	b9 -= k12;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k9;
+	b7 -= k10;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k7;
+	b5 -= k8;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k5;
+	b3 -= k6;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k3;
+	b1 -= k4;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k16 + t0;
+	b15 -= k0 + 2;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k14;
+	b13 -= k15 + t2;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k12;
+	b11 -= k13;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k10;
+	b9 -= k11;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k8;
+	b7 -= k9;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k6;
+	b5 -= k7;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k4;
+	b3 -= k5;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k2;
+	b1 -= k3;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 20) | (tmp << (64 - 20));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 37) | (tmp << (64 - 37));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 31) | (tmp << (64 - 31));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 52) | (tmp << (64 - 52));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 35) | (tmp << (64 - 35));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 48) | (tmp << (64 - 48));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 9) | (tmp << (64 - 9));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 25) | (tmp << (64 - 25));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 44) | (tmp << (64 - 44));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 19) | (tmp << (64 - 19));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 46) | (tmp << (64 - 46));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 47) | (tmp << (64 - 47));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 44) | (tmp << (64 - 44));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 42) | (tmp << (64 - 42));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 53) | (tmp << (64 - 53));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 4) | (tmp << (64 - 4));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 56) | (tmp << (64 - 56));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 34) | (tmp << (64 - 34));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 16) | (tmp << (64 - 16));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 30) | (tmp << (64 - 30));
+	b14 -= b15 + k15 + t2;
+	b15 -= k16 + 1;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 44) | (tmp << (64 - 44));
+	b12 -= b13 + k13;
+	b13 -= k14 + t1;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 47) | (tmp << (64 - 47));
+	b10 -= b11 + k11;
+	b11 -= k12;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 12) | (tmp << (64 - 12));
+	b8 -= b9 + k9;
+	b9 -= k10;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 31) | (tmp << (64 - 31));
+	b6 -= b7 + k7;
+	b7 -= k8;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 37) | (tmp << (64 - 37));
+	b4 -= b5 + k5;
+	b5 -= k6;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 9) | (tmp << (64 - 9));
+	b2 -= b3 + k3;
+	b3 -= k4;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 41) | (tmp << (64 - 41));
+	b0 -= b1 + k1;
+	b1 -= k2;
+
+	tmp = b7 ^ b12;
+	b7 = (tmp >> 25) | (tmp << (64 - 25));
+	b12 -= b7;
+
+	tmp = b3 ^ b10;
+	b3 = (tmp >> 16) | (tmp << (64 - 16));
+	b10 -= b3;
+
+	tmp = b5 ^ b8;
+	b5 = (tmp >> 28) | (tmp << (64 - 28));
+	b8 -= b5;
+
+	tmp = b1 ^ b14;
+	b1 = (tmp >> 47) | (tmp << (64 - 47));
+	b14 -= b1;
+
+	tmp = b9 ^ b4;
+	b9 = (tmp >> 41) | (tmp << (64 - 41));
+	b4 -= b9;
+
+	tmp = b13 ^ b6;
+	b13 = (tmp >> 48) | (tmp << (64 - 48));
+	b6 -= b13;
+
+	tmp = b11 ^ b2;
+	b11 = (tmp >> 20) | (tmp << (64 - 20));
+	b2 -= b11;
+
+	tmp = b15 ^ b0;
+	b15 = (tmp >> 5) | (tmp << (64 - 5));
+	b0 -= b15;
+
+	tmp = b9 ^ b10;
+	b9 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b9;
+
+	tmp = b11 ^ b8;
+	b11 = (tmp >> 59) | (tmp << (64 - 59));
+	b8 -= b11;
+
+	tmp = b13 ^ b14;
+	b13 = (tmp >> 41) | (tmp << (64 - 41));
+	b14 -= b13;
+
+	tmp = b15 ^ b12;
+	b15 = (tmp >> 34) | (tmp << (64 - 34));
+	b12 -= b15;
+
+	tmp = b1 ^ b6;
+	b1 = (tmp >> 13) | (tmp << (64 - 13));
+	b6 -= b1;
+
+	tmp = b3 ^ b4;
+	b3 = (tmp >> 51) | (tmp << (64 - 51));
+	b4 -= b3;
+
+	tmp = b5 ^ b2;
+	b5 = (tmp >> 4) | (tmp << (64 - 4));
+	b2 -= b5;
+
+	tmp = b7 ^ b0;
+	b7 = (tmp >> 33) | (tmp << (64 - 33));
+	b0 -= b7;
+
+	tmp = b1 ^ b8;
+	b1 = (tmp >> 52) | (tmp << (64 - 52));
+	b8 -= b1;
+
+	tmp = b5 ^ b14;
+	b5 = (tmp >> 23) | (tmp << (64 - 23));
+	b14 -= b5;
+
+	tmp = b3 ^ b12;
+	b3 = (tmp >> 18) | (tmp << (64 - 18));
+	b12 -= b3;
+
+	tmp = b7 ^ b10;
+	b7 = (tmp >> 49) | (tmp << (64 - 49));
+	b10 -= b7;
+
+	tmp = b15 ^ b4;
+	b15 = (tmp >> 55) | (tmp << (64 - 55));
+	b4 -= b15;
+
+	tmp = b11 ^ b6;
+	b11 = (tmp >> 10) | (tmp << (64 - 10));
+	b6 -= b11;
+
+	tmp = b13 ^ b2;
+	b13 = (tmp >> 19) | (tmp << (64 - 19));
+	b2 -= b13;
+
+	tmp = b9 ^ b0;
+	b9 = (tmp >> 38) | (tmp << (64 - 38));
+	b0 -= b9;
+
+	tmp = b15 ^ b14;
+	b15 = (tmp >> 37) | (tmp << (64 - 37));
+	b14 -= b15 + k14 + t1;
+	b15 -= k15;
+
+	tmp = b13 ^ b12;
+	b13 = (tmp >> 22) | (tmp << (64 - 22));
+	b12 -= b13 + k12;
+	b13 -= k13 + t0;
+
+	tmp = b11 ^ b10;
+	b11 = (tmp >> 17) | (tmp << (64 - 17));
+	b10 -= b11 + k10;
+	b11 -= k11;
+
+	tmp = b9 ^ b8;
+	b9 = (tmp >> 8) | (tmp << (64 - 8));
+	b8 -= b9 + k8;
+	b9 -= k9;
+
+	tmp = b7 ^ b6;
+	b7 = (tmp >> 47) | (tmp << (64 - 47));
+	b6 -= b7 + k6;
+	b7 -= k7;
+
+	tmp = b5 ^ b4;
+	b5 = (tmp >> 8) | (tmp << (64 - 8));
+	b4 -= b5 + k4;
+	b5 -= k5;
+
+	tmp = b3 ^ b2;
+	b3 = (tmp >> 13) | (tmp << (64 - 13));
+	b2 -= b3 + k2;
+	b3 -= k3;
+
+	tmp = b1 ^ b0;
+	b1 = (tmp >> 24) | (tmp << (64 - 24));
+	b0 -= b1 + k0;
+	b1 -= k1;
+
+	output[15] = b15;
+	output[14] = b14;
+	output[13] = b13;
+	output[12] = b12;
+	output[11] = b11;
+	output[10] = b10;
+	output[9] = b9;
+	output[8] = b8;
+	output[7] = b7;
+	output[6] = b6;
+	output[5] = b5;
+	output[4] = b4;
+	output[3] = b3;
+	output[2] = b2;
+	output[1] = b1;
+	output[0] = b0;
+}
-- 
1.9.0
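
A note for readers of the unrolled decrypt rounds above: each three-line group (XOR into tmp, rotate right, subtract) reads as the inverse of a Threefish MIX step. The following is only a minimal userspace sketch of that pairing, not code from the patch. The helper names (ror64_sketch, rol64_sketch, mix_sketch, unmix_sketch) are hypothetical, uint64_t stands in for the kernel's u64, and the rotation count is passed in by the caller rather than hard-coded.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by r bits; r must be 1..63 so neither shift is by 64. */
static uint64_t ror64_sketch(uint64_t v, unsigned int r)
{
	assert(r > 0 && r < 64);
	return (v >> r) | (v << (64 - r));
}

/* Rotate left by r bits; same restriction on r. */
static uint64_t rol64_sketch(uint64_t v, unsigned int r)
{
	assert(r > 0 && r < 64);
	return (v << r) | (v >> (64 - r));
}

/* Forward MIX as used when encrypting: a += b; b = rol(b, r) ^ a. */
static void mix_sketch(uint64_t *a, uint64_t *b, unsigned int r)
{
	*a += *b;
	*b = rol64_sketch(*b, r) ^ *a;
}

/*
 * Inverse MIX, in the same shape as the decrypt code above:
 *	tmp = b ^ a;
 *	b = (tmp >> r) | (tmp << (64 - r));
 *	a -= b;
 */
static void unmix_sketch(uint64_t *a, uint64_t *b, unsigned int r)
{
	uint64_t tmp = *b ^ *a;

	*b = ror64_sketch(tmp, r);
	*a -= *b;
}
```

Applying mix_sketch() and then unmix_sketch() with the same rotation count should return both words to their starting values, which is the property the decrypt functions depend on.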



-- 
Jake Edge - LWN - jake@lwn.net - http://lwn.net


* [PATCH 2/3] staging/skein: comment typos
  2014-05-20 13:56 [PATCH 0/3] staging/skein: more cleanup Jake Edge
  2014-05-20 13:58 ` [PATCH 1/3] staging/skein: move all threefish block functions to one file Jake Edge
@ 2014-05-20 14:00 ` Jake Edge
  2014-05-20 14:02 ` [PATCH 3/3] staging/skein: variable/member name cleanup Jake Edge
  2014-05-20 14:47 ` [PATCH 0/3] staging/skein: more cleanup Jason Cooper
  3 siblings, 0 replies; 11+ messages in thread
From: Jake Edge @ 2014-05-20 14:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jason Cooper, devel, linux-kernel, Joe Perches, Dan Carpenter,
	Anton Saraev

Fix comment typos in threefish_api.h: "Threefisch" becomes "Threefish"
and "aas" becomes "as".

Signed-off-by: Jake Edge <jake@lwn.net>
---

against staging-next branch of staging tree

 drivers/staging/skein/threefish_api.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/skein/threefish_api.h b/drivers/staging/skein/threefish_api.h
index 2fce154..8d5ddf8 100644
--- a/drivers/staging/skein/threefish_api.h
+++ b/drivers/staging/skein/threefish_api.h
@@ -12,7 +12,7 @@
  * follow the openSSL design but at the same time take care of some Threefish
  * specific behaviour and possibilities.
  *
- * These are the low level functions that deal with Threefisch blocks only.
+ * These are the low level functions that deal with Threefish blocks only.
  * Implementations for cipher modes such as ECB, CFB, or CBC may use these
  * functions.
  *
@@ -77,9 +77,9 @@ void threefish_set_key(struct threefish_key *key_ctx,
 		       u64 *key_data, u64 *tweak);
 
 /**
- * Encrypt Threefisch block (bytes).
+ * Encrypt Threefish block (bytes).
  *
- * The buffer must have at least the same length (number of bits) aas the
+ * The buffer must have at least the same length (number of bits) as the
  * state size for this key. The function uses the first @c state_size bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
@@ -95,9 +95,9 @@ void threefish_encrypt_block_bytes(struct threefish_key *key_ctx, u8 *in,
 				   u8 *out);
 
 /**
- * Encrypt Threefisch block (words).
+ * Encrypt Threefish block (words).
  *
- * The buffer must have at least the same length (number of bits) aas the
+ * The buffer must have at least the same length (number of bits) as the
  * state size for this key. The function uses the first @c state_size bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
@@ -115,9 +115,9 @@ void threefish_encrypt_block_words(struct threefish_key *key_ctx, u64 *in,
 				   u64 *out);
 
 /**
- * Decrypt Threefisch block (bytes).
+ * Decrypt Threefish block (bytes).
  *
- * The buffer must have at least the same length (number of bits) aas the
+ * The buffer must have at least the same length (number of bits) as the
  * state size for this key. The function uses the first @c state_size bits
  * of the input buffer, decrypts them and stores the result in the output
  * buffer
@@ -133,9 +133,9 @@ void threefish_decrypt_block_bytes(struct threefish_key *key_ctx, u8 *in,
 				   u8 *out);
 
 /**
- * Decrypt Threefisch block (words).
+ * Decrypt Threefish block (words).
  *
- * The buffer must have at least the same length (number of bits) aas the
+ * The buffer must have at least the same length (number of bits) as the
  * state size for this key. The function uses the first @c state_size bits
  * of the input buffer, encrypts them and stores the result in the output
  * buffer.
-- 
1.9.0



-- 
Jake Edge - LWN - jake@lwn.net - http://lwn.net


* [PATCH 3/3] staging/skein: variable/member name cleanup
  2014-05-20 13:56 [PATCH 0/3] staging/skein: more cleanup Jake Edge
  2014-05-20 13:58 ` [PATCH 1/3] staging/skein: move all threefish block functions to one file Jake Edge
  2014-05-20 14:00 ` [PATCH 2/3] staging/skein: comment typos Jake Edge
@ 2014-05-20 14:02 ` Jake Edge
  2014-05-20 14:47 ` [PATCH 0/3] staging/skein: more cleanup Jason Cooper
  3 siblings, 0 replies; 11+ messages in thread
From: Jake Edge @ 2014-05-20 14:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jason Cooper, devel, linux-kernel, Joe Perches, Dan Carpenter,
	Anton Saraev

Rename a few more variables and structure member names to lower case.

Signed-off-by: Jake Edge <jake@lwn.net>
---

against staging-next branch of staging tree

 drivers/staging/skein/skein.c       | 148 +++++++++++++++++-----------------
 drivers/staging/skein/skein.h       |  34 ++++----
 drivers/staging/skein/skein_api.c   |  32 ++++----
 drivers/staging/skein/skein_api.h   |   2 +-
 drivers/staging/skein/skein_block.c | 155 ++++++++++++++++++------------------
 5 files changed, 186 insertions(+), 185 deletions(-)
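
For orientation, the most visible rename here is ctx->h.T[1] becoming ctx->h.tweak[1]. The sketch below is a userspace illustration of how the final-block flag lands in that word, not code from the patch. The SKEIN_T1_* macros are copied from the skein.h hunk in this patch with uint64_t substituted for u64; the struct name skein_ctx_hdr_sketch, the helper tag_final_sketch(), and the value 2 for SKEIN_MODIFIER_WORDS are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKEIN_MODIFIER_WORDS 2	/* assumed value; not shown in this patch */

/* Positions are bit numbers in the 128-bit tweak; tweak[1] holds 64..127. */
#define SKEIN_T1_BIT(BIT)	((BIT) - 64)
#define SKEIN_T1_POS_FIRST	SKEIN_T1_BIT(126)
#define SKEIN_T1_POS_FINAL	SKEIN_T1_BIT(127)
#define SKEIN_T1_FLAG_FIRST	(((uint64_t)1) << SKEIN_T1_POS_FIRST)
#define SKEIN_T1_FLAG_FINAL	(((uint64_t)1) << SKEIN_T1_POS_FINAL)

/* Stand-in for struct skein_ctx_hdr using the renamed member. */
struct skein_ctx_hdr_sketch {
	size_t hash_bit_len;	/* size of hash result, in bits */
	size_t b_cnt;		/* current byte count in buffer b[] */
	uint64_t tweak[SKEIN_MODIFIER_WORDS]; /* [0]=byte cnt, [1]=flags */
};

/* Hypothetical helper: tag the context as processing the final block. */
static void tag_final_sketch(struct skein_ctx_hdr_sketch *h)
{
	assert(h != NULL);
	h->tweak[1] |= SKEIN_T1_FLAG_FINAL;
}
```

Because SKEIN_T1_BIT() subtracts 64, tweak bit 127 maps to bit 63 of tweak[1], so tagging the final block sets only the top bit of that word and leaves the byte count in tweak[0] alone.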

diff --git a/drivers/staging/skein/skein.c b/drivers/staging/skein/skein.c
index f76d585..8cc8358 100644
--- a/drivers/staging/skein/skein.c
+++ b/drivers/staging/skein/skein.c
@@ -33,16 +33,16 @@ int skein_256_init(struct skein_256_ctx *ctx, size_t hash_bit_len)
 
 	switch (hash_bit_len) { /* use pre-computed values, where available */
 	case  256:
-		memcpy(ctx->X, SKEIN_256_IV_256, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_256_IV_256, sizeof(ctx->x));
 		break;
 	case  224:
-		memcpy(ctx->X, SKEIN_256_IV_224, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_256_IV_224, sizeof(ctx->x));
 		break;
 	case  160:
-		memcpy(ctx->X, SKEIN_256_IV_160, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_256_IV_160, sizeof(ctx->x));
 		break;
 	case  128:
-		memcpy(ctx->X, SKEIN_256_IV_128, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_256_IV_128, sizeof(ctx->x));
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
@@ -63,11 +63,11 @@ int skein_256_init(struct skein_256_ctx *ctx, size_t hash_bit_len)
 
 		/* compute the initial chaining values from config block */
 		/* zero the chaining variables */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 		skein_256_process_block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
-	/* The chaining vars ctx->X are now initialized for hash_bit_len. */
+	/* The chaining vars ctx->x are now initialized for hash_bit_len. */
 	/* Set up to process the data message portion of the hash (default) */
 	skein_start_new_type(ctx, MSG);              /* T0=0, T1= MSG type */
 
@@ -89,25 +89,25 @@ int skein_256_init_ext(struct skein_256_ctx *ctx, size_t hash_bit_len,
 	skein_assert_ret(hash_bit_len > 0, SKEIN_BAD_HASHLEN);
 	skein_assert_ret(key_bytes == 0 || key != NULL, SKEIN_FAIL);
 
-	/* compute the initial chaining values ctx->X[], based on key */
+	/* compute the initial chaining values ctx->x[], based on key */
 	if (key_bytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 	} else { /* here to pre-process a key */
-		skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		skein_assert(sizeof(cfg.b) >= sizeof(ctx->x));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
-		ctx->h.hash_bit_len = 8*sizeof(ctx->X);
+		ctx->h.hash_bit_len = 8*sizeof(ctx->x);
 		/* set tweaks: T0 = 0; T1 = KEY type */
 		skein_start_new_type(ctx, KEY);
 		/* zero the initial chaining variables */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 		/* hash the key */
 		skein_256_update(ctx, key, key_bytes);
 		/* put result into cfg.b[] */
 		skein_256_final_pad(ctx, cfg.b);
-		/* copy over into ctx->X[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
+		/* copy over into ctx->x[] */
+		memcpy(ctx->x, cfg.b, sizeof(cfg.b));
 	}
 	/*
 	 * build/process the config block, type == CONFIG (could be
@@ -130,7 +130,7 @@ int skein_256_init_ext(struct skein_256_ctx *ctx, size_t hash_bit_len,
 	/* compute the initial chaining values from config block */
 	skein_256_process_block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
-	/* The chaining vars ctx->X are now initialized */
+	/* The chaining vars ctx->x are now initialized */
 	/* Set up to process the data message portion of the hash (default) */
 	skein_start_new_type(ctx, MSG);
 
@@ -197,12 +197,12 @@ int skein_256_update(struct skein_256_ctx *ctx, const u8 *msg,
 int skein_256_final(struct skein_256_ctx *ctx, u8 *hash_val)
 {
 	size_t i, n, byte_cnt;
-	u64 X[SKEIN_256_STATE_WORDS];
+	u64 x[SKEIN_256_STATE_WORDS];
 	/* catch uninitialized context */
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* tag as the final block */
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	ctx->h.tweak[1] |= SKEIN_T1_FLAG_FINAL;
 	/* zero pad b[] if necessary */
 	if (ctx->h.b_cnt < SKEIN_256_BLOCK_BYTES)
 		memset(&ctx->b[ctx->h.b_cnt], 0,
@@ -219,7 +219,7 @@ int skein_256_final(struct skein_256_ctx *ctx, u8 *hash_val)
 	/* zero out b[], so it can hold the counter */
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
-	memcpy(X, ctx->X, sizeof(X));
+	memcpy(x, ctx->x, sizeof(x));
 	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byte_cnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = skein_swap64((u64) i);
@@ -231,12 +231,12 @@ int skein_256_final(struct skein_256_ctx *ctx, u8 *hash_val)
 		if (n >= SKEIN_256_BLOCK_BYTES)
 			n  = SKEIN_256_BLOCK_BYTES;
 		/* "output" the ctr mode bytes */
-		skein_put64_lsb_first(hash_val+i*SKEIN_256_BLOCK_BYTES, ctx->X,
+		skein_put64_lsb_first(hash_val+i*SKEIN_256_BLOCK_BYTES, ctx->x,
 				      n);
 		skein_show_final(256, &ctx->h, n,
 				 hash_val+i*SKEIN_256_BLOCK_BYTES);
 		/* restore the counter mode key for next time */
-		memcpy(ctx->X, X, sizeof(X));
+		memcpy(ctx->x, x, sizeof(x));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -259,16 +259,16 @@ int skein_512_init(struct skein_512_ctx *ctx, size_t hash_bit_len)
 
 	switch (hash_bit_len) { /* use pre-computed values, where available */
 	case  512:
-		memcpy(ctx->X, SKEIN_512_IV_512, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_512_IV_512, sizeof(ctx->x));
 		break;
 	case  384:
-		memcpy(ctx->X, SKEIN_512_IV_384, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_512_IV_384, sizeof(ctx->x));
 		break;
 	case  256:
-		memcpy(ctx->X, SKEIN_512_IV_256, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_512_IV_256, sizeof(ctx->x));
 		break;
 	case  224:
-		memcpy(ctx->X, SKEIN_512_IV_224, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_512_IV_224, sizeof(ctx->x));
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
@@ -289,13 +289,13 @@ int skein_512_init(struct skein_512_ctx *ctx, size_t hash_bit_len)
 
 		/* compute the initial chaining values from config block */
 		/* zero the chaining variables */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 		skein_512_process_block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
 
 	/*
-	 * The chaining vars ctx->X are now initialized for the given
+	 * The chaining vars ctx->x are now initialized for the given
 	 * hash_bit_len.
 	 */
 	/* Set up to process the data message portion of the hash (default) */
@@ -319,25 +319,25 @@ int skein_512_init_ext(struct skein_512_ctx *ctx, size_t hash_bit_len,
 	skein_assert_ret(hash_bit_len > 0, SKEIN_BAD_HASHLEN);
 	skein_assert_ret(key_bytes == 0 || key != NULL, SKEIN_FAIL);
 
-	/* compute the initial chaining values ctx->X[], based on key */
+	/* compute the initial chaining values ctx->x[], based on key */
 	if (key_bytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 	} else { /* here to pre-process a key */
-		skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		skein_assert(sizeof(cfg.b) >= sizeof(ctx->x));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
-		ctx->h.hash_bit_len = 8*sizeof(ctx->X);
+		ctx->h.hash_bit_len = 8*sizeof(ctx->x);
 		/* set tweaks: T0 = 0; T1 = KEY type */
 		skein_start_new_type(ctx, KEY);
 		/* zero the initial chaining variables */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 		/* hash the key */
 		skein_512_update(ctx, key, key_bytes);
 		/* put result into cfg.b[] */
 		skein_512_final_pad(ctx, cfg.b);
-		/* copy over into ctx->X[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
+		/* copy over into ctx->x[] */
+		memcpy(ctx->x, cfg.b, sizeof(cfg.b));
 	}
 	/*
 	 * build/process the config block, type == CONFIG (could be
@@ -359,7 +359,7 @@ int skein_512_init_ext(struct skein_512_ctx *ctx, size_t hash_bit_len,
 	/* compute the initial chaining values from config block */
 	skein_512_process_block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
-	/* The chaining vars ctx->X are now initialized */
+	/* The chaining vars ctx->x are now initialized */
 	/* Set up to process the data message portion of the hash (default) */
 	skein_start_new_type(ctx, MSG);
 
@@ -426,12 +426,12 @@ int skein_512_update(struct skein_512_ctx *ctx, const u8 *msg,
 int skein_512_final(struct skein_512_ctx *ctx, u8 *hash_val)
 {
 	size_t i, n, byte_cnt;
-	u64 X[SKEIN_512_STATE_WORDS];
+	u64 x[SKEIN_512_STATE_WORDS];
 	/* catch uninitialized context */
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* tag as the final block */
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	ctx->h.tweak[1] |= SKEIN_T1_FLAG_FINAL;
 	/* zero pad b[] if necessary */
 	if (ctx->h.b_cnt < SKEIN_512_BLOCK_BYTES)
 		memset(&ctx->b[ctx->h.b_cnt], 0,
@@ -448,7 +448,7 @@ int skein_512_final(struct skein_512_ctx *ctx, u8 *hash_val)
 	/* zero out b[], so it can hold the counter */
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
-	memcpy(X, ctx->X, sizeof(X));
+	memcpy(x, ctx->x, sizeof(x));
 	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byte_cnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = skein_swap64((u64) i);
@@ -460,12 +460,12 @@ int skein_512_final(struct skein_512_ctx *ctx, u8 *hash_val)
 		if (n >= SKEIN_512_BLOCK_BYTES)
 			n  = SKEIN_512_BLOCK_BYTES;
 		/* "output" the ctr mode bytes */
-		skein_put64_lsb_first(hash_val+i*SKEIN_512_BLOCK_BYTES, ctx->X,
+		skein_put64_lsb_first(hash_val+i*SKEIN_512_BLOCK_BYTES, ctx->x,
 				      n);
 		skein_show_final(512, &ctx->h, n,
 				 hash_val+i*SKEIN_512_BLOCK_BYTES);
 		/* restore the counter mode key for next time */
-		memcpy(ctx->X, X, sizeof(X));
+		memcpy(ctx->x, x, sizeof(x));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -488,13 +488,13 @@ int skein_1024_init(struct skein_1024_ctx *ctx, size_t hash_bit_len)
 
 	switch (hash_bit_len) { /* use pre-computed values, where available */
 	case  512:
-		memcpy(ctx->X, SKEIN_1024_IV_512, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_1024_IV_512, sizeof(ctx->x));
 		break;
 	case  384:
-		memcpy(ctx->X, SKEIN_1024_IV_384, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_1024_IV_384, sizeof(ctx->x));
 		break;
 	case 1024:
-		memcpy(ctx->X, SKEIN_1024_IV_1024, sizeof(ctx->X));
+		memcpy(ctx->x, SKEIN_1024_IV_1024, sizeof(ctx->x));
 		break;
 	default:
 		/* here if there is no precomputed IV value available */
@@ -515,12 +515,12 @@ int skein_1024_init(struct skein_1024_ctx *ctx, size_t hash_bit_len)
 
 		/* compute the initial chaining values from config block */
 		/* zero the chaining variables */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 		skein_1024_process_block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 		break;
 	}
 
-	/* The chaining vars ctx->X are now initialized for the hash_bit_len. */
+	/* The chaining vars ctx->x are now initialized for the hash_bit_len. */
 	/* Set up to process the data message portion of the hash (default) */
 	skein_start_new_type(ctx, MSG);              /* T0=0, T1= MSG type */
 
@@ -542,25 +542,25 @@ int skein_1024_init_ext(struct skein_1024_ctx *ctx, size_t hash_bit_len,
 	skein_assert_ret(hash_bit_len > 0, SKEIN_BAD_HASHLEN);
 	skein_assert_ret(key_bytes == 0 || key != NULL, SKEIN_FAIL);
 
-	/* compute the initial chaining values ctx->X[], based on key */
+	/* compute the initial chaining values ctx->x[], based on key */
 	if (key_bytes == 0) { /* is there a key? */
 		/* no key: use all zeroes as key for config block */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 	} else { /* here to pre-process a key */
-		skein_assert(sizeof(cfg.b) >= sizeof(ctx->X));
+		skein_assert(sizeof(cfg.b) >= sizeof(ctx->x));
 		/* do a mini-Init right here */
 		/* set output hash bit count = state size */
-		ctx->h.hash_bit_len = 8*sizeof(ctx->X);
+		ctx->h.hash_bit_len = 8*sizeof(ctx->x);
 		/* set tweaks: T0 = 0; T1 = KEY type */
 		skein_start_new_type(ctx, KEY);
 		/* zero the initial chaining variables */
-		memset(ctx->X, 0, sizeof(ctx->X));
+		memset(ctx->x, 0, sizeof(ctx->x));
 		/* hash the key */
 		skein_1024_update(ctx, key, key_bytes);
 		/* put result into cfg.b[] */
 		skein_1024_final_pad(ctx, cfg.b);
-		/* copy over into ctx->X[] */
-		memcpy(ctx->X, cfg.b, sizeof(cfg.b));
+		/* copy over into ctx->x[] */
+		memcpy(ctx->x, cfg.b, sizeof(cfg.b));
 	}
 	/*
 	 * build/process the config block, type == CONFIG (could be
@@ -583,7 +583,7 @@ int skein_1024_init_ext(struct skein_1024_ctx *ctx, size_t hash_bit_len,
 	/* compute the initial chaining values from config block */
 	skein_1024_process_block(ctx, cfg.b, 1, SKEIN_CFG_STR_LEN);
 
-	/* The chaining vars ctx->X are now initialized */
+	/* The chaining vars ctx->x are now initialized */
 	/* Set up to process the data message portion of the hash (default) */
 	skein_start_new_type(ctx, MSG);
 
@@ -650,12 +650,12 @@ int skein_1024_update(struct skein_1024_ctx *ctx, const u8 *msg,
 int skein_1024_final(struct skein_1024_ctx *ctx, u8 *hash_val)
 {
 	size_t i, n, byte_cnt;
-	u64 X[SKEIN_1024_STATE_WORDS];
+	u64 x[SKEIN_1024_STATE_WORDS];
 	/* catch uninitialized context */
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_1024_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* tag as the final block */
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	ctx->h.tweak[1] |= SKEIN_T1_FLAG_FINAL;
 	/* zero pad b[] if necessary */
 	if (ctx->h.b_cnt < SKEIN_1024_BLOCK_BYTES)
 		memset(&ctx->b[ctx->h.b_cnt], 0,
@@ -672,7 +672,7 @@ int skein_1024_final(struct skein_1024_ctx *ctx, u8 *hash_val)
 	/* zero out b[], so it can hold the counter */
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
-	memcpy(X, ctx->X, sizeof(X));
+	memcpy(x, ctx->x, sizeof(x));
 	for (i = 0; i*SKEIN_1024_BLOCK_BYTES < byte_cnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = skein_swap64((u64) i);
@@ -684,12 +684,12 @@ int skein_1024_final(struct skein_1024_ctx *ctx, u8 *hash_val)
 		if (n >= SKEIN_1024_BLOCK_BYTES)
 			n  = SKEIN_1024_BLOCK_BYTES;
 		/* "output" the ctr mode bytes */
-		skein_put64_lsb_first(hash_val+i*SKEIN_1024_BLOCK_BYTES, ctx->X,
+		skein_put64_lsb_first(hash_val+i*SKEIN_1024_BLOCK_BYTES, ctx->x,
 				      n);
 		skein_show_final(1024, &ctx->h, n,
 				 hash_val+i*SKEIN_1024_BLOCK_BYTES);
 		/* restore the counter mode key for next time */
-		memcpy(ctx->X, X, sizeof(X));
+		memcpy(ctx->x, x, sizeof(x));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -705,7 +705,7 @@ int skein_256_final_pad(struct skein_256_ctx *ctx, u8 *hash_val)
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* tag as the final block */
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	ctx->h.tweak[1] |= SKEIN_T1_FLAG_FINAL;
 	/* zero pad b[] if necessary */
 	if (ctx->h.b_cnt < SKEIN_256_BLOCK_BYTES)
 		memset(&ctx->b[ctx->h.b_cnt], 0,
@@ -714,7 +714,7 @@ int skein_256_final_pad(struct skein_256_ctx *ctx, u8 *hash_val)
 	skein_256_process_block(ctx, ctx->b, 1, ctx->h.b_cnt);
 
 	/* "output" the state bytes */
-	skein_put64_lsb_first(hash_val, ctx->X, SKEIN_256_BLOCK_BYTES);
+	skein_put64_lsb_first(hash_val, ctx->x, SKEIN_256_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -727,7 +727,7 @@ int skein_512_final_pad(struct skein_512_ctx *ctx, u8 *hash_val)
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* tag as the final block */
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	ctx->h.tweak[1] |= SKEIN_T1_FLAG_FINAL;
 	/* zero pad b[] if necessary */
 	if (ctx->h.b_cnt < SKEIN_512_BLOCK_BYTES)
 		memset(&ctx->b[ctx->h.b_cnt], 0,
@@ -736,7 +736,7 @@ int skein_512_final_pad(struct skein_512_ctx *ctx, u8 *hash_val)
 	skein_512_process_block(ctx, ctx->b, 1, ctx->h.b_cnt);
 
 	/* "output" the state bytes */
-	skein_put64_lsb_first(hash_val, ctx->X, SKEIN_512_BLOCK_BYTES);
+	skein_put64_lsb_first(hash_val, ctx->x, SKEIN_512_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -749,7 +749,7 @@ int skein_1024_final_pad(struct skein_1024_ctx *ctx, u8 *hash_val)
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_1024_BLOCK_BYTES, SKEIN_FAIL);
 
 	/* tag as the final block */
-	ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;
+	ctx->h.tweak[1] |= SKEIN_T1_FLAG_FINAL;
 	/* zero pad b[] if necessary */
 	if (ctx->h.b_cnt < SKEIN_1024_BLOCK_BYTES)
 		memset(&ctx->b[ctx->h.b_cnt], 0,
@@ -758,7 +758,7 @@ int skein_1024_final_pad(struct skein_1024_ctx *ctx, u8 *hash_val)
 	skein_1024_process_block(ctx, ctx->b, 1, ctx->h.b_cnt);
 
 	/* "output" the state bytes */
-	skein_put64_lsb_first(hash_val, ctx->X, SKEIN_1024_BLOCK_BYTES);
+	skein_put64_lsb_first(hash_val, ctx->x, SKEIN_1024_BLOCK_BYTES);
 
 	return SKEIN_SUCCESS;
 }
@@ -769,7 +769,7 @@ int skein_1024_final_pad(struct skein_1024_ctx *ctx, u8 *hash_val)
 int skein_256_output(struct skein_256_ctx *ctx, u8 *hash_val)
 {
 	size_t i, n, byte_cnt;
-	u64 X[SKEIN_256_STATE_WORDS];
+	u64 x[SKEIN_256_STATE_WORDS];
 	/* catch uninitialized context */
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_256_BLOCK_BYTES, SKEIN_FAIL);
 
@@ -781,7 +781,7 @@ int skein_256_output(struct skein_256_ctx *ctx, u8 *hash_val)
 	/* zero out b[], so it can hold the counter */
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
-	memcpy(X, ctx->X, sizeof(X));
+	memcpy(x, ctx->x, sizeof(x));
 	for (i = 0; i*SKEIN_256_BLOCK_BYTES < byte_cnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = skein_swap64((u64) i);
@@ -793,12 +793,12 @@ int skein_256_output(struct skein_256_ctx *ctx, u8 *hash_val)
 		if (n >= SKEIN_256_BLOCK_BYTES)
 			n  = SKEIN_256_BLOCK_BYTES;
 		/* "output" the ctr mode bytes */
-		skein_put64_lsb_first(hash_val+i*SKEIN_256_BLOCK_BYTES, ctx->X,
+		skein_put64_lsb_first(hash_val+i*SKEIN_256_BLOCK_BYTES, ctx->x,
 				      n);
 		skein_show_final(256, &ctx->h, n,
 				 hash_val+i*SKEIN_256_BLOCK_BYTES);
 		/* restore the counter mode key for next time */
-		memcpy(ctx->X, X, sizeof(X));
+		memcpy(ctx->x, x, sizeof(x));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -808,7 +808,7 @@ int skein_256_output(struct skein_256_ctx *ctx, u8 *hash_val)
 int skein_512_output(struct skein_512_ctx *ctx, u8 *hash_val)
 {
 	size_t i, n, byte_cnt;
-	u64 X[SKEIN_512_STATE_WORDS];
+	u64 x[SKEIN_512_STATE_WORDS];
 	/* catch uninitialized context */
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_512_BLOCK_BYTES, SKEIN_FAIL);
 
@@ -820,7 +820,7 @@ int skein_512_output(struct skein_512_ctx *ctx, u8 *hash_val)
 	/* zero out b[], so it can hold the counter */
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
-	memcpy(X, ctx->X, sizeof(X));
+	memcpy(x, ctx->x, sizeof(x));
 	for (i = 0; i*SKEIN_512_BLOCK_BYTES < byte_cnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = skein_swap64((u64) i);
@@ -832,12 +832,12 @@ int skein_512_output(struct skein_512_ctx *ctx, u8 *hash_val)
 		if (n >= SKEIN_512_BLOCK_BYTES)
 			n  = SKEIN_512_BLOCK_BYTES;
 		/* "output" the ctr mode bytes */
-		skein_put64_lsb_first(hash_val+i*SKEIN_512_BLOCK_BYTES, ctx->X,
+		skein_put64_lsb_first(hash_val+i*SKEIN_512_BLOCK_BYTES, ctx->x,
 				      n);
 		skein_show_final(256, &ctx->h, n,
 				 hash_val+i*SKEIN_512_BLOCK_BYTES);
 		/* restore the counter mode key for next time */
-		memcpy(ctx->X, X, sizeof(X));
+		memcpy(ctx->x, x, sizeof(x));
 	}
 	return SKEIN_SUCCESS;
 }
@@ -847,7 +847,7 @@ int skein_512_output(struct skein_512_ctx *ctx, u8 *hash_val)
 int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val)
 {
 	size_t i, n, byte_cnt;
-	u64 X[SKEIN_1024_STATE_WORDS];
+	u64 x[SKEIN_1024_STATE_WORDS];
 	/* catch uninitialized context */
 	skein_assert_ret(ctx->h.b_cnt <= SKEIN_1024_BLOCK_BYTES, SKEIN_FAIL);
 
@@ -859,7 +859,7 @@ int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val)
 	/* zero out b[], so it can hold the counter */
 	memset(ctx->b, 0, sizeof(ctx->b));
 	/* keep a local copy of counter mode "key" */
-	memcpy(X, ctx->X, sizeof(X));
+	memcpy(x, ctx->x, sizeof(x));
 	for (i = 0; i*SKEIN_1024_BLOCK_BYTES < byte_cnt; i++) {
 		/* build the counter block */
 		((u64 *)ctx->b)[0] = skein_swap64((u64) i);
@@ -871,12 +871,12 @@ int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val)
 		if (n >= SKEIN_1024_BLOCK_BYTES)
 			n  = SKEIN_1024_BLOCK_BYTES;
 		/* "output" the ctr mode bytes */
-		skein_put64_lsb_first(hash_val+i*SKEIN_1024_BLOCK_BYTES, ctx->X,
+		skein_put64_lsb_first(hash_val+i*SKEIN_1024_BLOCK_BYTES, ctx->x,
 				      n);
 		skein_show_final(256, &ctx->h, n,
 				 hash_val+i*SKEIN_1024_BLOCK_BYTES);
 		/* restore the counter mode key for next time */
-		memcpy(ctx->X, X, sizeof(X));
+		memcpy(ctx->x, x, sizeof(x));
 	}
 	return SKEIN_SUCCESS;
 }
diff --git a/drivers/staging/skein/skein.h b/drivers/staging/skein/skein.h
index 2c87ff7..e6669f1 100644
--- a/drivers/staging/skein/skein.h
+++ b/drivers/staging/skein/skein.h
@@ -66,24 +66,24 @@ enum {
 struct skein_ctx_hdr {
 	size_t hash_bit_len;		/* size of hash result, in bits */
 	size_t b_cnt;			/* current byte count in buffer b[] */
-	u64 T[SKEIN_MODIFIER_WORDS];	/* tweak: T[0]=byte cnt, T[1]=flags */
+	u64 tweak[SKEIN_MODIFIER_WORDS]; /* tweak[0]=byte cnt, tweak[1]=flags */
 };
 
 struct skein_256_ctx { /* 256-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
-	u64 X[SKEIN_256_STATE_WORDS];	/* chaining variables */
+	u64 x[SKEIN_256_STATE_WORDS];	/* chaining variables */
 	u8 b[SKEIN_256_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 };
 
 struct skein_512_ctx { /* 512-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
-	u64 X[SKEIN_512_STATE_WORDS];	/* chaining variables */
+	u64 x[SKEIN_512_STATE_WORDS];	/* chaining variables */
 	u8 b[SKEIN_512_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 };
 
 struct skein_1024_ctx { /* 1024-bit Skein hash context structure */
 	struct skein_ctx_hdr h;		/* common header context variables */
-	u64 X[SKEIN_1024_STATE_WORDS];	/* chaining variables */
+	u64 x[SKEIN_1024_STATE_WORDS];	/* chaining variables */
 	u8 b[SKEIN_1024_BLOCK_BYTES];	/* partial block buf (8-byte aligned) */
 };
 
@@ -150,7 +150,7 @@ int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val);
 **           reference and optimized code.
 ******************************************************************/
 
-/* tweak word T[1]: bit field starting positions */
+/* tweak word tweak[1]: bit field starting positions */
 #define SKEIN_T1_BIT(BIT)       ((BIT) - 64)      /* second word  */
 
 #define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112) /* 112..118 hash tree level */
@@ -159,16 +159,16 @@ int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val);
 #define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126) /* 126      first blk flag */
 #define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127) /* 127      final blk flag */
 
-/* tweak word T[1]: flag bit definition(s) */
+/* tweak word tweak[1]: flag bit definition(s) */
 #define SKEIN_T1_FLAG_FIRST     (((u64)  1) << SKEIN_T1_POS_FIRST)
 #define SKEIN_T1_FLAG_FINAL     (((u64)  1) << SKEIN_T1_POS_FINAL)
 #define SKEIN_T1_FLAG_BIT_PAD   (((u64)  1) << SKEIN_T1_POS_BIT_PAD)
 
-/* tweak word T[1]: tree level bit field mask */
+/* tweak word tweak[1]: tree level bit field mask */
 #define SKEIN_T1_TREE_LVL_MASK  (((u64)0x7F) << SKEIN_T1_POS_TREE_LVL)
 #define SKEIN_T1_TREE_LEVEL(n)  (((u64) (n)) << SKEIN_T1_POS_TREE_LVL)
 
-/* tweak word T[1]: block type field */
+/* tweak word tweak[1]: block type field */
 #define SKEIN_BLK_TYPE_KEY       (0) /* key, for MAC and KDF */
 #define SKEIN_BLK_TYPE_CFG       (4) /* configuration block */
 #define SKEIN_BLK_TYPE_PERS      (8) /* personalization string */
@@ -232,9 +232,9 @@ int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val);
 **   Skein macros for getting/setting tweak words, etc.
 **   These are useful for partial input bytes, hash tree init/update, etc.
 **/
-#define skein_get_tweak(ctx_ptr, TWK_NUM)          ((ctx_ptr)->h.T[TWK_NUM])
+#define skein_get_tweak(ctx_ptr, TWK_NUM)          ((ctx_ptr)->h.tweak[TWK_NUM])
 #define skein_set_tweak(ctx_ptr, TWK_NUM, t_val) { \
-		(ctx_ptr)->h.T[TWK_NUM] = (t_val); \
+		(ctx_ptr)->h.tweak[TWK_NUM] = (t_val); \
 	}
 
 #define skein_get_T0(ctx_ptr)     skein_get_tweak(ctx_ptr, 0)
@@ -254,7 +254,7 @@ int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val);
 
 /*
  * setup for starting with a new type:
- * h.T[0]=0; h.T[1] = NEW_TYPE; h.b_cnt=0;
+ * h.tweak[0]=0; h.tweak[1] = NEW_TYPE; h.b_cnt=0;
  */
 #define skein_start_new_type(ctx_ptr, BLK_TYPE) { \
 		skein_set_T0_T1(ctx_ptr, 0, SKEIN_T1_FLAG_FIRST | \
@@ -263,14 +263,14 @@ int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val);
 	}
 
 #define skein_clear_first_flag(hdr) { \
-		(hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST; \
+		(hdr).tweak[1] &= ~SKEIN_T1_FLAG_FIRST; \
 	}
 #define skein_set_bit_pad_flag(hdr) { \
-		(hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD; \
+		(hdr).tweak[1] |=  SKEIN_T1_FLAG_BIT_PAD; \
 	}
 
 #define skein_set_tree_level(hdr, height) { \
-		(hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height); \
+		(hdr).tweak[1] |= SKEIN_T1_TREE_LEVEL(height); \
 	}
 
 /*****************************************************************
@@ -279,9 +279,9 @@ int skein_1024_output(struct skein_1024_ctx *ctx, u8 *hash_val);
 #ifdef SKEIN_DEBUG             /* examine/display intermediate values? */
 #include "skein_debug.h"
 #else                           /* default is no callouts */
-#define skein_show_block(bits, ctx, X, blk_ptr, w_ptr, ks_event_ptr, ks_odd_ptr)
-#define skein_show_round(bits, ctx, r, X)
-#define skein_show_r_ptr(bits, ctx, r, X_ptr)
+#define skein_show_block(bits, ctx, x, blk_ptr, w_ptr, ks_event_ptr, ks_odd_ptr)
+#define skein_show_round(bits, ctx, r, x)
+#define skein_show_r_ptr(bits, ctx, r, x_ptr)
 #define skein_show_final(bits, ctx, cnt, out_ptr)
 #define skein_show_key(bits, ctx, key, key_bytes)
 #endif
diff --git a/drivers/staging/skein/skein_api.c b/drivers/staging/skein/skein_api.c
index eaf7af4..6e700ee 100644
--- a/drivers/staging/skein/skein_api.c
+++ b/drivers/staging/skein/skein_api.c
@@ -40,8 +40,8 @@ int skein_ctx_prepare(struct skein_ctx *ctx, enum skein_size size)
 int skein_init(struct skein_ctx *ctx, size_t hash_bit_len)
 {
 	int ret = SKEIN_FAIL;
-	size_t X_len = 0;
-	u64 *X = NULL;
+	size_t x_len = 0;
+	u64 *x = NULL;
 	u64 tree_info = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
 	skein_assert_ret(ctx, SKEIN_FAIL);
@@ -50,8 +50,8 @@ int skein_init(struct skein_ctx *ctx, size_t hash_bit_len)
 	 * contexts are a union in out context and thus have tha maximum
 	 * memory available.  The beauty of C :-) .
 	 */
-	X = ctx->m.s256.X;
-	X_len = ctx->skein_size/8;
+	x = ctx->m.s256.x;
+	x_len = ctx->skein_size/8;
 	/*
 	 * If size is the same and hash bit length is zero then reuse
 	 * the save chaining variables.
@@ -76,7 +76,7 @@ int skein_init(struct skein_ctx *ctx, size_t hash_bit_len)
 		 * Save chaining variables for this combination of size and
 		 * hash_bit_len
 		 */
-		memcpy(ctx->X_save, X, X_len);
+		memcpy(ctx->x_save, x, x_len);
 	}
 	return ret;
 }
@@ -85,14 +85,14 @@ int skein_mac_init(struct skein_ctx *ctx, const u8 *key, size_t key_len,
 		   size_t hash_bit_len)
 {
 	int ret = SKEIN_FAIL;
-	u64 *X = NULL;
-	size_t X_len = 0;
+	u64 *x = NULL;
+	size_t x_len = 0;
 	u64 tree_info = SKEIN_CFG_TREE_INFO_SEQUENTIAL;
 
 	skein_assert_ret(ctx, SKEIN_FAIL);
 
-	X = ctx->m.s256.X;
-	X_len = ctx->skein_size/8;
+	x = ctx->m.s256.x;
+	x_len = ctx->skein_size/8;
 
 	skein_assert_ret(hash_bit_len, SKEIN_BAD_HASHLEN);
 
@@ -120,25 +120,25 @@ int skein_mac_init(struct skein_ctx *ctx, const u8 *key, size_t key_len,
 		 * Save chaining variables for this combination of key,
 		 * key_len, hash_bit_len
 		 */
-		memcpy(ctx->X_save, X, X_len);
+		memcpy(ctx->x_save, x, x_len);
 	}
 	return ret;
 }
 
 void skein_reset(struct skein_ctx *ctx)
 {
-	size_t X_len = 0;
-	u64 *X = NULL;
+	size_t x_len = 0;
+	u64 *x = NULL;
 
 	/*
 	 * The following two lines rely of the fact that the real Skein
 	 * contexts are a union in out context and thus have tha maximum
 	 * memory available.  The beautiy of C :-) .
 	 */
-	X = ctx->m.s256.X;
-	X_len = ctx->skein_size/8;
+	x = ctx->m.s256.x;
+	x_len = ctx->skein_size/8;
 	/* Restore the chaing variable, reset byte counter */
-	memcpy(X, ctx->X_save, X_len);
+	memcpy(x, ctx->x_save, x_len);
 
 	/* Setup context to process the message */
 	skein_start_new_type(&ctx->m, MSG);
@@ -200,7 +200,7 @@ int skein_update_bits(struct skein_ctx *ctx, const u8 *msg,
 	 * Skein's real partial block buffer.
 	 * If this layout ever changes we have to adapt this as well.
 	 */
-	up = (u8 *)ctx->m.s256.X + ctx->skein_size / 8;
+	up = (u8 *)ctx->m.s256.x + ctx->skein_size / 8;
 
 	/* set tweak flag for the skein_final call */
 	skein_set_bit_pad_flag(ctx->m.h);
diff --git a/drivers/staging/skein/skein_api.h b/drivers/staging/skein/skein_api.h
index db808ae..e02fa19 100644
--- a/drivers/staging/skein/skein_api.h
+++ b/drivers/staging/skein/skein_api.h
@@ -100,7 +100,7 @@ enum skein_size {
  */
 struct skein_ctx {
 	u64 skein_size;
-	u64 X_save[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
+	u64 x_save[SKEIN_MAX_STATE_WORDS];   /* save area for state variables */
 	union {
 		struct skein_ctx_hdr h;
 		struct skein_256_ctx s256;
diff --git a/drivers/staging/skein/skein_block.c b/drivers/staging/skein/skein_block.c
index 76c4113e..f49eb2e 100644
--- a/drivers/staging/skein/skein_block.c
+++ b/drivers/staging/skein/skein_block.c
@@ -32,7 +32,8 @@
 #define ts              (kw + KW_TWK_BASE)
 
 #ifdef SKEIN_DEBUG
-#define debug_save_tweak(ctx) { ctx->h.T[0] = ts[0]; ctx->h.T[1] = ts[1]; }
+#define debug_save_tweak(ctx) { \
+                        ctx->h.tweak[0] = ts[0]; ctx->h.tweak[1] = ts[1]; }
 #else
 #define debug_save_tweak(ctx)
 #endif
@@ -71,8 +72,8 @@ void skein_256_process_block(struct skein_256_ctx *ctx, const u8 *blk_ptr,
 	X_ptr[0] = &X0;  X_ptr[1] = &X1;  X_ptr[2] = &X2;  X_ptr[3] = &X3;
 #endif
 	skein_assert(blk_cnt != 0); /* never call with blk_cnt == 0! */
-	ts[0] = ctx->h.T[0];
-	ts[1] = ctx->h.T[1];
+	ts[0] = ctx->h.tweak[0];
+	ts[1] = ctx->h.tweak[1];
 	do  {
 		/*
 		 * this implementation only supports 2**64 input bytes
@@ -81,10 +82,10 @@ void skein_256_process_block(struct skein_256_ctx *ctx, const u8 *blk_ptr,
 		ts[0] += byte_cnt_add; /* update processed length */
 
 		/* precompute the key schedule for this block */
-		ks[0] = ctx->X[0];
-		ks[1] = ctx->X[1];
-		ks[2] = ctx->X[2];
-		ks[3] = ctx->X[3];
+		ks[0] = ctx->x[0];
+		ks[1] = ctx->x[1];
+		ks[2] = ctx->x[2];
+		ks[3] = ctx->x[3];
 		ks[4] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^ SKEIN_KS_PARITY;
 
 		ts[2] = ts[0] ^ ts[1];
@@ -92,7 +93,7 @@ void skein_256_process_block(struct skein_256_ctx *ctx, const u8 *blk_ptr,
 		/* get input block in little-endian format */
 		skein_get64_lsb_first(w, blk_ptr, WCNT);
 		debug_save_tweak(ctx);
-		skein_show_block(BLK_BITS, &ctx->h, ctx->X, blk_ptr, w, ks, ts);
+		skein_show_block(BLK_BITS, &ctx->h, ctx->x, blk_ptr, w, ks, ts);
 
 		X0 = w[0] + ks[0]; /* do the first full key injection */
 		X1 = w[1] + ks[1] + ts[0];
@@ -101,7 +102,7 @@ void skein_256_process_block(struct skein_256_ctx *ctx, const u8 *blk_ptr,
 
 		/* show starting state values */
 		skein_show_r_ptr(BLK_BITS, &ctx->h, SKEIN_RND_KEY_INITIAL,
-				 X_ptr);
+				 x_ptr);
 
 		blk_ptr += SKEIN_256_BLOCK_BYTES;
 
@@ -220,17 +221,17 @@ do { \
 	#endif
 		}
 		/* do the final "feedforward" xor, update context chaining */
-		ctx->X[0] = X0 ^ w[0];
-		ctx->X[1] = X1 ^ w[1];
-		ctx->X[2] = X2 ^ w[2];
-		ctx->X[3] = X3 ^ w[3];
+		ctx->x[0] = X0 ^ w[0];
+		ctx->x[1] = X1 ^ w[1];
+		ctx->x[2] = X2 ^ w[2];
+		ctx->x[3] = X3 ^ w[3];
 
-		skein_show_round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+		skein_show_round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->x);
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
 	} while (--blk_cnt);
-	ctx->h.T[0] = ts[0];
-	ctx->h.T[1] = ts[1];
+	ctx->h.tweak[0] = ts[0];
+	ctx->h.tweak[1] = ts[1];
 }
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
@@ -282,8 +283,8 @@ void skein_512_process_block(struct skein_512_ctx *ctx, const u8 *blk_ptr,
 #endif
 
 	skein_assert(blk_cnt != 0); /* never call with blk_cnt == 0! */
-	ts[0] = ctx->h.T[0];
-	ts[1] = ctx->h.T[1];
+	ts[0] = ctx->h.tweak[0];
+	ts[1] = ctx->h.tweak[1];
 	do  {
 		/*
 		 * this implementation only supports 2**64 input bytes
@@ -292,14 +293,14 @@ void skein_512_process_block(struct skein_512_ctx *ctx, const u8 *blk_ptr,
 		ts[0] += byte_cnt_add; /* update processed length */
 
 		/* precompute the key schedule for this block */
-		ks[0] = ctx->X[0];
-		ks[1] = ctx->X[1];
-		ks[2] = ctx->X[2];
-		ks[3] = ctx->X[3];
-		ks[4] = ctx->X[4];
-		ks[5] = ctx->X[5];
-		ks[6] = ctx->X[6];
-		ks[7] = ctx->X[7];
+		ks[0] = ctx->x[0];
+		ks[1] = ctx->x[1];
+		ks[2] = ctx->x[2];
+		ks[3] = ctx->x[3];
+		ks[4] = ctx->x[4];
+		ks[5] = ctx->x[5];
+		ks[6] = ctx->x[6];
+		ks[7] = ctx->x[7];
 		ks[8] = ks[0] ^ ks[1] ^ ks[2] ^ ks[3] ^
 			ks[4] ^ ks[5] ^ ks[6] ^ ks[7] ^ SKEIN_KS_PARITY;
 
@@ -308,7 +309,7 @@ void skein_512_process_block(struct skein_512_ctx *ctx, const u8 *blk_ptr,
 		/* get input block in little-endian format */
 		skein_get64_lsb_first(w, blk_ptr, WCNT);
 		debug_save_tweak(ctx);
-		skein_show_block(BLK_BITS, &ctx->h, ctx->X, blk_ptr, w, ks, ts);
+		skein_show_block(BLK_BITS, &ctx->h, ctx->x, blk_ptr, w, ks, ts);
 
 		X0   = w[0] + ks[0]; /* do the first full key injection */
 		X1   = w[1] + ks[1];
@@ -448,20 +449,20 @@ do { \
 		}
 
 		/* do the final "feedforward" xor, update context chaining */
-		ctx->X[0] = X0 ^ w[0];
-		ctx->X[1] = X1 ^ w[1];
-		ctx->X[2] = X2 ^ w[2];
-		ctx->X[3] = X3 ^ w[3];
-		ctx->X[4] = X4 ^ w[4];
-		ctx->X[5] = X5 ^ w[5];
-		ctx->X[6] = X6 ^ w[6];
-		ctx->X[7] = X7 ^ w[7];
-		skein_show_round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+		ctx->x[0] = X0 ^ w[0];
+		ctx->x[1] = X1 ^ w[1];
+		ctx->x[2] = X2 ^ w[2];
+		ctx->x[3] = X3 ^ w[3];
+		ctx->x[4] = X4 ^ w[4];
+		ctx->x[5] = X5 ^ w[5];
+		ctx->x[6] = X6 ^ w[6];
+		ctx->x[7] = X7 ^ w[7];
+		skein_show_round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->x);
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
 	} while (--blk_cnt);
-	ctx->h.T[0] = ts[0];
-	ctx->h.T[1] = ts[1];
+	ctx->h.tweak[0] = ts[0];
+	ctx->h.tweak[1] = ts[1];
 }
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
@@ -520,8 +521,8 @@ void skein_1024_process_block(struct skein_1024_ctx *ctx, const u8 *blk_ptr,
 #endif
 
 	skein_assert(blk_cnt != 0); /* never call with blk_cnt == 0! */
-	ts[0] = ctx->h.T[0];
-	ts[1] = ctx->h.T[1];
+	ts[0] = ctx->h.tweak[0];
+	ts[1] = ctx->h.tweak[1];
 	do  {
 		/*
 		 * this implementation only supports 2**64 input bytes
@@ -530,22 +531,22 @@ void skein_1024_process_block(struct skein_1024_ctx *ctx, const u8 *blk_ptr,
 		ts[0] += byte_cnt_add; /* update processed length */
 
 		/* precompute the key schedule for this block */
-		ks[0]  = ctx->X[0];
-		ks[1]  = ctx->X[1];
-		ks[2]  = ctx->X[2];
-		ks[3]  = ctx->X[3];
-		ks[4]  = ctx->X[4];
-		ks[5]  = ctx->X[5];
-		ks[6]  = ctx->X[6];
-		ks[7]  = ctx->X[7];
-		ks[8]  = ctx->X[8];
-		ks[9]  = ctx->X[9];
-		ks[10] = ctx->X[10];
-		ks[11] = ctx->X[11];
-		ks[12] = ctx->X[12];
-		ks[13] = ctx->X[13];
-		ks[14] = ctx->X[14];
-		ks[15] = ctx->X[15];
+		ks[0]  = ctx->x[0];
+		ks[1]  = ctx->x[1];
+		ks[2]  = ctx->x[2];
+		ks[3]  = ctx->x[3];
+		ks[4]  = ctx->x[4];
+		ks[5]  = ctx->x[5];
+		ks[6]  = ctx->x[6];
+		ks[7]  = ctx->x[7];
+		ks[8]  = ctx->x[8];
+		ks[9]  = ctx->x[9];
+		ks[10] = ctx->x[10];
+		ks[11] = ctx->x[11];
+		ks[12] = ctx->x[12];
+		ks[13] = ctx->x[13];
+		ks[14] = ctx->x[14];
+		ks[15] = ctx->x[15];
 		ks[16] =  ks[0] ^  ks[1] ^  ks[2] ^  ks[3] ^
 			  ks[4] ^  ks[5] ^  ks[6] ^  ks[7] ^
 			  ks[8] ^  ks[9] ^ ks[10] ^ ks[11] ^
@@ -556,7 +557,7 @@ void skein_1024_process_block(struct skein_1024_ctx *ctx, const u8 *blk_ptr,
 		/* get input block in little-endian format */
 		skein_get64_lsb_first(w, blk_ptr, WCNT);
 		debug_save_tweak(ctx);
-		skein_show_block(BLK_BITS, &ctx->h, ctx->X, blk_ptr, w, ks, ts);
+		skein_show_block(BLK_BITS, &ctx->h, ctx->x, blk_ptr, w, ks, ts);
 
 		X00    =  w[0] +  ks[0]; /* do the first full key injection */
 		X01    =  w[1] +  ks[1];
@@ -735,30 +736,30 @@ do { \
 		}
 		/* do the final "feedforward" xor, update context chaining */
 
-		ctx->X[0] = X00 ^ w[0];
-		ctx->X[1] = X01 ^ w[1];
-		ctx->X[2] = X02 ^ w[2];
-		ctx->X[3] = X03 ^ w[3];
-		ctx->X[4] = X04 ^ w[4];
-		ctx->X[5] = X05 ^ w[5];
-		ctx->X[6] = X06 ^ w[6];
-		ctx->X[7] = X07 ^ w[7];
-		ctx->X[8] = X08 ^ w[8];
-		ctx->X[9] = X09 ^ w[9];
-		ctx->X[10] = X10 ^ w[10];
-		ctx->X[11] = X11 ^ w[11];
-		ctx->X[12] = X12 ^ w[12];
-		ctx->X[13] = X13 ^ w[13];
-		ctx->X[14] = X14 ^ w[14];
-		ctx->X[15] = X15 ^ w[15];
-
-		skein_show_round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->X);
+		ctx->x[0] = X00 ^ w[0];
+		ctx->x[1] = X01 ^ w[1];
+		ctx->x[2] = X02 ^ w[2];
+		ctx->x[3] = X03 ^ w[3];
+		ctx->x[4] = X04 ^ w[4];
+		ctx->x[5] = X05 ^ w[5];
+		ctx->x[6] = X06 ^ w[6];
+		ctx->x[7] = X07 ^ w[7];
+		ctx->x[8] = X08 ^ w[8];
+		ctx->x[9] = X09 ^ w[9];
+		ctx->x[10] = X10 ^ w[10];
+		ctx->x[11] = X11 ^ w[11];
+		ctx->x[12] = X12 ^ w[12];
+		ctx->x[13] = X13 ^ w[13];
+		ctx->x[14] = X14 ^ w[14];
+		ctx->x[15] = X15 ^ w[15];
+
+		skein_show_round(BLK_BITS, &ctx->h, SKEIN_RND_FEED_FWD, ctx->x);
 
 		ts[1] &= ~SKEIN_T1_FLAG_FIRST;
 		blk_ptr += SKEIN_1024_BLOCK_BYTES;
 	} while (--blk_cnt);
-	ctx->h.T[0] = ts[0];
-	ctx->h.T[1] = ts[1];
+	ctx->h.tweak[0] = ts[0];
+	ctx->h.tweak[1] = ts[1];
 }
 
 #if defined(SKEIN_CODE_SIZE) || defined(SKEIN_PERF)
-- 
1.9.0



-- 
Jake Edge - LWN - jake@lwn.net - http://lwn.net
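
A note on the skein_api.c hunks above: skein_init(), skein_mac_init() and
skein_reset() always take the state pointer from ctx->m.s256.x and size the
copy from ctx->skein_size, which only works because the three contexts are
overlaid in a union and x[] starts at the same offset in each. A minimal
standalone sketch of that layout assumption follows. The struct names are
cut-down, hypothetical stand-ins with hard-coded sizes, not the staging
headers, and restore_state() is an illustrative helper rather than the
driver's function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down mirrors of the skein.h contexts (not the real headers). */
struct hdr { size_t hash_bit_len; size_t b_cnt; uint64_t tweak[2]; };
struct ctx_256  { struct hdr h; uint64_t x[4];  uint8_t b[32];  };
struct ctx_512  { struct hdr h; uint64_t x[8];  uint8_t b[64];  };
struct ctx_1024 { struct hdr h; uint64_t x[16]; uint8_t b[128]; };

struct ctx {
	uint64_t skein_size;	/* state size in bits: 256, 512 or 1024 */
	uint64_t x_save[16];	/* big enough for the largest variant */
	union {
		struct hdr h;
		struct ctx_256 s256;
		struct ctx_512 s512;
		struct ctx_1024 s1024;
	} m;
};

/* The trick only holds if x[] starts at the same offset in every variant. */
static int x_offsets_match(void)
{
	return offsetof(struct ctx_256, x) == offsetof(struct ctx_512, x) &&
	       offsetof(struct ctx_512, x) == offsetof(struct ctx_1024, x);
}

/* Shaped like skein_reset(): locate x[] once, size the copy by skein_size. */
static void restore_state(struct ctx *c)
{
	/* the patch writes ctx->m.s256.x; the offset form names the same address */
	uint8_t *x = (uint8_t *)&c->m + offsetof(struct ctx_256, x);
	size_t x_len = c->skein_size / 8;	/* bits -> bytes */

	assert(x_offsets_match());
	assert(x_len <= sizeof(c->x_save));
	memcpy(x, c->x_save, x_len);
}
```

With skein_size set to 512, restore_state() copies 64 bytes, so the words
land in m.s512.x[0..7] even though the address was derived from the s256
member. If the layout ever changes, a check like x_offsets_match() is the
kind of guard that would catch it.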

* Re: [PATCH 0/3] staging/skein: more cleanup
  2014-05-20 13:56 [PATCH 0/3] staging/skein: more cleanup Jake Edge
                   ` (2 preceding siblings ...)
  2014-05-20 14:02 ` [PATCH 3/3] staging/skein: variable/member name cleanup Jake Edge
@ 2014-05-20 14:47 ` Jason Cooper
  2014-05-20 16:24   ` Jake Edge
  3 siblings, 1 reply; 11+ messages in thread
From: Jason Cooper @ 2014-05-20 14:47 UTC (permalink / raw)
  To: Jake Edge
  Cc: Greg Kroah-Hartman, devel, linux-kernel, Joe Perches,
	Dan Carpenter, Anton Saraev

Jake,

On Tue, May 20, 2014 at 07:56:12AM -0600, Jake Edge wrote:
> 
> Clean up a few more things in skein to get it closer to mainline
> inclusion.  The first may be questionable (so I probably should have
> put it last -- oh well, I can always respin), but it seemed like
> putting all of the threefish block functions in one file, like the
> skein block functions are all in one file, made sense.

Fine by me.  For the whole series:

Acked-by: Jason Cooper <jason@lakedaemon.net>

Thanks for the help!

Do you have any other series pending for this driver?  I suspect Anton
does and I'd like to deconflict the two efforts.

thx,

Jason.

* Re: [PATCH 0/3] staging/skein: more cleanup
  2014-05-20 14:47 ` [PATCH 0/3] staging/skein: more cleanup Jason Cooper
@ 2014-05-20 16:24   ` Jake Edge
  2014-05-20 17:47     ` Jason Cooper
  2014-05-20 21:52     ` Anton Saraev
  0 siblings, 2 replies; 11+ messages in thread
From: Jake Edge @ 2014-05-20 16:24 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Greg Kroah-Hartman, devel, linux-kernel, Joe Perches,
	Dan Carpenter, Anton Saraev

On Tue, 20 May 2014 10:47:57 -0400 Jason Cooper wrote:

> Do you have any other series pending for this driver?

No and I won't be doing anything else for the next couple of days --
some darn weekly edition to deal with :)

It seems like most of the straightforward stuff has been dealt with at
this point.  That rat's nest of ifdefs in skein_block.c needs attention,
but some kind of tests are needed to ensure nothing breaks before
digging into that ...

jake

-- 
Jake Edge - LWN - jake@lwn.net - http://lwn.net
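
On the tests asked for above: one common safety net before untangling
ifdef-heavy code is a differential test, where the current routine is kept
as a reference and the reworked one must produce identical output over a
range of input lengths. A minimal sketch of that idea follows. digest_ref()
and digest_new() are hypothetical stand-ins computing a toy checksum (not
Skein), and first_mismatch() is an illustrative helper, not part of the
driver or of any kernel test framework.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*digest_fn)(const uint8_t *msg, size_t len);

/* Reference: straightforward byte-at-a-time loop (toy checksum, NOT Skein). */
static uint64_t digest_ref(const uint8_t *msg, size_t len)
{
	uint64_t h = 0;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 31 + msg[i];
	return h;
}

/* "Reworked" variant: two bytes per iteration plus a one-byte tail. */
static uint64_t digest_new(const uint8_t *msg, size_t len)
{
	uint64_t h = 0;
	size_t i = 0;

	for (; i + 2 <= len; i += 2)
		h = (h * 31 + msg[i]) * 31 + msg[i + 1];
	if (i < len)
		h = h * 31 + msg[i];
	return h;
}

/* Returns the first length at which the two disagree, or -1 if none. */
static long first_mismatch(digest_fn ref, digest_fn cand, size_t max_len)
{
	uint8_t buf[256];
	size_t i, len;

	assert(max_len <= sizeof(buf));
	for (i = 0; i < sizeof(buf); i++)
		buf[i] = (uint8_t)(i * 7 + 1);	/* fixed, reproducible pattern */

	for (len = 0; len <= max_len; len++)
		if (ref(buf, len) != cand(buf, len))
			return (long)len;
	return -1;
}
```

Calling first_mismatch(digest_ref, digest_new, 200) walks every length from
0 to 200. For Skein the interesting lengths would be the ones that straddle
the 32-, 64- and 128-byte block sizes, since that is where the partial-block
buffering changes behaviour.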

* Re: [PATCH 0/3] staging/skein: more cleanup
  2014-05-20 16:24   ` Jake Edge
@ 2014-05-20 17:47     ` Jason Cooper
  2014-05-20 21:52     ` Anton Saraev
  1 sibling, 0 replies; 11+ messages in thread
From: Jason Cooper @ 2014-05-20 17:47 UTC (permalink / raw)
  To: Jake Edge
  Cc: Greg Kroah-Hartman, devel, linux-kernel, Joe Perches,
	Dan Carpenter, Anton Saraev

On Tue, May 20, 2014 at 10:24:11AM -0600, Jake Edge wrote:
> On Tue, 20 May 2014 10:47:57 -0400 Jason Cooper wrote:
> 
> > Do you have any other series pending for this driver?
> 
> No and I won't be doing anything else for the next couple of days --
> some darn weekly edition to deal with :)

:)

> It seems like most of the straightforward stuff has been dealt with at
> this point.  That rat's nest of ifdefs in skein_block.c needs attention,
> but some kind of tests are needed to ensure nothing breaks before
> digging into that ...

Something like objdiff [1]?  It landed in v3.15-rc1, so you can use it
from staging/staging-next.

I'd also like to consolidate all the macros that are declared in the
middle of functions and such.  After that, test vectors and crypto API
integration.

thx,

Jason.

[1] https://git.kernel.org/cgit/linux/kernel/git/gregkh/staging.git/commit/scripts/objdiff?id=79192ca8ebd9a25c583aa46024a250fef1e7766f

* Re: [PATCH 0/3] staging/skein: more cleanup
  2014-05-20 16:24   ` Jake Edge
  2014-05-20 17:47     ` Jason Cooper
@ 2014-05-20 21:52     ` Anton Saraev
  2014-05-22 16:52       ` Jake Edge
  1 sibling, 1 reply; 11+ messages in thread
From: Anton Saraev @ 2014-05-20 21:52 UTC (permalink / raw)
  To: Jake Edge
  Cc: Jason Cooper, Greg Kroah-Hartman, devel, linux-kernel,
	Joe Perches, Dan Carpenter

On Tue, May 20, 2014 at 10:24:11AM -0600, Jake Edge wrote:
> On Tue, 20 May 2014 10:47:57 -0400 Jason Cooper wrote:
>
> but some kind of tests are needed to ensure nothing breaks before
> digging into that ...

I have some tests: a slightly modified version of the tests from
https://github.com/wernerd/Skein3Fish. They run as a debugfs entry
and require some of the modifications needed for module support
(the "public" functions must be extern). As far as I understand,
Jason has his own tests. It would be logical to share them, but
I don't know where.

* Re: [PATCH 0/3] staging/skein: more cleanup
  2014-05-20 21:52     ` Anton Saraev
@ 2014-05-22 16:52       ` Jake Edge
  2014-05-22 17:04         ` Jason Cooper
  0 siblings, 1 reply; 11+ messages in thread
From: Jake Edge @ 2014-05-22 16:52 UTC (permalink / raw)
  To: Anton Saraev
  Cc: Jason Cooper, Greg Kroah-Hartman, devel, linux-kernel,
	Joe Perches, Dan Carpenter

On Wed, 21 May 2014 01:52:17 +0400 Anton Saraev wrote:
> On Tue, May 20, 2014 at 10:24:11AM -0600, Jake Edge wrote:
> > On Tue, 20 May 2014 10:47:57 -0400 Jason Cooper wrote:
> >
> > but some kind of tests are needed to ensure nothing breaks before
> > digging into that ...
> 
> I have some tests: a slightly modified version of the tests from
> https://github.com/wernerd/Skein3Fish. They run as a debugfs entry
> and require some of the modifications needed for module support
> (the "public" functions must be extern). As far as I understand,
> Jason has his own tests. It would be logical to share them, but
> I don't know where.

well, it seems to me that we want tests that eventually can be added
into the crypto test framework once skein moves out of staging and into
crypto ... Jason's objdiff seems like it would be used as another
development tool ...

so do you have your tests anywhere that we can look at them?  or,
failing that, maybe you can just email me a copy off-list or something?

do you have patches pending for skein?  i might get a chance to hack on
this some over the weekend, and we may as well try to avoid duplicating
each other's efforts ...

jake

-- 
Jake Edge - LWN - jake@lwn.net - http://lwn.net

* Re: [PATCH 0/3] staging/skein: more cleanup
  2014-05-22 16:52       ` Jake Edge
@ 2014-05-22 17:04         ` Jason Cooper
  2014-05-24 14:50           ` Anton Saraev
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Cooper @ 2014-05-22 17:04 UTC (permalink / raw)
  To: Jake Edge
  Cc: Anton Saraev, Greg Kroah-Hartman, devel, linux-kernel,
	Joe Perches, Dan Carpenter

On Thu, May 22, 2014 at 10:52:06AM -0600, Jake Edge wrote:
> On Wed, 21 May 2014 01:52:17 +0400 Anton Saraev wrote:
> > On Tue, May 20, 2014 at 10:24:11AM -0600, Jake Edge wrote:
> > > On Tue, 20 May 2014 10:47:57 -0400 Jason Cooper wrote:
> > >
> > > but some kind of tests are needed to ensure nothing breaks before
> > > digging into that ...
> > 
> > I have some tests: a slightly modified version of the tests from
> > https://github.com/wernerd/Skein3Fish. They run as a debugfs entry
> > and require some of the modifications needed for module support
> > (the "public" functions must be extern). As far as I understand,
> > Jason has his own tests. It would be logical to share them, but
> > I don't know where.
> 
> well, it seems to me that we want tests that eventually can be added
> into the crypto test framework once skein moves out of staging and into
> crypto ... Jason's objdiff seems like it would be used as another
> development tool ...
> 
> so do you have your tests anywhere that we can look at them?  or,
> failing that, maybe you can just email me a copy off-list or something?

https://github.com/wernerd/Skein3Fish/tree/master/c/test

This is the same repo I pulled the original source files from.  I looked
at integrating into the crypto test framework a few months back and
realized the crypto API needed to be modified to handle tweakable block
ciphers.

> do you have patches pending for skein?  i might get a chance to hack on
> this some over the weekend, and we may as well try to avoid duplicating
> each other's efforts ...

I do not. I'm unsure whether Anton has anything pending that is ready
for upstreaming.

thx,

Jason.
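
For readers unfamiliar with the term used above: the tweak is the extra
128-bit input that Threefish takes alongside key and plaintext. In the
skein.h hunk of patch 3/3 it is held as two 64-bit words, tweak[0] carrying
the byte count and tweak[1] the flags. A small sketch of how the flag
positions from that header map into the second word follows. The T1_*
names are shortened stand-ins for the SKEIN_T1_* macros, only values
visible in the hunk are used, and the three helpers are illustrative (the
real skein_start_new_type() also ORs in a block type, omitted here).

```c
#include <assert.h>
#include <stdint.h>

/* Shortened stand-ins for the SKEIN_T1_* macros; values as in the hunk. */
#define T1_BIT(bit)		((bit) - 64)	/* position inside the second word */
#define T1_POS_TREE_LVL		T1_BIT(112)
#define T1_POS_FIRST		T1_BIT(126)
#define T1_POS_FINAL		T1_BIT(127)
#define T1_FLAG_FIRST		(((uint64_t)1) << T1_POS_FIRST)
#define T1_FLAG_FINAL		(((uint64_t)1) << T1_POS_FINAL)
#define T1_TREE_LVL_MASK	(((uint64_t)0x7F) << T1_POS_TREE_LVL)

struct hdr { uint64_t tweak[2]; };	/* tweak[0] = byte count, tweak[1] = flags */

/* Start a new message: zero the counter, mark the next block as the first. */
static void start_msg(struct hdr *h)
{
	h->tweak[0] = 0;
	h->tweak[1] = T1_FLAG_FIRST;
}

/* Bookkeeping around one processed block, as in the process_block loops. */
static void account_block(struct hdr *h, uint64_t byte_cnt_add)
{
	h->tweak[0] += byte_cnt_add;	/* update processed length */
	h->tweak[1] &= ~T1_FLAG_FIRST;	/* later blocks are not "first" */
}

/* Set the 7-bit tree level field, mirroring skein_set_tree_level(). */
static void set_tree_level(struct hdr *h, unsigned int level)
{
	assert(level <= 0x7F);
	h->tweak[1] |= ((uint64_t)level) << T1_POS_TREE_LVL;
}
```

Because SKEIN_T1_BIT() subtracts 64, the "first" and "final" flags at
overall bit positions 126 and 127 end up as bits 62 and 63 of tweak[1],
and the tree level field starts at bit 48.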

* Re: [PATCH 0/3] staging/skein: more cleanup
  2014-05-22 17:04         ` Jason Cooper
@ 2014-05-24 14:50           ` Anton Saraev
  0 siblings, 0 replies; 11+ messages in thread
From: Anton Saraev @ 2014-05-24 14:50 UTC (permalink / raw)
  To: Jason Cooper
  Cc: Jake Edge, Greg Kroah-Hartman, devel, linux-kernel, Joe Perches,
	Dan Carpenter

On Thu, May 22, 2014 at 01:04:41PM -0400, Jason Cooper wrote:
> I do not. I'm unsure whether Anton has anything pending that is ready
> for upstreaming.

I have two simple patches, but I don't know whether it is necessary to
send them. The first exports the functions in the *api.c files, but that
is a trivial part of the API cleanup. The second fixes the remaining
uppercase X names in skein_block.c, but those variables are on the
SKEIN_DEBUG compile path and may be removed completely (or we need to
add some output in the appropriate functions).
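
The leftover names go unnoticed in normal builds because, as the skein.h
hunk shows, the skein_show_*() callouts are defined with empty bodies
unless SKEIN_DEBUG is set, so their arguments never reach the compiler.
Going by the hunks in patch 3/3, the skein_256_process_block() call now
passes x_ptr while the local is still declared as X_ptr. A minimal sketch
of that effect follows, assuming a hypothetical show_r_ptr() macro in place
of skein_show_r_ptr() and a hypothetical DEMO_DEBUG switch in place of
SKEIN_DEBUG.

```c
#include <assert.h>
#include <stdint.h>

#ifdef DEMO_DEBUG
/* debug build: the argument is used, so it must name a real variable */
#define show_r_ptr(r, ptr)	((void)(ptr))
#else
/* default build: empty body, the preprocessor discards the arguments */
#define show_r_ptr(r, ptr)
#endif

static uint64_t first_injection(uint64_t w, uint64_t ks)
{
	uint64_t X0 = w + ks;
	const uint64_t *X_ptr[1] = { &X0 };	/* declared with an uppercase X */

	/*
	 * 'x_ptr' is not declared anywhere.  This line only builds because
	 * the non-debug macro drops its arguments; with -DDEMO_DEBUG the
	 * compiler would report an undeclared identifier here.
	 */
	show_r_ptr(0, x_ptr);

	return *X_ptr[0];
}

/* Usage: the default build compiles and behaves as if the call were absent. */
static void demo(void)
{
	assert(first_injection(1, 2) == 3);
}
```

Building the same file with -DDEMO_DEBUG is a quick way to flush out such
mismatches, which is one argument for either exercising the debug path
regularly or removing it.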

Thread overview: 11+ messages
2014-05-20 13:56 [PATCH 0/3] staging/skein: more cleanup Jake Edge
2014-05-20 13:58 ` [PATCH 1/3] staging/skein: move all threefish block functions to one file Jake Edge
2014-05-20 14:00 ` [PATCH 2/3] staging/skein: comment typos Jake Edge
2014-05-20 14:02 ` [PATCH 3/3] staging/skein: variable/member name cleanup Jake Edge
2014-05-20 14:47 ` [PATCH 0/3] staging/skein: more cleanup Jason Cooper
2014-05-20 16:24   ` Jake Edge
2014-05-20 17:47     ` Jason Cooper
2014-05-20 21:52     ` Anton Saraev
2014-05-22 16:52       ` Jake Edge
2014-05-22 17:04         ` Jason Cooper
2014-05-24 14:50           ` Anton Saraev
